
Tables: Detection & Structure Recognition

Tracking models, datasets, and metrics for detecting tables in documents and parsing their internal structure.

Disclaimer: This page covers the full table pipeline: detecting tables on a page (TD) and parsing their internal grid structure (TSR). For general-purpose layout detection (which often includes table regions), see the Layout Page. For reading order prediction, see the Reading Order Page. For end-to-end document understanding, see the Document Understanding Page.

Overview

The table pipeline in document analysis has two stages:

  1. Table Detection (TD): Locating table regions on a page. This is typically handled by general-purpose layout models (Faster R-CNN, YOLO, DETR) that classify “Table” as one of several region types. Some specialized models and benchmarks target TD specifically.
  2. Table Structure Recognition (TSR): Recovering the logical grid of a detected table region, including rows, columns, spanning cells, and (optionally) header vs. body roles. TSR operates on a pre-cropped table image produced by the detection stage.

Both stages are active research areas with distinct datasets, metrics, and modeling paradigms. Some datasets (TableBank, PubTables-1M) provide annotations for both stages.


Table Detection

TD: Paradigms

Table detection is most commonly handled as a special case of document layout analysis. General-purpose object detectors (Faster R-CNN, DETR, YOLO variants) trained on layout datasets like PubLayNet or DocLayNet naturally produce table bounding boxes alongside other region types.

A few dedicated efforts focus on TD specifically, often using competition benchmarks (ICDAR series) or specialized datasets where tables are the only annotated class.

TD: Models

Most table detection models are general layout detectors that happen to produce table bounding boxes. The tables below list models with TD-specific contributions or evaluations. For the full set of layout models, see the Layout Page.

Models with Code or Weights

| Date | Name | Artifacts | Code | License | Notes |
|---|---|---|---|---|---|
| 2021-09 | Table Transformer (Det) | microsoft/table-transformer-detection | table-transformer | MIT | Notes |
| 2020-08 | CDeC-Net | None | CDeCNet | MIT | Notes |
| 2020-04 | CascadeTabNet | None | CascadeTabNet | MIT | Notes |
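
The Table Transformer detection checkpoint above is the easiest of these to try. A minimal inference sketch using the Hugging Face transformers API (the image path and the 0.7 confidence threshold are placeholders, not values recommended by the model card):

```python
# Minimal table-detection sketch with the Table Transformer checkpoint listed above.
# Assumes transformers, torch, and Pillow are installed; paths and threshold are illustrative.
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

image = Image.open("page.png").convert("RGB")          # hypothetical page image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes to (label, score, box) triples in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])        # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```

Cropping the returned table boxes (usually with a small padding margin) produces the table images that the TSR models later on this page expect as input.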

Methods (Paper Only)

| Date | Name | Notes |
|---|---|---|
| 2024-05 | SemiTabDETR | Notes |
| 2017-11 | DeepDeSRT | DOI |

Layout models with strong TD results: Many general-purpose layout models also detect tables. See the Layout Page for details on these models, which include:

  • DocLayout-YOLO (2024): YOLOv10-based; DocSynth-300K pre-training.
  • DiT (2022): BEiT-style pre-training; evaluated on ICDAR 2019 cTDaR.
  • SwinDocSegmenter (2023): Instance segmentation; TableBank 98.0 mAP.
  • IIIT-AR-13K baselines (2020): Cross-dataset TD evaluation on ICDAR 2013, cTDaR, UNLV, Marmot.
  • LayoutParser (2021): Pre-trained on TableBank (among others).

TD: Datasets

Large-scale annotated collections for table detection (bounding box) training and evaluation. Datasets are grouped by the most permissive use permitted: commercial, research / non-commercial, and not available / restricted.

Commercial Use

Training a private, for-profit model is permitted with minimal obligations.

| Dataset | Pages | Domain | Annotation | Classes | Eval Split | License | Notes |
|---|---|---|---|---|---|---|---|
| TNCR (2021) | 9,428 pages | FDA drug labels | Human; format not stated | 5 (table type) | Yes | MIT | Notes. Pages may contain multiple tables; total table count exceeds page count. |

Research / Non-Commercial

Training a free, open-weight non-commercial model is permitted.

| Dataset | Pages | Domain | Annotation | Classes | Eval Split | License | Notes |
|---|---|---|---|---|---|---|---|
| WikiDT (2024) | 16,887 full pages (54,032 sub-pages after pagination); 159,905 table annotations | Wikipedia screenshots | Auto (rendered from Wikipedia source); Pascal VOC XML | 1 (table) | Yes | CC-BY-SA-3.0 | Notes. Also includes TSR annotations; see TSR section. |
| Open-Tables + ICT-TD (2023) | ~16k images (Open-Tables: 11,074; ICT-TD: 5,000) | Cleaned merger of open TD datasets (ICDAR 2013/17/19, Marmot, TNCR) + ICT commodity PDFs | Auto (re-annotated) + Human; format not stated | 1 (table) | Yes | Apache-2.0 (HuggingFace release); underlying Open-Tables sources include Marmot (research-only) and ICDAR (unknown) | Notes. ICT-TD component is original data; Open-Tables component inherits source restrictions. |
| SCI-3000 (2023) | 34,791 pages (3,000 PDFs) | Scientific (CS, biomed, chemistry, physics) | Human | 3 (table, figure, caption) | Yes | CC-BY-4.0 | ICDAR 2023. Zenodo. |
| TabRecSet (2023) | 32k images | Wild (scanned, camera, spreadsheet, bilingual EN/ZH) | Human (LabelMe JSON polygons) | 1 (table) | Yes | CC-BY-SA-4.0 | Notes. Also includes TSR + TCR; see TSR section. |
| PubTables-1M (2021) | 460k pages (~947k table instances) | Scientific | Auto (canonicalized); format not stated | 1 (table) | Yes | CDLA-Perm-2.0 (annotations); underlying PMCOA images have mixed per-article licenses | Notes. Table Transformer detection baseline: AP 0.966. |
| TableBank (2019) | 417k table instances (163k Word, 253k LaTeX) | Diverse | Weak (Word/LaTeX source); format not stated | 1 (table) | Yes | Apache-2.0 (code/models); Research-only (data) | Notes. Data license is research-only despite code being Apache-2.0. |
| PubLayNet (2019) | 360k pages | Scientific | Weak | 5 (table is one class) | Yes | CDLA-Perm-1.0 (annotations); underlying PMCOA PDFs are non-commercial only | Layout. ICDAR 2013 TD transfer demonstrated. |
| Marmot (2012) | 2,000 pages | Chinese e-books + English scientific | Human; format not stated | 1 (table) | No official splits | Research-only | PKU. 50% hard negatives. |
| UNLV (2010) | 2,889 pages | Diverse (scanned) | Human; format not stated | 1 (table) | Unknown | Research-only | Shahab et al., DAS 2010. Used in ICDAR 2013 competition. |

Not Available / Restricted

Described in a publication but not publicly downloadable. Included here because the papers provide useful methodological details and the data may become available in the future.

| Dataset | Pages | Domain | Annotation | Classes | Eval Split | License | Notes |
|---|---|---|---|---|---|---|---|
| BankTabNet (2024) | 11,607 pages | Bank statements (transaction tables) | Human (K-alpha 0.99); format not stated | 10 (table category) | Unknown | Proprietary | Notes. Also includes TSR annotations; see TSR section. |

TD: Benchmarks

Competition evaluation sets used to compare TD models. These are typically too small for training.

| Benchmark | Pages | Domain | Annotation | Metric | License | Notes |
|---|---|---|---|---|---|---|
| ICDAR 2019 cTDaR | ~600 (modern) + ~600 (archival) | Modern + Historical | Human | Weighted F1 (IoU $\geq$ 0.6) | Unknown | Two tracks: modern documents and archival records. DiT and WordScape evaluated here. |
| ICDAR 2013 Table Competition | 238 (EU + US gov) | Government documents | Human | Completeness + Purity | Unknown | Göbel et al. Classic TD benchmark; PubLayNet and IIIT-AR-13K models evaluated here. |

Layout benchmarks with TD evaluation: General-purpose layout benchmarks also evaluate table detection as one region class. See the Layout Page for full details, including:

  • RoDLA (2024): Robustness benchmark covering PubLayNet-P, DocLayNet-P, M6Doc-P; introduces mPE and mRD metrics.
  • ICDAR 2017 POD: 2,000 scientific pages; 4 region types (Formula, Table, Figure, All). Competition site
  • ICDAR 2023 DocLayNet: Hard-split subset of DocLayNet (~80k pages); 11 region classes. Table is one class.

Table Structure Recognition

TSR: Paradigms

The field organizes around four main formulations:

  1. Image-to-Sequence (Im2Seq): Generates a markup token sequence (HTML, OTSL, or LaTeX) from a table image using an encoder-decoder architecture. Some variants are multi-task, sharing a backbone across structure, cell detection, and content decoders. Recent work fine-tunes large multimodal models with reinforcement learning on rendered output quality (Table2LaTeX-RL).
    • Examples: EDD, TableFormer + OTSL, MTL-TabNet, TFLOP, UniTable, SPRINT, Table2LaTeX-RL.
    • Pros: Captures complex spanning patterns naturally; end-to-end trainable; amenable to beam search decoding.
    • Cons: Sequence length scales with table size; attention drift on large or complex tables; LaTeX output is harder to evaluate than HTML.
  2. Object Detection: Treats rows, columns, and/or cells as bounding-box objects detected in a single forward pass. Some models (OmniParser V2) additionally perform text spotting without a separate offline OCR stage, which affects fair comparison against structure-only models.
    • Examples: Table Transformer (DETR-based), GTE, Cycle-CenterNet, GridFormer, OmniParser V1/V2.
    • Pros: Fast single-pass inference; leverages standard detection tooling; produces spatial coordinates directly.
    • Cons: Post-processing required to resolve spanning cells; struggles with dense or borderless tables.
  3. Split-and-Merge / Separation Line: Recovers the cell grid by predicting spatial structure and then assembling it. Approaches vary: some predict separator lines (via segmentation, regression, or query-based detection) and merge spanning cells in a second stage; others directly segment cell regions at the pixel or instance level (TableNet, CascadeTabNet, OG-HFYOLO). The common thread is that cell boundaries are predicted before the grid topology is assembled.
    • Examples: TableNet, CascadeTabNet (Mask R-CNN cell segmentation), SEMv2 (separator line instance segmentation), SPLERGE (projection networks + merge model), TRUST (query-based decoder + vertex merge), LGPMA (soft-mask pyramid supervision), OG-HFYOLO (deformed cell instance segmentation), SepFormer, TABLET.
    • Pros: Grid structure falls naturally from separator or boundary predictions; interpretable intermediate representations.
    • Cons: Two-stage pipelines are sensitive to errors in the first stage; spanning cell resolution adds complexity; instance segmentation variants are slower than detection-only approaches.
  4. Graph / Cell Relationship: Reasons directly over cells rather than lines or sequences. Approaches share the goal of assigning each cell an adjacency structure or logical row/column index, but differ architecturally: GCN methods propagate context over explicit cell-to-cell edges; token-based methods predict adjacency matrices over OCR word tokens; regression-based methods output logical indices per cell in parallel.
    • Examples: TGRNet (GCN + ordinal regression), TabStruct-Net (DGCNN + LSTM), NCGM (multi-modal collaborative blocks), ClusterTabNet (adjacency matrix over OCR tokens), LORE (cascade index regression; 0.45s/image), TableCenterNet (parallel spatial + logical index regression), VertexNet (keypoint-based cell stitching).
    • Pros: Explicit cell-level reasoning; handles complex spanning structures; regression-based variants enable fully parallel inference.
    • Cons: Graph methods depend on upstream cell detection quality; token-based methods require reliable OCR positions; harder to scale to very large tables.

The subtables below are organized by paradigm. Some models implement hybrid approaches and appear under their primary paradigm.

Choosing a Paradigm

The right paradigm depends on what your downstream pipeline needs and the quality of your input:

| If you need… | Lean toward… |
|---|---|
| HTML/LaTeX output ready for a parser or LLM | Im2Seq |
| Spatial cell crops to feed a downstream OCR pass | Object Detection |
| High boundary precision on bordered/printed documents | Split-and-Merge |
| Deformed or warped table images (camera capture) | Split-and-Merge (segmentation variants) |
| OCR-first pipeline that already has word bounding boxes | Graph/Cell (token-based, e.g. ClusterTabNet) |
| Fast parallel inference with logical grid indices | Graph/Cell (regression-based, e.g. LORE) |

TSR: Models

Im2Seq

Models that generate a markup token sequence (HTML, OTSL, or LaTeX) using an encoder-decoder architecture. Sequence length scales with table size.

DateNameArtifactsCodeLicenseNotes
2025-12TRiviaTRivia-3BTRiviaApache-2.0 (code); Unknown (weights)Notes. opendatalab. Self-supervised GRPO fine-tuning of Qwen2.5-VL-3B from unlabeled table images; attention-guided QA generation creates training signal without human labels. TEDS 89.88 overall vs. Gemini 2.5 Pro (88.93) and MinerU2.5 (86.82).
2025-09Table2LaTeX-RLLLLHHH/Table2Latex-RLTable2LaTeX-RLApache-2.0Notes. NeurIPS 2025. Qwen2.5-VL-3B fine-tuned with VSGRPO: dual-reward RL combining TEDS-Structure and CW-SSIM on rendered images. Trained on 1.2M arXiv table image-to-LaTeX pairs (SFT) + 5,936 complex tables (RL). TEDS-S 0.9218 on complex tables. Generates LaTeX rather than HTML/OTSL.
2025-03SPRINTNone releasedSPRINTMITNotes. ICDAR 2024. IIT Bombay. ResNet-31 + Multi-Aspect Global Attention (GCA) encoder + 6-layer Transformer decoder. Aggressively downsamples input to 128×128 to produce script-agnostic features; decodes OTSL (6-token vocabulary). TEDS-S 97.55 (PubTabNet), 98.17 (FinTabNet). +11.12% avg TEDS-S over MTL-TabNet on MUSTARD (multilingual). Introduces MUSTARD dataset.
2025-01TFLOPNone releasedTFLOPCC-BY-NC-4.0Notes. IJCAI 2024. Swin + BART + Layout Pointer. Encodes input text bounding boxes; decoder points to text regions via InfoNCE loss. TEDS-S/TEDS: 99.56/99.45 (FinTabNet), 98.38/96.66 (PubTabNet test).
2024-09UniTabNetNone releasedNoneN/ANotes. USTC + iFLYTEK. EMNLP 2024 Findings. Swin + BART + Vision Guider + Language Guider. GriTS-Top 99.43 (PubTables-1M); TEDS-S 94.0 (iFLYTAB). Code promised but not released.
2024-04MuTabNetNone releasedMuTabNetMITNotes. ICDAR 2024. Multi-cell content decoder with bidirectional mutual learning; cross-attention between adjacent cells during decoding. Outperforms non-end-to-end models on PubTabNet without extra annotations.
2024-03UniTableNone releasedunitableMITNotes. Preprint. Linear Projection Transformer + VQ-VAE SSP. Self-supervised pretraining on up to 2M unlabeled tables. 30M (base) / 125M (large). Best reported results on 4 of 5 major benchmarks at publication.
2023-05OTSL / TableFormerNone releasedNoneN/ANotes. IBM, ICDAR 2023. 5-token language with backward-only syntax rules; ~$2\times$ inference speedup over HTML. No code/weights released.
2023-03MTL-TabNetPubTabNet weightsFinTabNet weightsMTL-TabNetApache-2.0Notes. NII Tokyo. VISAPP 2023. ResNet-31 + GCAttention + Shared Decoder + 3 Task Decoders. Joint structure, cell detection, and content in one model. TEDS-Struct 98.79% (FinTabNet), 97.88% (PubTabNet val).
2019-11EDDNone releasedNoneN/ANotes. IBM. ResNet-18 + dual LSTM. Encoder-dual-decoder separating structure from cell content. Introduced with PubTabNet.
2019-03TableBank baselinesNone releasedTableBankApache-2.0Notes. OpenNMT enc-dec. Image-to-HTML tag sequence (12-token vocabulary). Faster R-CNN for detection + OpenNMT for structure.
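
Several of the models above decode OTSL rather than HTML (TableFormer, SPRINT). To make the reduced vocabulary concrete, here is a minimal, illustrative OTSL-to-HTML converter, assuming the five tokens are written C (new cell), L (continue the cell to the left), U (continue the cell above), X (continue both), and NL (end of row); cell text and syntax-error recovery are omitted, and real decoders use the models' own token spellings.

```python
# Illustrative OTSL -> HTML structure converter (cell text and error recovery omitted).
# Token meanings assumed: C = new cell, L = continue cell to the left,
# U = continue cell above, X = continue both, NL = end of row.
def otsl_to_html(tokens):
    # Split the flat token stream into grid rows on NL.
    rows, row = [], []
    for tok in tokens:
        if tok == "NL":
            rows.append(row)
            row = []
        else:
            row.append(tok)
    if row:
        rows.append(row)

    # For every grid position, resolve the top-left position of the cell covering it.
    origin = [[None] * len(r) for r in rows]
    for r, toks in enumerate(rows):
        for c, tok in enumerate(toks):
            if tok == "C":
                origin[r][c] = (r, c)
            elif tok == "L":                 # continuation of the cell to the left
                origin[r][c] = origin[r][c - 1]
            else:                            # "U" or "X": continuation of the cell above
                origin[r][c] = origin[r - 1][c]

    # Row/column spans follow from how many grid positions each origin covers.
    spans = {}
    for r, toks in enumerate(rows):
        for c in range(len(toks)):
            o = origin[r][c]
            rs, cs = spans.get(o, (1, 1))
            spans[o] = (max(rs, r - o[0] + 1), max(cs, c - o[1] + 1))

    # Emit one <td> per cell origin, in reading order.
    out = ["<table>"]
    for r, toks in enumerate(rows):
        out.append("<tr>")
        for c in range(len(toks)):
            if origin[r][c] != (r, c):
                continue
            rs, cs = spans[(r, c)]
            attrs = ("" if rs == 1 else f' rowspan="{rs}"') + ("" if cs == 1 else f' colspan="{cs}"')
            out.append(f"<td{attrs}></td>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)


# A 2x3 grid whose first column is one cell spanning both rows:
print(otsl_to_html(["C", "C", "C", "NL", "U", "C", "C", "NL"]))
```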

Object Detection

Models that detect rows, columns, and cells as bounding-box objects in a single forward pass.

DateNameArtifactsCodeLicenseNotes
2025-01VertexNetNone releasedNoneUnknownIJDAR 2025. Keypoint-based TSR: detects cell center points, regresses four vertex positions, then stitches adjacent cells into the grid. F1 86.9% (WTW); TEDS 79.4%.
2024-12TabSniper (TSR)None releasedNoneUnknownNotes. AmEx + Bosch. CODS-COMAD 2024. DETR fine-tuned from PubTables-1M with CIoU loss substitution and long-table split-merge strategy. Deployed pipeline for bank statement transaction extraction. Evaluated on proprietary BankTabNet only; no TEDS/GriTS reported.
2023-09GridFormerNone releasedNoneN/ANotes. Baidu + SCUT. ACM MM 2023. ResNet-50 + Deformable DETR (two-stream row/col decoders). Represents tables as $M \times N$ vertex-edge grids. TEDS-S 97.0% (PubTabNet val); TEDS-S 98.63% (FinTabNet val); F1 94.1% (WTW).
2021-09Table Transformer (Struct)microsoft/table-transformer-structure-recognitiontable-transformerMITNotes. DETR (ResNet-18). 125 object queries. Standard detection baseline for TSR.
2021-09Cycle-CenterNetNone releasedNoneN/ANotes. CenterNet + cycle-pairing module. Cell detection for wild table images. Introduced with WTW dataset.
2020-05GTENone releasedNoneN/ANotes. RetinaNet (ResNet-50-FPN). Constraint loss coupling cell and table detectors. Joint TD + TSR. Also introduces FinTabNet dataset.
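
Detectors in this family output row and column boxes rather than an explicit grid, so a common post-processing step is to intersect them into a base cell grid, with separately detected spanning cells then overriding groups of base cells. A minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes in pixel coordinates:

```python
# Build a base cell grid by intersecting detected row and column boxes.
# Boxes are assumed to be (x1, y1, x2, y2) tuples in pixel coordinates; handling of
# spanning-cell detections (which override groups of base cells) is omitted.
def intersect(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def boxes_to_grid(row_boxes, col_boxes):
    # Sort rows top-to-bottom and columns left-to-right so grid indices become logical indices.
    rows = sorted(row_boxes, key=lambda b: b[1])
    cols = sorted(col_boxes, key=lambda b: b[0])
    return [[intersect(r, c) for c in cols] for r in rows]  # grid[i][j] = cell box or None

# Toy example: two rows and two columns over a 100x40 table crop.
rows = [(0, 0, 100, 20), (0, 20, 100, 40)]
cols = [(0, 0, 50, 40), (50, 0, 100, 40)]
for row in boxes_to_grid(rows, cols):
    print(row)
```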

Unified document parsers with TSR: Some systems perform TSR as part of a broader end-to-end pipeline (text spotting, KIE, layout). These are tracked on the Document Understanding Page, including OmniParser V1 (CVPR 2024) and OmniParser V2.

Split-and-Merge / Separation Line

Models that predict spatial structure and recover the cell grid in two stages. Some approaches use coarse pixel-wise mask segmentation (TableNet) or cell instance segmentation (CascadeTabNet, OG-HFYOLO); most recent work instead detects explicit separator lines and applies a learned merge stage for spanning cells.

DateNameArtifactsCodeLicenseNotes
2025-06SepFormerNone releasedNoneN/ANotes. ICDAR 2025. RT-DETR backbone + dual two-stage decoder branches for coarse-to-fine separator regression (single line to line-strip). Eliminates segmentation masks entirely. 25.6 FPS. 98.6% F1 (SciTSR-COMP); 96.8% TEDS-S (PubTabNet); 93.9% F1 (WTW); 93.8% F1 (iFLYTAB).
2025-06TABLETNone releasedNoneN/ANotes. ICDAR 2025. ResNet-18 + FPN + Dual Transformer Encoders (split) + Transformer Encoder (merge). Formulates row/col splitting as 1D sequence labeling; merging as OTSL grid classification. 18 FPS on A100. 98.54 TEDS / 98.71 TEDS-S (FinTabNet test); 96.79 TEDS / 97.67 TEDS-S (PubTabNet val).
2025-04OG-HFYOLODWTAL dataset + codeOGHFYOLOAGPL-3.0Notes. NCHU. YOLOv5 + GOE + HKCF + scale-aware loss + mask-driven NMS. Cell-level instance segmentation (pixel masks) rather than separator line detection; evaluates on deformed table images. Mask mAP@50:95 74.23% (DWTAL-s); 62.38% (DWTAL-l). Introduces DWTAL (28,285 images).
2024-07DTSMNone releasedDTSMUnknownICDAR 2024. SCUT. Text query encoder + adjacent feature aggregator targeting dense tables with high cell counts. Introduces DenseTab dataset (16,575 dense table images).
2024-05SEMv3None releasedNoneN/ANotes. IJCAI 2024. ResNet-34 + FPN + KOR. Keypoint offset regression replaces instance segmentation for separation line detection; O(NM) merge action map. 95.1% F1 (WTW), 89.3% F1 (ICDAR-2019 cTDaR Historical).
2023-05TRACENone releasedNoneN/ANotes. ICDAR 2023. NAVER AI. Single U-Net (ResNet-50) predicts 5 segmentation maps (cell corners + 4 edge directions); bottom-up post-processing assembles cells without explicit separator lines or a separate TD stage. SubTableBank adds per-cell border visibility annotations. Best reported results on ICDAR 2013 and WTW at publication.
2023-03SEMv2None releasedSEMv2UnknownNotes. Pattern Recognition 2024. ResNet-18 + conditional conv. Instance segmentation of separation lines; kernel/feature branch decoupling. Also introduces iFLYTAB dataset. Code exists; license not stated.
2022-08TRUSTNone releasedNoneN/ANotes. Baidu + DUT. ResNet-18 + FPN + Query-Based Splitting (Transformer Decoder) + Vertex-Based Merging (cross-attention). Predicts multi-oriented row/col separators with angle. 97.1% Str-TEDS / 96.2% TEDS (PubTabNet). 10 FPS on A100.
2022-08TSRFormerNone releasedNoneN/ANotes. ACM MM 2022. ResNet-18 + FPN + SepRETR (DETR). Replaces segmentation with direct line regression via two-stage DETR decoder. 97.5% TEDS-S (PubTabNet), 93.4% F1 (WTW).
2022-03RobusTabNetNone releasedNoneN/ANotes. USTC + Microsoft Research Asia. ResNet-18 + FPN + Spatial CNN (split) + Grid CNN (merge) + CornerNet-FRCN detector. Spatial CNN propagates context across blank regions for robust separator prediction. 99.3% F1 (SciTSR), 97.0% TEDS-S (PubTabNet val).
2021-05LGPMANone releasedDAVAR-Lab-OCRApache-2.0Notes. ICDAR 2021 Best Industry Paper. Hikvision + Zhejiang Univ. ResNet-50 + FPN + Mask-RCNN + LPMA + GPMA. Dual pyramid soft-mask supervision. TEDS 94.6 / TEDS-Struct 96.7 (PubTabNet val); F1 98.8 (SciTSR). Code only; no pretrained weights.
2020-04CascadeTabNetNoneCascadeTabNetMITNotes. CVPR Workshops 2020. Cascade Mask R-CNN + HRNet backbone. End-to-end TD + TSR: detects table regions then recovers cell structure via instance segmentation. Iterative transfer learning and document-specific augmentation. Also listed in TD: Models. F1 0.9252 (ICDAR 2013 TD).
2020-01TableNetNone releasedNoneN/ANotes. TCS Research. ICDAR 2019. VGG-19 + Dual FCN Decoders. Early precursor: pixel-wise table and column segmentation with rule-based row extraction from OCR. Releases Marmot Extended column annotations. F1 0.9662 (TD) / F1 0.9151 (TSR) on ICDAR 2013.
2019-09SPLERGENone releasedNoneN/AICDAR 2019, pp. 114-121. Adobe Research. DOI. CNN Row/Col Projection Networks + Merge Model. Projection pooling for global row/col split prediction; grid pooling for spanning cell merge decisions. Best reported results on ICDAR 2013 at publication.
2017-11DeepDeSRTNone releasedNoneN/AICDAR 2017, pp. 1162-1167. DFKI. DOI. Faster R-CNN (detection) + FCN (row/col segmentation). Early joint TD + TSR with deep learning. See also TD: Models.
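
To make the split stage concrete: the simplest split predictors reduce a separator-probability map to 1D row/column profiles and threshold them, in the spirit of SPLERGE's projection pooling. A minimal NumPy sketch of that reduction (the CNN that produces the probability map and the learned merge stage for spanning cells are out of scope; the 0.5 threshold is illustrative):

```python
# Turn a 2D separator-probability map into row/column split positions by 1D projection.
# `prob` is assumed to be an (H, W) array in [0, 1] from a split model.
import numpy as np

def projection_splits(prob, axis, threshold=0.5):
    profile = prob.mean(axis=axis)               # axis=1 -> one value per image row, axis=0 -> per column
    above = profile > threshold
    splits, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            splits.append((start + i - 1) // 2)  # centre of the contiguous separator band
            start = None
    if start is not None:
        splits.append((start + len(above) - 1) // 2)
    return splits

# Toy 8x8 map with a horizontal separator at row 4 and a vertical one at column 3.
prob = np.zeros((8, 8))
prob[4, :] = 0.9
prob[:, 3] = 0.9
print("row separators:", projection_splits(prob, axis=1))
print("col separators:", projection_splits(prob, axis=0))
```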

Graph / Cell Relationship

Models that assign each cell an adjacency structure or logical row/column position by reasoning over inter-cell relationships. Approaches differ architecturally: GCN methods propagate context over explicit cell-to-cell edges; token-based methods predict adjacency matrices over OCR word tokens; regression methods directly output logical indices in parallel without a graph.

DateNameArtifactsCodeLicenseNotes
2025-04TableCenterNetNone releasedTableCenterNetApache-2.0Notes. One-stage parallel regression for both spatial coordinates and logical row/col indices per cell simultaneously. Synergistic shared-feature + task-specific decoding. Best reported results on TableGraph-24K at publication.
2024-02ClusterTabNetNone releasedclustertabnetApache-2.0Notes. ICDAR 2024. SAP. Transformer Encoder + optional patch CNN. Predicts $n \times n$ adjacency matrix per target (tables, rows, columns, cells, headers) via $\sigma(QK^T)$ + BCELoss over OCR word tokens. Covers TD and TSR in one model. ~5M non-embedding params; rotation-robust. TD: AP 0.989 (PubTables-1M). TSR (4-class): AP 0.931 (PubTables-1M).
2024-01LORE++None releasedNoneN/ANotes. Pattern Recognition 2025. Follow-up to LORE. Adds MAE + Logical Distance Prediction pre-training; matches LORE's full-data accuracy with only 60% of the training data. 0.43s/image inference.
2023-03LOREWTW checkpointPubTabNet checkpointWireless checkpointLORE-TSRApache-2.0Notes. AAAI 2023. Zhejiang University + Alibaba DAMO. DLA-34 + CenterNet + Cascade Self-Attention Regressors. Regresses logical row/col start-end indices per cell in parallel. 99.3 F1 (SciTSR-comp); 95.1 F1 (WTW); 98.1 TEDS (PubTabNet, 20k training). 0.45s/image vs. 14.8s for EDD.
2021-11NCGMNone releasedNoneN/ANotes. Tencent YouTu Lab. CVPR 2022. ResNet-18 + CMHA-based ECE (intra-modality) + CCS (inter-modality); 3 collaborative blocks over geometry, appearance, and content. SciTSR-COMP Setup-B: F1 99.0%; strong gains on distorted tables.
2021-06TGRNetCheckpointsTGRNetApache-2.0 (code); Unknown (data/weights)Notes. ICCV 2021. Segmentation + GCN. Two-branch: cell detection via segmentation, logical index prediction via ordinal regression. Also introduces TableGraph-24K (the full TableGraph-350K described in the paper has not been publicly released). Pretrained checkpoints on CMDD, ICDAR13, ICDAR19-cTDaR, TableGraph-24K.
2020-10TabStruct-NetNone releasedTabStructNetUnknownNotes. IIIT Hyderabad. ECCV 2020. ResNet-101 + FPN + Mask R-CNN + DGCNN + LSTM. Joint cell detection and row/column adjacency prediction; alignment loss enforces grid constraints. F1 0.906 (ICDAR-2013), 0.920 (SciTSR), TEDS 0.901 (PubTabNet). Code available; license not stated.

Pipeline references: Some models appear only in downstream pipeline integrations: TableMaster (used in MinerU for TSR; see OCR Page) and StructEqTable (also referenced by MinerU for table-to-LaTeX conversion).

TSR: Datasets

Datasets with table structure annotations (row/column/cell/spanning cell labels). Grouped by the most permissive use permitted: commercial, research / non-commercial, and not available / restricted.

Commercial Use

Training a private, for-profit model is permitted with minimal obligations.

DatasetTablesDomainAnnotationClassesEval SplitLicenseNotes
SynFinTabs (2024)100k syntheticSynthetic (browser-rendered HTML/CSS; no real document images; styled after UK Companies House filings)Auto (programmatic rendering)Rows, columns, cells; HTML + JSON + CSV + bbox annotationsYesMITNotes. Queen’s University Belfast. Also introduces FinTabQA (LayoutLM fine-tune).
SynthTabNet (2022)600kSynthetic (browser-rendered HTML/CSS; no real document images)Auto (HTML + cell bboxes)Rows, columns, cells (spanning)YesCDLA-Permissive-1.0Notes. Introduced with TableFormer (Nassar et al., CVPR 2022). Four styled subsets (FinTabNet, marketing, PubTabNet, sparse), 150k each. 80/10/10 splits. ~37 GB total.

Research / Non-Commercial

Training a free, open-weight non-commercial model is permitted.

DatasetTablesDomainAnnotationClassesEval SplitLicenseNotes
PubTables-v2 (2025)548k tables (9,172 documents; 9,492 multi-page tables)Scientific (PubMed, 2023-2025)Auto (canonicalized) + full-page contextRows, columns, cells (spanning); multi-page annotationsYesCDLA-Permissive-2.0 (annotations); underlying PMCOA images have mixed per-article licensesNotes. Kensho. First large-scale multi-page TSR dataset; adds full-page context and cross-page table annotations.
CISOL (2025)844 table images; 120k+ cell instancesConstruction industry (steel ordering lists, civil engineering)HumanCells + TD bboxesYes (two tracks: TD+TSR end-to-end; TSR-only cropped)CC-BY-4.0Notes. WACV 2025. Real-world documents from 24 construction projects (2015-2023). Industrial domain absent from prior TSR benchmarks.
Table2LaTeX-RL (2025)~1.2M pairsScientific (arXiv rendered tables)Auto (LaTeX source + rendered image)LaTeX sequencesYesApache-2.0 (stated); underlying arXiv papers have mixed licenses including NC variants; same provenance concern as SciTSR/TableBankNotes. Large-scale arXiv-derived table image to LaTeX training corpus. Introduced with RL-based table-to-LaTeX model (NeurIPS 2025).
DWTAL (2025)28,285 (DWTAL-s: 8,765; DWTAL-l: 19,520)Diverse (deformed/warped table images; derived from TAL-OCR + WTW + 150 collected)Synthetic (wave + cylindrical warping generator) + Human (150 offline images)Cells (pixel-level instance segmentation masks)YesDWTAL-l: CC-BY-NC-4.0 (WTW-derived); DWTAL-s: Unknown (TAL-OCR license unverified); HuggingFace Apache-2.0 claim does not reflect upstream NC restrictionNotes. NCHU. Dataset focused on deformed and warped real-world tables with fine-grained segmentation annotations. Introduced alongside OG-HFYOLO.
UoS_Data_Rescue (2025)1,113 logbooks (594k+ cells)Historical (scientific logbooks, 19th-20th century)HumanCells (row/col structure + transcription)YesCC-BY-4.0IJDAR 2025. University of Southampton. Historical scientific logbooks digitized for climate/environmental research. Handwritten and mixed printed/handwritten tables. No arXiv preprint.
MMSci (2025)~52k TSR samplesScientific (multimodal figures + tables)AutoCells + structureYesCC-BY-NC-SA-4.0 (inherited from SciGen source via ShareAlike)Notes. Large-scale multimodal science dataset; TSR component derived from arXiv papers. Also includes 12K instruction-tuning and 3,114-sample evaluation benchmark.
ENTRANT (2024)~6.7M tables (~330k SEC filings)Financial (SEC EDGAR, 2013-2021)Auto (extracted from XLSX)Cells (JSON bi-tree: positional + hierarchical attributes)YesCC-BY-4.0Notes. IIT Demokritos. Text/JSON format only; no table images. Structural content source for synthetic image generation pipelines (cf. SynFinTabs). ~20 tables/filing avg, ~25 rows, ~5 cols.
WikiDT (2024, TSR side)159,905 table cropsWikipedia screenshotsAuto (Wikipedia markup)Rows, columns, cellsYesCC-BY-SA-3.0Notes. Also includes TD and QA/SQL annotations; TSR side tracked here. See TD: Datasets for the TD side and Table Understanding for the QA side.
MUSTARD (2024)1,428 tablesMultilingual (11 Indic scripts + Chinese; scanned and scene-text)HumanCells (OTSL sequences)YesMITNotes (SPRINT paper). IIT Bombay. ICDAR 2024. 1,214 Indic + 214 Chinese/other tables from magazines. Released alongside SPRINT (script-agnostic TSR model). First large-scale multilingual TSR dataset covering Indic scripts.
TabRecSet (2023)38.2kWild (scanned, camera, spreadsheet, bilingual EN/ZH)Human (cell polygons + logical structure + cell text)Cells (TD + TSR + TCR)YesCC-BY-SA-4.0Notes. 32k images; 80/20 split. First large-scale bilingual (English + Chinese) end-to-end table recognition dataset. Also includes TD annotations.
ComFinTab (2022)10k (6k Chinese + 4k English)Financial (compound tables from Chinese listed-company annual reports)Human (cell bboxes, row/col indices, text, cell type, cell linking)Cells (compound spanning; TH/LH/DA/OT + linking)Yes (4.5k/1.5k Chinese; 3.2k/0.8k English; company-level split)CC-BY-NC-SA-4.0Notes. DAVAR Lab (Hikvision/ShanghaiTech/ZJU). ACM MM 2022. Over 70% compound tables. Introduces table item extraction task and Tree-F1-Score metric. CTUNet code released via DAVAR-Lab-OCR (Apache-2.0). Dataset available via gated application; see ComFinTab page.
TableGraph-24K (2021)24k (350K described in paper; only 24K publicly released)ScientificAuto (graph: cell bboxes + logical indices)Cells (spatial + logical row/col indices)YesNo license stated (annotations/code); underlying images derived from arXiv LaTeX papers with mixed per-article licensesNotes. Derived from TABLE2LATEX-450K (rendered arXiv LaTeX tables). Same mixed-license provenance as SciTSR/TableBank. Full TableGraph-350K has not been released.
GloSAT (2021)500 page images (one table per page)Historical meteorological logbooks (UK Met Office, NOAA, Univ. of Reading)Human (VOC2007 + ICDAR cTDaR XML)Headings, headers, table body, coarse segmentation cellsYesBSD-3-ClauseNotes. HIP@ICDAR 2021. University of Southampton. Enhanced annotations for TSR in historical documents; adds coarse cell groupings following original ruling lines.
WTW (2021)14.5kWild (photos, scans, documents)Human (cell coordinates + row/col structure)CellsYesCC-BY-NC-4.0Notes. Tables in natural scenes with deformation, bending, occlusion.
PubTables-1M (2021)~947k tables (from 460k pages)ScientificAuto (canonicalized bboxes)Rows, columns, cells (incl. blank), spanning cellsYesCDLA-Perm-2.0 (annotations); underlying PMCOA images have mixed per-article licensesNotes. Canonicalized annotations fix PubTabNet oversegmentation. Includes projected row header labels.
TSRD + TCRD (2021)46K + 38KScientific (arXiv CS preprints, LaTeX-rendered to JPG)Auto (LaTeX source compiled to image + structure sequences)Structure (TSRD) + content (TCRD)YesCC-BY-NC-SA-4.0 (stated on CodaLab competition page)Notes. ICDAR 2021 competition datasets from IIT Gandhinagar. TSRD: 46K table images; TCRD: 38K. Images are programmatically rendered from arXiv CS LaTeX sources. Access requires CodaLab login; no open mirror available.
FinTabNet (2020)~113kFinancialAuto (HTML)Rows, columns, cells (spanning)YesCDLA-Permissive-1.0 (annotations); underlying annual report images are from copyrighted S&P 500 corporate filingsNotes. Complex, dense financial tables from S&P 500 SEC annual reports. Introduced with GTE.
Marmot Extended (2020)509English scientific (English subset of Marmot)Human (column bounding boxes)ColumnsNoResearch-only (inherits Marmot restriction); Unknown (annotation license)Notes. TCS Research. Column bounding box annotations for 509 English documents from the Marmot dataset, released alongside TableNet. Column-level only; no row or cell annotations. Google Drive.
PubTabNet (2019)568kScientificAuto (HTML structure + cell content)Rows, columns, cells (spanning)YesCDLA-Permissive-1.0 (annotations); underlying PMCOA images have mixed per-article licensesNotes. First large-scale TSR dataset. Introduces EDD model and TEDS metric.
SciTSR (2019)15kScientific (arXiv PDF table images)Auto (LaTeX source → cell adjacency graph)Cells (adjacency)YesMIT (annotations/code); underlying arXiv PDF images have mixed per-article licensesNotes. Focuses on complex spanning structures. SciTSR-COMP subset: 716 complex tables. PDF images sourced from arXiv papers; same mixed-license provenance as PubTabNet/TableBank.
TableBank (2019)145k (structure split)DiverseWeak (HTML-like tag sequence)CellsYesResearch-only (data)Notes. Weak supervision from Word/LaTeX sources. Also provides TD annotations (see above).

Not Available / Restricted

Described in a publication but not publicly downloadable. Included here because the papers provide useful methodological details and the data may become available in the future.

DatasetPagesDomainAnnotationClassesEval SplitLicenseNotes
Arabic TSR (2024)7,300Arabic documentsHumanCellsUnknownUnknownNo verified public release or arXiv preprint found. Dataset referenced in literature; availability unconfirmed.
Chinese Financial TSR (2024)~1.5M tables (105,600 bordered sampled for synthesis)Financial (Chinese annual reports)Auto (extracted from reports)Rows, columns, cellsUnknownUnknownarXiv 2404.11100. ICDAR 2024. Authors state intent to publicly release; no download available as of 2026-03. Includes 2,290-table manually verified benchmark.
BankTabNet (2024, TSR side)5,165 tablesBank statements (transaction tables)Human (K-alpha 0.99)Cells (rows, columns, spanning)UnknownProprietaryNotes. AmEx internal; PII-masked; not released. Same paper as TD side. TD side tracked in TD: Datasets.
DenseTab (2024)16,575Dense/complex (multi-row/col spanning); image source undisclosedHumanCells (complex spanning)YesUnknownICDAR 2024. Google Drive. Files are technically downloadable but moved here due to two blockers: (1) no license stated anywhere in the repo or README; (2) image provenance is completely undisclosed; the source documents are unknown and the full paper is paywalled. Cannot assess rights or suitability.
SubTableBank (2023)9,717 imagesFinancial and scientific documents (+ some TableBank images)Human (cell bboxes + per-edge visibility flags: explicit vs. implicit borders)Cells, border visibilityYes (7,783 / 971 / 963 train/val/test)UnknownNotes. NAVER AI. In-house dataset used to train TRACE. Per-edge visibility annotations distinguish explicit (visible) from implicit (invisible) cell borders. Partial public release promised in ICDAR 2023 preprint; no public URL found as of 2026-04.
iFLYTAB (2023)17.3kDiverse (digital + camera); digital document sources unspecifiedHuman (cell polygons + row/col info polygons)Rows, columns, cellsUnknownUnknownNotes. Introduced with SEMv2. No stated license; digital image provenance unspecified; originated from an iFLYTEK competition (restricted redistribution implied). USTC file-sharing download link may no longer be live.
cTDaR TrackA TSR Annotations (2022)600 pages (modern)Modern documents (ICDAR 2019 cTDaR TrackA)Human (row/col separation lines + cell bboxes)Rows, columns, cellsYes (follows cTDaR 600/240 modern train/test split)UnknownNotes. USTC + Microsoft Research Asia. Structure annotations for the 600 modern images from ICDAR 2019 cTDaR TrackA, used to train RobusTabNet's TSR module. Public release stated in the paper; no repository or download link identified as of 2026-04.
ICDAR 2017 POD TSR Supplement (post-2017)549 train + 243 test table cropsScientific (CiteSeer; subset of ICDAR 2017 POD)Human (cell polygons + adjacency XML)Cells (polygon coords + adjacency relations)YesUnknownPost-hoc TSR annotations added to a subset of ICDAR 2017 POD table regions by the CIAS group (PKU). Cell polygon coordinates and adjacency neighbors attributes in XML; no row/col span indices. No stated license. GitHub.

TSR: Benchmarks

Evaluation-only sets used to compare TSR systems. Not suitable for training.

BenchmarkTablesDomainAnnotationMetricLicenseNotes
Benchmarking PDF Parsers (2026)451 tables (100 pages)Synthetic PDFs from arXiv LaTeX sourcesHuman + LLM-as-judgeLLM-as-judge scoring (Pearson r=0.93 vs. human)MIT (code); CC-BY-SA-4.0 (data)Notes. Evaluates 21 PDF parsers using LLM-as-a-judge, validated against 1,554 human ratings. LLM metrics substantially outperform TEDS (r=0.68) and GriTS (r=0.70).
OmniDocBench (2025)1,355 pages (subset of 9 doc types)Diverse (academic, financial, newspaper, handwritten, etc.)HumanTEDS / HTML+LaTeX for tables; NED for textCustom (research only; non-commercial per dataset card)Notes. CVPR 2025. End-to-end document parsing benchmark; table sub-task uses HTML+LaTeX annotations for TEDS-style evaluation. Covers 9 document types and 3 language settings.
Benchmarking TE (Soric et al.) (2025)37k (Table-arXiv + Table-BRGM); 56k (PubTables-Test)Scientific (LaTeX preprints, geological reports)Auto + HumanGriTS-Top; GriTS-Cont; TEDS; end-to-end P/R metricsMITNotes. End-to-end benchmark covering TD, TSR, and full TE pipeline. Introduces Table-arXiv (36k samples, all arXiv domains) and Table-BRGM (124 tables from French geological reports) alongside PubTables-Test. Formally justified metrics propagate TD errors into TSR scores. Nine methods evaluated; models trained on PubTables-1M degrade substantially on heterogeneous data.
DocPTBench (2025)1,381 images (Original + Photographed + Unwarped)Camera-captured and digital (phone photos of printed documents; 8 translation directions)HumanEdit distance (parsing); BLEU/chrF/METEOR (translation)Apache-2.0Notes. Unified parsing + translation benchmark on photographed documents. Expert OCR models degrade ~25% on photographed vs. digital; general MLLMs ~18%.
CC-OCR (2025)7,058 imagesDiverse (4 tracks: multi-scene text, multilingual OCR, document parsing, KIE)HumanF1 (text); NED (parsing); TEDS (tables); field F1 (KIE)MITNotes. ICCV 2025. 4-track OCR benchmark; document parsing track is most relevant to TSR. 39 subsets, 10 languages. Evaluates 9 LMMs including GPT-4o, Gemini, Qwen2-VL.
RD-TableBench (2024)1,000Diverse (financial, scientific, scanned, handwritten, multilingual)HumanNeedleman-Wunsch HTML array alignment with Levenshtein partial creditCC-BY-NC-ND-4.0Released by Reducto. Targets complex real-world tables; eval-only, no training split. Partial public release to prevent contamination. HuggingFace. Used in Nemotron Parse evaluation.
ICDAR 2021 SLP Task B (2021)9,064 (final eval)Scientific (PubTabNet)Auto (PubTabNet)TEDS (Simple/Complex/All)Apache-2.0 (eval harness); CDLA-Perm-1.0 (annotations); underlying PDFs mixedNotes. IBM. 30 teams. Top result: 96.36 TEDS all (Davar-Lab-OCR). Note: <b> bold tags excluded from scoring due to a data preparation bug; all reported numbers are slightly inflated. Prior EDD baseline: 91 TEDS. GitHub.
ICDAR 2019 cTDaR TSR (2019)~600 (modern) + ~600 (archival)Modern documents + historical handwrittenHumanAdjacency F1UnknownGao et al., ICDAR 2019. DOI. Three TSR tasks: (1) structure with given regions; (2) structure without given regions; (3) both on archival handwritten documents. SEMv3 reports 89.3% F1 on the archival track. Same competition as the ICDAR 2019 cTDaR TD benchmark; see TD section for detection-side details.
SciTSR-COMP (2019)716Scientific (arXiv CS PDF images)Auto (LaTeX source)Adjacency F1MIT (inherits from SciTSR)Complex-table subset of SciTSR: tables with at least one spanning cell. Widely used as the primary eval target for graph-paradigm and split-and-merge models (LORE: 99.3 F1; NCGM: 99.0 F1; RobusTabNet: 99.3 F1; TRUST: reported on val). SciTSR-COMP results are not comparable to full SciTSR results.
ICDAR 2013 TSR (2013)150 tables (238 pages)Government documents (EU + US)Human (cell bboxes + row/col spans; adjacency XML)Adjacency F1UnknownGöbel et al., ICDAR 2013. Same documents as the ICDAR 2013 TD benchmark; separate TSR ground truth with cell bounding boxes, row/column span indices, and adjacency relations between neighboring cells. The community's sole public TSR benchmark from 2013 through 2018; DeepDeSRT (2017) and SPLERGE (2019) both evaluate here. A corrected version is on HuggingFace (CDLA-Perm-2.0 for new annotations).

Metrics

Detection Metrics

Table detection uses the same metrics as general layout detection. See the Layout Page metrics section for detailed explanations.

| Metric | What it measures | Notes | Tools |
|---|---|---|---|
| mAP @ IoU | COCO-style detection accuracy | Standard for PubTables-1M, ICDAR 2017 POD. Thresholds vary: $\text{AP}@[.50:.95]$, $\text{AP}@50$, $\text{AP}@75$. | pycocotools |
| Weighted F1 (IoU $\geq$ 0.6) | Detection quality with class weighting | Used in ICDAR 2019 cTDaR competition. | |
| Area-based P/R/F1 | Pixel-area overlap between predicted and GT regions | Used by TableBank. Less standard than mAP; may obscure object-level errors like merging adjacent tables. | Custom |
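
For COCO-style mAP on table detections, the usual tooling is pycocotools. A minimal evaluation sketch, assuming "annotations.json" and "detections.json" are placeholder files in the standard COCO ground-truth and result formats:

```python
# COCO-style mAP evaluation for table detection with pycocotools.
# "annotations.json" / "detections.json" are placeholder paths in COCO GT / result format.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations.json")              # ground-truth boxes (e.g. a single "table" category)
coco_dt = coco_gt.loadRes("detections.json")    # model detections with "score" fields

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                           # prints AP@[.50:.95], AP@50, AP@75, etc.
```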

Structure Metrics

| Metric | Paradigm | Notes | Tools |
|---|---|---|---|
| TEDS (Tree Edit Distance Similarity) | Im2Seq, Object Detection | Standard for PubTabNet and FinTabNet; introduced by PubTabNet. Normalized tree-edit distance between predicted and ground-truth HTML table trees: $\text{TEDS}(T_a, T_b) = 1 - \frac{\text{EditDist}(T_a, T_b)}{\max(\lvert T_a \rvert, \lvert T_b \rvert)}$. Often reported as Simple / Complex / All splits. | Custom; teds (unofficial) |
| TEDS-S (TEDS-Structure) | Im2Seq, Object Detection | Structure-only variant: cell content is stripped from both prediction and ground truth before computing the tree edit distance. Isolates layout prediction from OCR quality. Widely reported in recent work as the primary comparison number. | Custom; teds (unofficial) |
| GriTS (Grid Table Similarity) | Object Detection | Measures grid topology correctness (row/column spanning alignment). More robust to empty-cell variation than TEDS. Introduced by PubTables-1M. Three variants: $\text{GriTS}_{\text{Top}}$ (topology), $\text{GriTS}_{\text{Cont}}$ (content), $\text{GriTS}_{\text{Loc}}$ (location). | Custom (PubTables-1M repo) |
| Adjacency F1 | Graph Reconstruction | Correctness of adjacent cell pair relationships. Pre-TEDS metric. Known to under-react to structural errors (row/column misalignment) and over-react to content perturbations; PubTabNet demonstrated these failure modes. Largely superseded. | Custom |
| BLEU | Im2Seq | N-gram overlap on generated tag sequences. Used by TableBank for structure recognition evaluation (4-gram BLEU). Less sensitive to structural errors than TEDS; largely superseded. | nltk, sacrebleu |

TEDS Variants

TEDS measures how many tree edit operations (node insertion, deletion, relabeling) are needed to transform the predicted HTML tree into the ground-truth HTML tree, normalized by the size of the larger tree. A score of 1.0 is a perfect prediction; 0.0 means the trees share no recoverable structure.

Two variants appear in the literature:

  • TEDS (full): Includes both the structural token sequence and the OCR text content of each cell. A structurally correct prediction with wrong cell text is penalized. This requires a working OCR component and makes cross-system comparison harder unless OCR is held constant.
  • TEDS-S (structure only): Strips cell content from both prediction and ground truth before computing tree edit distance. This isolates the layout prediction task from OCR quality and is the more commonly reported number in recent work.

TEDS is often reported split across Simple (few spanning cells, rectangular grids) and Complex (multi-span, hierarchical headers) subsets. A model that performs well on Simple but poorly on Complex is likely struggling with the spanning cell case specifically.

One known limitation: TEDS is normalized by tree size, so a single structural error (for example, one wrong spanning cell that visually shifts every subsequent cell) costs only a few edit operations and barely dents the score on a large table, while the same error on a small table is penalized far more heavily. This makes scores hard to compare across tables of different sizes.
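
A minimal sketch of a TEDS-S-style computation matching the formula above, assuming the third-party zss (Zhang-Shasha) and lxml packages. This is not the official PubTabNet implementation, which also compares colspan/rowspan attributes and, for full TEDS, uses a string-similarity cost on cell text.

```python
# Structure-only TEDS (TEDS-S) sketch: parse both tables to element trees, label nodes
# by tag only, compute tree edit distance with unit costs, and normalize by the larger tree.
# Assumes `pip install zss lxml`; not the official PubTabNet implementation.
from lxml import etree
from zss import simple_distance, Node


def to_tree(html):
    root = etree.fromstring(html, parser=etree.HTMLParser())
    table = root.find(".//table")

    def build(el):
        node = Node(el.tag)                      # structure-only: ignore text and attributes
        for child in el:
            node.addkid(build(child))
        return node

    return build(table)


def tree_size(node):
    return 1 + sum(tree_size(c) for c in node.children)


def teds_s(pred_html, gt_html):
    a, b = to_tree(pred_html), to_tree(gt_html)
    dist = simple_distance(a, b)                 # unit-cost insert / delete / relabel
    return 1.0 - dist / max(tree_size(a), tree_size(b))


gt = "<table><tr><td></td><td></td></tr><tr><td></td><td></td></tr></table>"
pred = "<table><tr><td colspan='2'></td></tr><tr><td></td><td></td></tr></table>"
print(round(teds_s(pred, gt), 3))
```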

GriTS Variants

GriTS frames TSR evaluation as comparing the predicted grid structure directly, rather than via a tree edit distance on HTML. Given predicted and ground-truth grids, the metric finds the most similar 2D substructures of the two grids and scores the aligned cell pairs with a per-variant credit function. Three variants measure different aspects:

  • GriTS-Top (topology): Checks only that each cell occupies the correct row/column span position. Ignores both spatial coordinates and text content.
  • GriTS-Cont (content): Extends topology evaluation to also require correct cell text content, using a string similarity score per matched pair.
  • GriTS-Loc (location): Extends topology to also require correct bounding box coordinates, using a spatial IoU score per matched pair.

GriTS is more robust than TEDS to annotation artifacts like empty cell representation (the oversegmentation issue that PubTables-1M corrects relative to PubTabNet). When GriTS and TEDS disagree substantially, the discrepancy often traces to empty cell handling. GriTS-Top is the most commonly reported single number for object detection-paradigm models (Table Transformer, ClusterTabNet). For Im2Seq models evaluated on PubTabNet, TEDS-S remains the community standard.
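
The full GriTS computation first searches for the most similar 2D substructures of the two grids; the sketch below skips that search and only illustrates how the three variants differ, scoring two grids assumed to be already aligned (same shape) with simplified stand-in credit functions rather than the paper's exact definitions.

```python
# Simplified illustration of the three GriTS credit functions on two pre-aligned grids.
# The real metric additionally searches for the most similar 2D substructures before
# applying 2 * sum(credit) / (|A| + |B|); the credit functions here are simplified stand-ins.
from difflib import SequenceMatcher

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

CREDITS = {
    "top":  lambda a, b: float(a["span"] == b["span"]),                    # (rowspan, colspan) agreement
    "cont": lambda a, b: SequenceMatcher(None, a["text"], b["text"]).ratio(),
    "loc":  lambda a, b: iou(a["box"], b["box"]),
}

def grits_aligned(pred, gt, variant):
    credit = CREDITS[variant]
    total = sum(credit(p, g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    n_pred = sum(len(r) for r in pred)
    n_gt = sum(len(r) for r in gt)
    return 2 * total / (n_pred + n_gt)

cell = lambda span, text, box: {"span": span, "text": text, "box": box}
gt   = [[cell((1, 1), "Year", (0, 0, 50, 20)), cell((1, 1), "Total", (50, 0, 100, 20))]]
pred = [[cell((1, 1), "Year", (0, 0, 48, 20)), cell((1, 2), "Totol", (48, 0, 100, 20))]]

for v in ("top", "cont", "loc"):
    print(v, round(grits_aligned(pred, gt, v), 3))
```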


Surveys

| Date | Title | Venue | Key Contribution | Notes |
|---|---|---|---|---|
| 2024 | A Survey for Table Recognition Based on Deep Learning (Yu et al.) | Neurocomputing 2024 | Reviews TD and TSR methods through ~2024; covers DL-based methods from object detection through transformer paradigms; updated dataset and benchmark comparison tables | DOI. Neurocomputing vol. 600, article 128154. Xidian University. No arXiv preprint. Extends coverage beyond Kasem et al.; provides an updated taxonomy of methods and the dataset landscape through the large-scale transformer era. |
| 2022-11 | Deep Learning for TD and TSR: A Survey (Kasem et al.) | ACM CSUR 2024 | Reviews 19 datasets, heuristic/ML/DL taxonomy, comparative TNCR experiments across HRNet, ResNeSt, and Dynamic R-CNN backbones | Notes. arXiv 2211.08469. GitHub repo for ongoing tracking. |

Comparative Studies

Cross-architecture evaluations that benchmark models from multiple paradigms on the same datasets.

| Year | Paper | Models Compared | Datasets | Key Finding | Notes |
|---|---|---|---|---|---|
| 2022 | Kasem et al. (Deep Learning for TD and TSR: A Survey, ACM CSUR 2024) | HRNet, ResNeSt, Dynamic R-CNN backbones | TNCR + 18 datasets reviewed | Reviews heuristic/ML/DL taxonomy; comparative backbone experiments show Dynamic R-CNN competitive across TNCR table types | Notes. arXiv 2211.08469. |

To Investigate

Papers and resources identified as likely relevant but not yet fully reviewed.

TD: Methods

| Reference | Title | Why Relevant |
|---|---|---|
| Siddiqui et al., IEEE Access 2018 (10.1109/ACCESS.2018.2848541) | DeCNT: Deep Deformable CNN for Table Detection | Deformable convolution applied specifically to table detection; transfer learning across domains; surfaced via DocParser references. No arXiv. |

TSR: Methods

| Reference | Title | Why Relevant |
|---|---|---|
| Qasim et al., ICDAR 2019 | Rethinking Table Recognition Using Graph Neural Networks | GNN-based table structure parsing from rendered document images; direct TSR predecessor cited by DocParser. No arXiv found; DOI: 10.1109/ICDAR.2019.00028. |

Related Pages

  • Document Layout Analysis: General-purpose layout detection models, many of which detect tables as one region class.
  • Reading Order Prediction: Determining the logical reading sequence of detected regions, including table placement in document flow.
  • OCR: Text recognition pipelines; MinerU integrates TableMaster and StructEqTable for table extraction.
  • Document Understanding: End-to-end systems that combine detection, structure recognition, and content extraction.
  • Tables: Understanding, Reasoning, and LLM-era Evaluation: Benchmarks and datasets for table QA, visual table reasoning, and LLM/VLM evaluation on table content. Out of scope for this page; tracked separately as a working investigation cluster.