Table Structure Recognition

Tracking models, datasets, and metrics for parsing the internal structure of tables in documents.

Disclaimer: This page covers models and datasets for parsing the internal grid structure of tables (rows, columns, spanning cells). For detecting where tables are on a page, see the Layout Page. For end-to-end document understanding, see the Document Understanding Page.

Overview

Table Structure Recognition (TSR) is the task of recovering the logical grid of a detected table region: identifying rows, columns, spanning cells, and (optionally) header vs. body roles. It typically operates on a pre-cropped table image produced by a layout detection model.
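To make "logical grid" concrete, here is a minimal sketch of one way such an output could be represented. The `Cell` class and `occupancy_grid` helper are hypothetical names chosen for illustration, not the format of any particular model or dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    row: int                 # top-most grid row the cell occupies
    col: int                 # left-most grid column the cell occupies
    rowspan: int = 1         # number of rows covered
    colspan: int = 1         # number of columns covered
    is_header: bool = False  # optional header vs. body role


def occupancy_grid(cells: List[Cell], n_rows: int, n_cols: int) -> List[List[Optional[int]]]:
    """Map every grid slot to the index of the cell covering it (None if uncovered)."""
    grid: List[List[Optional[int]]] = [[None] * n_cols for _ in range(n_rows)]
    for idx, cell in enumerate(cells):
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                if grid[r][c] is not None:
                    raise ValueError(f"cells {grid[r][c]} and {idx} overlap at ({r}, {c})")
                grid[r][c] = idx
    return grid


# A 2x2 table whose first row is a single header cell spanning both columns.
example_cells = [
    Cell(0, 0, colspan=2, is_header=True),
    Cell(1, 0),
    Cell(1, 1),
]
example_grid = occupancy_grid(example_cells, n_rows=2, n_cols=2)
```

Expanding spans into an occupancy grid like this is a convenient consistency check: a well-formed prediction covers every slot exactly once.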

The field splits into two main formulations:

  1. Image-to-Sequence (Im2Seq): Generates a markup representation (HTML or OTSL) token by token using an encoder-decoder architecture.
    • Examples: EDD (PubTabNet), TableFormer + OTSL.
    • Pros: Captures complex spanning patterns naturally; amenable to beam search decoding.
    • Cons: Sequence length scales with table size; attention drift on large tables.
  2. Object Detection: Treats rows, columns, and cells as bounding-box objects detected in a single forward pass.
    • Examples: Table Transformer (DETR-based, from PubTables-1M).
    • Pros: Fast; leverages standard detection tooling; produces spatial coordinates directly.
    • Cons: Post-processing needed to resolve spanning cells; struggles with dense or borderless tables.
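The post-processing step mentioned for the object-detection formulation can be sketched roughly as follows, assuming the detector returns axis-aligned boxes for rows, columns, and spanning cells. The function names and the 0.5 overlap threshold are illustrative assumptions, not the procedure of any specific model.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def cells_from_rows_and_columns(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    """Intersect each row box with each column box to get a grid of cell boxes."""
    rows = sorted(rows, key=lambda b: b[1])  # top to bottom
    cols = sorted(cols, key=lambda b: b[0])  # left to right
    return [[(c[0], r[1], c[2], r[3]) for c in cols] for r in rows]


def covered_slots(span: Box, grid: List[List[Box]], min_overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Return the (row, col) slots that lie mostly inside a detected spanning-cell box."""
    slots = []
    for i, row in enumerate(grid):
        for j, (x0, y0, x1, y1) in enumerate(row):
            inter_w = min(x1, span[2]) - max(x0, span[0])
            inter_h = min(y1, span[3]) - max(y0, span[1])
            area = (x1 - x0) * (y1 - y0)
            if inter_w > 0 and inter_h > 0 and area > 0:
                if (inter_w * inter_h) / area >= min_overlap:
                    slots.append((i, j))
    return slots


# Two rows and two columns; one spanning cell covers the whole first row.
demo_grid = cells_from_rows_and_columns(
    rows=[(0, 20, 100, 40), (0, 0, 100, 20)],
    cols=[(0, 0, 50, 40), (50, 0, 100, 40)],
)
demo_slots = covered_slots((0, 0, 100, 20), demo_grid)
```

The slots returned for each spanning box would then be merged into a single logical cell. Real pipelines also need to handle overlapping or missing row and column boxes, which this sketch ignores.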

TSR: Models

Date | Name | Type | Key Contribution | Notes | License
2023-05 | OTSL | Model | One-shot parsing | Notes. 5-token language with backward-only syntax rules for efficient autoregressive TSR. | -
2020-10 | Table Transformer | Model | DETR for Tables | Microsoft. The standard baseline model. | MIT
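As a rough illustration of how a 5-token sequence can encode spans, the sketch below decodes an OTSL-style sequence into row and column spans. It assumes the token meanings as commonly described for OTSL (`C` starts a cell, `L` merges with the left neighbor, `U` merges with the upper neighbor, `X` merges with both, `NL` ends a row); `otsl_to_spans` is a hypothetical helper, and it does not validate the full syntax rules.

```python
from typing import Dict, List, Tuple


def otsl_to_spans(tokens: List[str]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Convert an OTSL-style token sequence to {(row, col): (rowspan, colspan)}."""
    # Split the flat sequence into rows at each NL token.
    rows: List[List[str]] = [[]]
    for tok in tokens:
        if tok == "NL":
            rows.append([])
        else:
            rows[-1].append(tok)
    rows = [r for r in rows if r]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("every row must contain the same number of tokens")

    spans: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for i, row in enumerate(rows):
        for j, tok in enumerate(row):
            if tok != "C":
                continue  # L, U and X slots belong to a cell that started earlier
            colspan = 1
            while j + colspan < len(row) and row[j + colspan] == "L":
                colspan += 1
            rowspan = 1
            while i + rowspan < len(rows) and rows[i + rowspan][j] == "U":
                rowspan += 1
            spans[(i, j)] = (rowspan, colspan)
    return spans


# A 2x3 table whose top-left cell covers a 2x2 block.
demo_spans = otsl_to_spans(["C", "L", "C", "NL", "U", "X", "C", "NL"])
```

Because every row has the same number of tokens, the sequence length is fixed by the grid size rather than by how many cells are merged.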

TSR: Datasets

Date | Name | Pages/Tables | Domain | Key Contribution | Notes | License
2021-09 | PubTables-1M | ~1M tables | Scientific | Canonical/structure | Notes. Canonicalized annotations; fixes PubTabNet oversegmentation. | CDLA-Perm-2.0
2020-12 | FinTabNet | 113k tables | Financial | | Complex, dense financial data. | CDLA-Perm-1.0
2019-11 | PubTabNet | 568k tables | Scientific | Image-to-HTML | Notes. First large-scale TSR dataset. Introduces TEDS metric. | CDLA-Sharing-1.0
2019-09 | SciTSR | 15k tables | Scientific | Structure | GitHub. Complex spans. | Apache-2.0
2019-03 | TableBank | 417k tables | Diverse | Detection + Recognition | Notes. Weak supervision from Word/LaTeX sources. | Apache-2.0
2012-03 | Marmot | 2,000 pages | Chinese e-books + English scientific | Table detection | Fang et al., DAS 2012. DOI. ~1:1 Chinese/English split; 50% pages contain tables, 50% hard negatives. Custom XML format. No official splits. PKU. | Research-only

TSR: Metrics

Metric | What it measures | Notes
TEDS (Tree-Edit-Distance-based Similarity) | Similarity between predicted HTML structure and ground truth. | Standard for PubTabNet/FinTabNet. Introduced by PubTabNet.
GriTS (Grid Table Similarity) | Grid topology correctness (row/column spanning alignment). | More robust to empty cell variations. Introduced by PubTables-1M.
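For reference, TEDS as defined in the PubTabNet paper normalizes the tree edit distance by the size of the larger tree. In the formula below, T_a and T_b are the predicted and ground-truth HTML trees, EditDist is the tree edit distance between them, and |T| is the number of nodes in T; the symbol names follow that paper, not anything defined on this page.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|,\ |T_b|\right)}
```

A score of 1 means the two trees are identical, and the score decreases toward 0 as more edits are needed relative to the size of the larger tree.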