Reading Order Prediction

Tracking models, datasets, and methods for determining the logical reading sequence of detected document regions.

Disclaimer: This page covers models for predicting the logical reading sequence of detected regions. For detecting where regions are on a page, see the Layout Page. For parsing table internal structure, see the TSR Page.

Overview

Reading order prediction determines the logical sequence in which detected regions should be read. While it depends on layout detection for region proposals, it is a distinct task with its own models, datasets, and evaluation challenges.

The core difficulty is that reading order is not purely spatial. Multi-column layouts, sidebars, footnotes, and floating elements all require understanding the logical flow of a document rather than just scanning top-to-bottom, left-to-right.
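
To make the point concrete, here is a minimal sketch comparing a purely spatial sort with a column-aware one on a toy two-column page. The `Region` type, the coordinates, and the fixed `column_split` value are all hypothetical and chosen only for illustration. Real systems would have to infer column structure rather than be handed it.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A detected region, reduced to a name and its top-left corner (hypothetical)."""
    name: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels


def naive_order(regions):
    """Purely spatial order: top-to-bottom, then left-to-right."""
    return [r.name for r in sorted(regions, key=lambda r: (r.y, r.x))]


def column_order(regions, column_split):
    """Read everything left of column_split first, then the rest, each top-to-bottom.

    The fixed split is an assumption for this toy page only.
    """
    return [r.name for r in sorted(regions, key=lambda r: (r.x >= column_split, r.y))]


# Toy two-column page: A1 and A2 form the left column, B1 and B2 the right.
page = [
    Region("A1", x=50, y=100),
    Region("B1", x=350, y=100),
    Region("A2", x=50, y=400),
    Region("B2", x=350, y=400),
]

print(naive_order(page))                     # ['A1', 'B1', 'A2', 'B2']
print(column_order(page, column_split=300))  # ['A1', 'A2', 'B1', 'B2']
```

The naive sort interleaves the two columns, which is the failure described above. The column-aware version only works because the split is known in advance, and it would still mishandle sidebars, footnotes, and floating elements.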

Note: We are actively researching this area. Expect significant updates to this section.


Reading Order: Models

| Date | Model | Method | Code | License | Notes |
| 2023-12 | Surya | SegFormer | VikParuchuri/surya | GPL-3.0 | |
| 2021-08 | LayoutReader | Seq2Seq | GitHub | Non-Comm | LayoutLM encoder + pointer-network decoder. ReadingBank dataset (500k pages). |

Reading Order: Datasets

| Date | Name | Pages | Domain | Key Contribution | License | Notes |
| 2021-08 | ReadingBank | 500k | Diverse (born-digital) | First large-scale reading order benchmark | Apache-2.0 | Auto-extracted from DocX XML metadata via color-based watermarking. English only. |
| 2015-10 | ENP | 528 | Historical Newspapers | PAGE-XML with reading order | Unknown | Reading order annotations in PAGE-XML format. See Layout notes. |
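
For datasets that store reading order in PAGE-XML, the annotation typically lives in a `ReadingOrder` element containing an `OrderedGroup` of `RegionRefIndexed` entries. The following is a minimal sketch of extracting that order with the Python standard library. The sample document, its region IDs, and the namespace version are assumptions made for illustration, and the helper names `local` and `reading_order` are hypothetical. The code handles only a single flat group and ignores nested or unordered groups.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the namespace version and IDs are assumptions, not taken
# from any particular dataset.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample.tif" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="1" regionRef="r2"/>
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
  </Page>
</PcGts>"""


def local(tag):
    """Strip the '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def reading_order(xml_text):
    """Return region IDs sorted by their RegionRefIndexed index (flat groups only)."""
    root = ET.fromstring(xml_text)
    refs = [
        (int(el.get("index")), el.get("regionRef"))
        for el in root.iter()
        if local(el.tag) == "RegionRefIndexed"
    ]
    return [region_id for _, region_id in sorted(refs)]


print(reading_order(SAMPLE))  # ['r1', 'r2']
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since PAGE-XML files from different years use different schema URIs.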

Reading Order: Metrics

| Metric | What it measures |
| BLEU (Page-level) | N-gram overlap between predicted and ground-truth reading order sequence. Used by LayoutReader. |
| ARD (Average Relative Distance) | Positional displacement between predicted and ground-truth element positions. Penalizes omissions. |
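
As an illustration of how ARD behaves, here is a minimal sketch under one common formulation: for each element of the ground-truth sequence, take the absolute difference between its ground-truth index and its predicted index, or the sequence length n if the element is missing from the prediction, then average over n. This formulation is an assumption and should be checked against the definition used by whichever paper is being compared. The function name `ard` is hypothetical, and elements are assumed to be unique within each sequence.

```python
def ard(reference, predicted):
    """Average Relative Distance between a reference order and a predicted order.

    Assumes elements are unique. A reference element missing from the
    prediction is charged the full reference length n as a penalty.
    """
    n = len(reference)
    if n == 0:
        return 0.0
    position = {elem: i for i, elem in enumerate(predicted)}
    total = 0
    for k, elem in enumerate(reference):
        total += abs(k - position[elem]) if elem in position else n
    return total / n


truth = ["a", "b", "c", "d"]
print(ard(truth, ["a", "b", "c", "d"]))  # 0.0  (perfect order)
print(ard(truth, ["b", "a", "c", "d"]))  # 0.5  (two elements each off by one)
print(ard(truth, ["a", "b", "d"]))       # 1.25 (c omitted, d shifted by one)
```

Lower is better. The last case shows the omission penalty: a single dropped element costs more than a local swap.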