Tables: Detection & Structure Recognition

Tracking models, datasets, and metrics for detecting tables in documents and parsing their internal structure.

Hunter Heidenreich

February 13, 2026

Document Layout Analysis

Tracking the evolution of document layout detection: models, datasets, benchmarks, and metrics.

Hunter Heidenreich

October 27, 2025

Modern OCR for the Large Language & Vision Model Era

A reference page for all things Optical Character Recognition (OCR) using Large Language & Vision Models.

Ben Elliott

June 30, 2025

Post-train a Model to Fish

We demonstrate how a specialized 25B-parameter Mistral model, post-trained on domain-specific data, can outperform Google's Gemini 2.5 Flash by double-digit margins on insurance loss run extraction tasks.

Hunter Heidenreich

December 15, 2024

LLM Calibration and Confidence Estimation

Explore the critical challenge of uncertainty quantification in large language models. Learn about confidence estimation techniques, calibration metrics like ECE and MCE, and practical methods to improve model reliability, from logit-based approaches to ensemble methods and post-hoc calibration.

Hunter Heidenreich

October 27, 2024

How to Read a Research Paper

A practical guide to reading and understanding research papers, with strategies for navigating academic literature.

Hunter Heidenreich

March 25, 2024

Tables: Understanding, Reasoning, and LLM-era Evaluation

Tracking benchmarks and datasets for table understanding, visual table QA, and LLM/VLM-era table evaluation. Scope: downstream reasoning and multimodal understanding over tables, not structural recognition.

Hunter Heidenreich

March 5, 2024

A Survey of Multimodal LLMs (2021-2024)

A comprehensive survey of multimodal large language models from 2021 to 2024, covering encoder-only models, encoder-decoder architectures, decoder-only models, and specialized applications for documents and screens.

Hunter Heidenreich

March 15, 2023 April 2, 2026

Reading Order Prediction

Tracking models, datasets, and methods for determining the logical reading sequence of detected document regions.

Hunter Heidenreich

March 14, 2023

Document Understanding & Visual Question Answering

Tracking models and benchmarks for document understanding, visual information extraction, and document VQA.

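| Model | Claims (Numeric) | Claims (Textual) | Policy (Numeric) | Policy (Textual) |
|---|---|---|---|---|
| Gemini 2.5 Pro | 82% | 75% | 81% | 80% |
| GPT-5 | 56% | 52% | 86% | 75% |
| Gemini 3 Pro Preview | 83% | 77% | 83% | 81% |
| GPT-4.1 | 81% | 79% | 81% | 77% |
| Bevaya | 95% | 94% | 88% | 85% |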