DocumentUnderstanding
Research papers and code on information extraction from image/pdf
Information extraction from Image using Deep learning
Layout Analysis in Document/Image:
Document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and their arrangement in the correct reading order. Detection and labeling of the different zones (or blocks) as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. (Source: https://en.wikipedia.org/wiki/Document_layout_analysis)
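
As a rough illustration of the two steps described above (separating text zones from non-text zones, then arranging blocks in reading order), here is a minimal Python sketch. It assumes a layout detector has already produced labeled bounding boxes. The `Block` class, the label names, and the `row_tolerance` parameter are hypothetical and do not come from any of the tools listed in this README. The ordering is a naive top-to-bottom, left-to-right heuristic that only suits simple layouts; multi-column pages would need something more careful.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A detected zone: a label plus a bounding box in pixel coordinates."""
    label: str  # hypothetical label set, e.g. "text", "table", "figure"
    x0: float
    y0: float
    x1: float
    y1: float


def split_text_zones(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Separate text zones from non-text zones."""
    text = [b for b in blocks if b.label == "text"]
    other = [b for b in blocks if b.label != "text"]
    return text, other


def reading_order(blocks: List[Block], row_tolerance: float = 10.0) -> List[Block]:
    """Order blocks top-to-bottom, then left-to-right within each row.

    Blocks whose top edge lies within `row_tolerance` pixels of the first
    block in the current row are treated as belonging to that row.
    """
    rows: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: (b.y0, b.x0)):
        if rows and abs(block.y0 - rows[-1][0].y0) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]


if __name__ == "__main__":
    page = [
        Block("text", 300, 105, 550, 380),   # right-hand paragraph
        Block("table", 50, 400, 550, 600),
        Block("text", 50, 20, 550, 80),      # title line
        Block("text", 50, 100, 280, 380),    # left-hand paragraph
    ]
    for b in reading_order(page):
        print(b.label, (b.x0, b.y0))
```

In practice the bounding boxes and labels would come from a model trained on one of the layout datasets listed below; the sketch only shows what happens after detection.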

- PubLayNet
- PRIMA
- HJDatasets (Historical Japanese Documents with Complex Layouts)
- Newspaper Navigator
- TableBank
- DocBank
- German-Brazilian Newspapers (GBN) Dataset
Tools:
INVOICE EXTRACTION
RECEIPT DATA
Datasets
- TableBank
- DocBank
- FUNSD
- RVL-CDIP
- SROIE
- Document Visual Question Answering
- HTR Dataset ICFHR 2016
- Tobacco3482
- PubLay
- Tobacco800 Complex Document Image Database and Groundtruth
- NIST Forms
Code:
- LayoutLM
- Graph Convolution on Structured Documents
- Graph Matric
- Feature Extraction from Graph
- Extract data from Invoice
- CascadeTabNet
- Tabulo
- PubLayNet
- InvoiceNet Extract text from invoice
- Cutie
Research Papers
- LayoutLM
- PICK
- Deep Convolutional Nets for Document Image Classification and Retrieval
- Table Detection in Invoice Documents by Graph Neural Network
- Graph Convolution on Structured Documents
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
- Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
- Few-Shot Learning with Graph Neural Networks
- MMDetection: Open MMLab Detection Toolbox and Benchmark
- Efficient, Lexicon-Free OCR using Deep Learning
- An Overview of the Tesseract OCR Engine
- Semi-Supervised Classification with Graph Convolutional Networks
- An Invoice Reading System Using a Graph Convolutional Network
- Spatial Dependency Parsing for Semi-Structured Document Information Extraction
DETECT TABLE in Image/PDF
- CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
- TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images
- RetinaNet
Reference