Information Extraction from Images Using Deep Learning

Research papers and code for information extraction from images and PDFs

Layout Analysis in Document/Image:

Document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order. Detection and labeling of the different zones (or blocks) as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. (Source: https://en.wikipedia.org/wiki/Document_layout_analysis)
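The two steps described above (separating text zones from non-text zones, then arranging zones in reading order) can be illustrated with a minimal sketch. It assumes the zones have already been detected and labeled by some upstream model, and that the page is a simple single-column layout where zones on roughly the same horizontal line are read left to right. The `Block` class, the label names, and the `row_tolerance` parameter are hypothetical and chosen only for illustration; they do not come from any of the tools listed in this README.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A detected zone on the page (hypothetical structure for illustration)."""
    label: str  # e.g. "text", "table", "figure"
    x: float    # left edge, in pixels
    y: float    # top edge, in pixels
    w: float    # width
    h: float    # height


def split_text_zones(blocks):
    """Separate text zones from non-text zones."""
    text = [b for b in blocks if b.label == "text"]
    other = [b for b in blocks if b.label != "text"]
    return text, other


def reading_order(blocks, row_tolerance=10.0):
    """Order blocks top-to-bottom, then left-to-right within a row.

    Blocks whose top edges lie within `row_tolerance` pixels of the first
    block in the current row are treated as belonging to that row.
    This is a simplification; multi-column pages need column detection first.
    """
    rows = []
    for b in sorted(blocks, key=lambda b: b.y):
        if rows and abs(b.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(b)
        else:
            rows.append([b])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]


# Made-up coordinates for a page with two side-by-side text zones,
# a table below them, and a figure at the bottom.
page = [
    Block("text", 300, 102, 200, 40),
    Block("figure", 50, 400, 450, 200),
    Block("text", 50, 100, 200, 40),
    Block("table", 50, 200, 450, 150),
]

text_zones, other_zones = split_text_zones(page)
ordered = reading_order(page)
print([(b.label, b.x, b.y) for b in ordered])
```

In practice the labels and bounding boxes would come from a layout detection model, and the reading-order heuristic would need to handle multiple columns and overlapping zones.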

Tools:

INVOICE EXTRACTION

RECEIPT DATA

CORD: A Consolidated Receipt Dataset for Post-OCR Parsing

Datasets

Code:

LayoutLM
Graph Convolution on Structured Documents
Graph Matric
Feature Extraction from Graph
Extract data from Invoice
CascadeTabNet
Tabulo
PubLayNet
InvoiceNet: extract text from invoices
CUTIE

Research Papers

DETECT TABLE in Image/PDF

Reference

parsing pdf

DocumentUnderstanding

Install / Use

README