
DocumentLayoutAnalysis

Document Layout Analysis resources and repositories for development with PdfPig.



From Wikipedia: Document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order. Detection and labeling of the different zones (or blocks) as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. But text zones play different logical roles inside the document (titles, captions, footnotes, etc.), and this kind of semantic labeling is the scope of logical layout analysis.

Related projects

Cited by

Resources

Text extraction

Word segmentation

example

Page segmentation

Recursive XY Cut (code, PdfPig)

The X-Y cut segmentation algorithm, also referred to as the recursive X-Y cuts (RXYC) algorithm, is a tree-based top-down algorithm. The root of the tree represents the entire document page. All the leaf nodes together represent the final segmentation. The RXYC algorithm recursively splits the document into two or more smaller rectangular blocks, which represent the nodes of the tree. At each step of the recursion, the horizontal a
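The top-down splitting described above can be illustrated with a minimal sketch. It is written in Python for brevity and does not use or mirror the PdfPig API. It assumes word bounding boxes given as `(left, bottom, right, top)` tuples, and it cuts wherever the projection of the boxes onto an axis shows a gap wider than a threshold. The names `xy_cut`, `_split_on_gaps`, `min_gap_x`, and `min_gap_y` are hypothetical, and the gap-threshold rule is a simplification chosen for illustration, not necessarily the criterion used by any particular implementation.

```python
from typing import List, Tuple

# Assumed box layout: (left, bottom, right, top), in page coordinates.
Box = Tuple[float, float, float, float]


def _split_on_gaps(boxes: List[Box], lo: int, hi: int, min_gap: float) -> List[List[Box]]:
    """Group boxes along one axis, starting a new group at every gap > min_gap.

    lo and hi are the tuple indices of the interval start and end on that axis
    (0 and 2 for the x axis, 1 and 3 for the y axis).
    """
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # farthest extent covered by the current group
    for box in ordered[1:]:
        if box[lo] - reach > min_gap:
            groups.append([box])    # empty strip found: cut here
        else:
            groups[-1].append(box)  # overlaps or nearly touches: same group
        reach = max(reach, box[hi])
    return groups


def xy_cut(boxes: List[Box], min_gap_x: float = 10.0, min_gap_y: float = 5.0) -> List[List[Box]]:
    """Return the leaf blocks of a recursive X-Y cut over the given boxes."""
    if not boxes:
        return []
    # Try cutting along x first (separates columns), then along y (separates rows).
    for lo, hi, gap in ((0, 2, min_gap_x), (1, 3, min_gap_y)):
        groups = _split_on_gaps(boxes, lo, hi, gap)
        if len(groups) > 1:
            # Each group is strictly smaller than the input, so recursion terminates.
            return [leaf for g in groups for leaf in xy_cut(g, min_gap_x, min_gap_y)]
    return [boxes]  # no cut possible on either axis: this node is a leaf


if __name__ == "__main__":
    page = [
        (0, 100, 40, 110), (0, 88, 40, 98),     # left column, first paragraph
        (0, 50, 40, 60),                        # left column, second paragraph
        (60, 100, 100, 110), (60, 88, 100, 98)  # right column, one paragraph
    ]
    for block in xy_cut(page):
        print(block)
```

Each returned list corresponds to one leaf of the tree. Because the y axis is sorted in ascending order, blocks come out bottom-to-top in a PDF-style coordinate system, so a real pipeline would still need a reading-order step.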
