
Pdf2Dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for DOM-based applications, or as an alternative parser for the CSSBox rendering engine, which adds PDF processing capability to CSSBox. Pdf2Dom is based on the Apache PDFBox™ library.

Install / Use

/learn @radkovo/Pdf2Dom
About this skill

Quality Score

0/100

Supported Platforms

Zed

README

Pdf2Dom


Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree can then be serialized to an HTML file or processed further. Inline CSS definitions in the resulting document make the HTML page resemble the PDF input as closely as possible. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom can also be used as an independent Java library with a standard DOM interface for DOM-based applications.
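Because the library exposes a standard DOM interface, the serialization step can be done with the XML APIs that ship with the JDK. The following is a minimal sketch of that step only, not Pdf2Dom's own code: it does not call Pdf2Dom, and the class name DomToHtml, the helper sampleDom(), and the inline style values are hypothetical stand-ins for a DOM tree that the parser would produce.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtml {

    // Hypothetical stand-in for parser output: in real use the Document
    // would come from Pdf2Dom rather than being built by hand.
    static Document sampleDom() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element div = doc.createElement("div");
        // Assumed inline CSS, illustrating absolutely positioned text.
        div.setAttribute("style",
                "position:absolute;left:72pt;top:90pt;font-size:12pt");
        div.setTextContent("Hello from a PDF page");
        body.appendChild(div);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes any org.w3c.dom.Document to an HTML string using the
    // JDK's built-in identity transformer with the "html" output method.
    static String serialize(Document dom) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDom()));
    }
}
```

To write a file instead of a string, the StreamResult can wrap a java.io.File or an output stream; further processing can work directly on the Document with the usual DOM methods before serializing.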

Pdf2Dom is based on the Apache PDFBox™ library.

See the project page for more information and downloads: http://cssbox.sourceforge.net/pdf2dom

See also the Pdf2Dom-lite fork, which provides a lightweight version of Pdf2Dom with no font decoding support but significantly fewer dependencies.

GitHub Stars: 193
Category: Development
Updated: 9d ago
Forks: 75

Languages

Java

Security Score

95/100

Audited on Mar 14, 2026

No findings