=========
pdfreader
=========
:Info: See the `tutorials & documentation <https://pdfreader.readthedocs.io>`_ for more information.
:Author & Maintainer: Maksym Polshcha <maxp@sterch.net>

See `GitHub <https://github.com/maxpmaxp/pdfreader>`_ for the latest source.

About
=====

*pdfreader* is a Pythonic API for:

* extracting text, images and other data from PDF documents (plain or protected)
* accessing different objects within PDF documents

*pdfreader* is NOT a tool (though maybe one day it will become one!):

* to create or update PDF files
* to split PDF files into pages or other pieces
* to convert PDFs to any other format

Nevertheless, it can be used as a part of such tools.

See `Tutorials & Documentation <https://pdfreader.readthedocs.io>`_.

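As a rough illustration of accessing objects within a document, the sketch below opens a file and reads a few top-level objects. It assumes the ``PDFDocument`` class, ``doc.header.version``, ``doc.root`` and ``doc.pages()`` behave as described in the project's tutorial; the helper name ``describe_pdf`` and the file path are hypothetical. Check the calls against the version you have installed.

.. code-block:: python

    def describe_pdf(path):
        """Return a few top-level facts about the PDF at `path` (illustrative sketch)."""
        # Imported inside the function so the module still loads where
        # pdfreader is not installed.
        from pdfreader import PDFDocument

        # Keep the file open while reading: objects are loaded lazily.
        with open(path, "rb") as fd:
            doc = PDFDocument(fd)
            return {
                "version": doc.header.version,              # PDF version from the file header
                "root_type": str(doc.root.Type),            # type of the document catalog
                "page_count": sum(1 for _ in doc.pages()),  # pages() yields pages one by one
            }


    if __name__ == "__main__":
        # "example.pdf" is a placeholder path, not a file shipped with the project.
        print(describe_pdf("example.pdf"))
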
Features
========

- Extracts text (plain text and formatted text objects)
- Extracts PDF form data (plain strings and formatted text objects)
- Supports all PDF encodings, CMap, predefined cmaps
- Extracts images and image masks as `Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>`_
- Supports encrypted and password-protected PDF documents
- Allows browsing any document objects and resources, and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
- Follows the `PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>`_
- Lazy object access allows processing huge PDF documents quite fast

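A minimal sketch of plain-text extraction from one page follows. It assumes ``SimplePDFViewer`` with ``navigate()``, ``render()`` and ``canvas.strings`` works as shown in the project's tutorial, and that a ``password`` keyword is accepted for protected files. The helper name ``extract_page_strings`` and the file path are hypothetical.

.. code-block:: python

    def extract_page_strings(path, page_number=1, password=None):
        """Return the plain strings found on one page (illustrative sketch)."""
        # Imported inside the function so the module still loads where
        # pdfreader is not installed.
        from pdfreader import SimplePDFViewer

        # Only pass a password when one was given (assumed keyword name).
        kwargs = {"password": password} if password else {}

        # Keep the file open until rendering is done: objects are loaded lazily.
        with open(path, "rb") as fd:
            viewer = SimplePDFViewer(fd, **kwargs)
            viewer.navigate(page_number)  # pages are numbered from 1
            viewer.render()               # fills viewer.canvas for the current page
            return list(viewer.canvas.strings)


    if __name__ == "__main__":
        # "example.pdf" is a placeholder path, not a file shipped with the project.
        print("".join(extract_page_strings("example.pdf", page_number=1)))

According to the tutorial, the same canvas also exposes ``text_content``, ``images``, ``inline_images`` and ``forms`` for the other features listed above.
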
Installation
============

*pdfreader* can be installed with `pip <http://pypi.python.org/pypi/pip>`_::

  $ python -m pip install pdfreader

Or ``easy_install`` from `setuptools <http://pypi.python.org/pypi/setuptools>`_::

  $ python -m easy_install pdfreader

You can also download the project source and do::

  $ python setup.py install

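To confirm that the installation succeeded, one option is to check whether the package can be found on the import path. The sketch below uses only the standard library; the helper name ``is_installed`` is hypothetical and not part of *pdfreader*.

.. code-block:: python

    import importlib.util


    def is_installed(package_name):
        """Return True if a top-level package can be found on the import path."""
        return importlib.util.find_spec(package_name) is not None


    if __name__ == "__main__":
        print("pdfreader installed:", is_installed("pdfreader"))
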
Tutorial and Documentation
==========================

`Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>`_

Support, Bugs & Feature Requests
================================

*pdfreader* uses `GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>`_ to keep track of bugs,
feature requests, etc.

Related Projects
================

* `pdfminer <https://github.com/euske/pdfminer>`_
* `PyPDF2 <https://github.com/py-pdf/PyPDF2>`_
* `xpdf <http://www.foolabs.com/xpdf/>`_
* `pdfbox <http://pdfbox.apache.org/>`_
* `mupdf <http://mupdf.com/>`_

References
==========

* `Document management - Portable document format - PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_
* `Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>`_
* `PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>`_
* `Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>`_
* `Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>`_

Donation
========

If this project is helpful, you can treat me to coffee :-)

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support&currency_code=USD&source=url