Pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
pyxpdf
pyxpdf is a fast, memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.
.. start-badges

.. list-table::
    :stub-columns: 1

    * - docs
      - |docs|
    * - tests
      - |azure| |travis| |codecov|
    * - package
      - |pypi| |pythonver| |wheel| |downloads|
    * - license
      - |license|

.. end-badges
Features
- Almost 20 times faster than pure-Python PDF parsers (see `Speed Comparison`_)
- Extract text while maintaining the original document layout (as closely as possible)
- Support almost all PDF encodings, CMaps and predefined CMaps
- Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks along with their BBox
- Render PDF pages as images in the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
- No explicit dependencies (except optional ones, see `Installation`_)
- Thread safe
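As a rough illustration of the text-extraction feature, the sketch below walks a document page by page. It assumes that ``pyxpdf.Document`` accepts a file path, that a document is iterable over its pages, and that each page exposes a ``text()`` method; check these against the Quickstart before relying on them. The helper names ``open_document`` and ``page_texts`` and the path ``sample.pdf`` are hypothetical and not part of pyxpdf.

```python
def open_document(path):
    """Open a PDF with pyxpdf (assumed API: pyxpdf.Document(path))."""
    # Imported lazily so this module still loads where pyxpdf is not installed.
    from pyxpdf import Document

    return Document(path)


def page_texts(doc):
    """Yield (page_number, text) pairs from a Document-like object.

    Assumes the object is iterable over pages and that every page
    has a text() method returning a string.
    """
    for number, page in enumerate(doc, start=1):
        yield number, page.text()


if __name__ == "__main__":
    import sys

    # "sample.pdf" is a placeholder path, not a file shipped with pyxpdf.
    path = sys.argv[1] if len(sys.argv) > 1 else "sample.pdf"
    for number, text in page_texts(open_document(path)):
        print(f"--- page {number} ---")
        print(text)
```

Because ``page_texts`` only relies on iteration and a ``text()`` method, it can be exercised with a stand-in object when pyxpdf itself is not available.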
More Information
- `Documentation <https://pyxpdf.readthedocs.io/>`_

  - `Installation`_
  - `Quickstart <https://pyxpdf.readthedocs.io/en/latest/intro.html#quick-start>`_

- `Contribute <https://github.com/ashutoshvarma/pyxpdf/blob/master/.github/CONTRIBUTING.md>`_

  - `Build <https://github.com/ashutoshvarma/pyxpdf/blob/master/BUILD.rst>`_
  - `Issues <https://github.com/ashutoshvarma/pyxpdf/issues>`_
  - `Pull requests <https://github.com/ashutoshvarma/pyxpdf/pulls>`_

- `Speed Comparison`_
- `Changelog <https://pyxpdf.readthedocs.io/en/latest/changelog.html>`_
License
pyxpdf is licensed under the GNU General Public License (GPL),
version 2 or 3. See the `LICENSE <https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE>`_ file.
Credits
- `xpdf reader <https://www.xpdfreader.com/>`_ by Derek Noonburg
- `lxml <https://www.github.com/lxml/lxml>`_ - project structure and build adapted from lxml
- `poppler <https://poppler.freedesktop.org/>`_ project
.. _Speed Comparison: https://pyxpdf.readthedocs.io/en/latest/compare.html
.. _Installation: https://pyxpdf.readthedocs.io/en/latest/intro.html#installation
.. |azure| image:: https://img.shields.io/azure-devops/build/ashutoshvarma/pyxpdf/1/master?label=Azure%20Pipelines&style=for-the-badge
    :alt: Azure DevOps builds (branch)
    :target: https://ashutoshvarma.visualstudio.com/pyxpdf/_build
.. |travis| image:: https://img.shields.io/travis/com/ashutoshvarma/pyxpdf?label=Travis&style=for-the-badge
    :alt: Travis (.com)
    :target: https://travis-ci.com/github/ashutoshvarma/pyxpdf
.. |docs| image:: https://img.shields.io/readthedocs/pyxpdf?style=for-the-badge
    :alt: Read the Docs
    :target: https://pyxpdf.readthedocs.io/en/latest/
.. |codecov| image:: https://img.shields.io/codecov/c/github/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: Codecov
    :target: https://codecov.io/gh/ashutoshvarma/pyxpdf/
.. |license| image:: https://img.shields.io/github/license/ashutoshvarma/pyxpdf?style=for-the-badge
    :alt: GitHub
    :target: https://github.com/ashutoshvarma/pyxpdf/blob/master/LICENSE
.. |pypi| image:: https://img.shields.io/pypi/v/pyxpdf?color=light&style=for-the-badge
    :alt: PyPI
    :target: https://pypi.org/project/pyxpdf/
.. |pythonver| image:: https://img.shields.io/pypi/pyversions/pyxpdf?style=for-the-badge
    :alt: PyPI - Python Version
    :target: https://pypi.org/project/pyxpdf/
.. |wheel| image:: https://img.shields.io/pypi/wheel/pyxpdf?style=for-the-badge
    :alt: PyPI - Wheel
    :target: https://pypi.org/project/pyxpdf/
.. |downloads| image:: https://img.shields.io/pypi/dm/pyxpdf?label=PyPI%20Downloads&style=for-the-badge
    :alt: PyPI - Downloads
    :target: https://pypi.org/project/pyxpdf/
