Benchmarks
Benchmarking PDF libraries
PDF Library Benchmarks
This benchmark is about reading pure PDF files, not scanned documents and not documents to which OCR was applied.
Benchmarking machine
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Input Documents
|  # | Name         | File Size | Pages |
| -: | :----------- | --------: | ----: |
|  1 | 2201.00214   |    2.4MiB |    22 |
|  2 | GeoTopo-book |    5.1MiB |   117 |
|  3 | 2201.00151   |    1.5MiB |    12 |
|  4 | 1707.09725   |    7.0MiB |   134 |
|  5 | 2201.00021   |    2.6MiB |    10 |
|  6 | 2201.00037   |    2.9MiB |    33 |
|  7 | 2201.00069   |   14.7MiB |    15 |
|  8 | 2201.00178   |    2.3MiB |    16 |
|  9 | 2201.00201   |    1.3MiB |     9 |
| 10 | 1602.06541   |    2.9MiB |    16 |
| 11 | 2201.00200   |  284.8KiB |     7 |
| 12 | 2201.00022   |    1.2MiB |    14 |
| 13 | 2201.00029   |  797.6KiB |    12 |
| 14 | 1601.03642   | 1004.9KiB |     8 |
Libraries
|         Name | Last PyPI Release |                        License |  Version | Dependencies                                              |
| -----------: | :---------------- | -----------------------------: | -------: | :-------------------------------------------------------- |
|    pypdfium2 | 2024-12-19        |     Apache-2.0 or BSD-3-Clause |   4.30.1 | PDFium (Foxit/Google)                                     |
| pdfminer.six | 2025-05-06        |                          MIT/X | 20250506 |                                                           |
|   pdfplumber | 2025-06-12        |                            MIT |   0.11.7 | pdfminer.six                                              |
|        pdfrw | 2017-09-18        |                            MIT |      0.4 |                                                           |
|    pdftotext | -                 |                            GPL |   0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
|      PyMuPDF | 2025-06-12        | GNU AFFERO GPL 3.0 / Commercial |   1.26.1 | MuPDF                                                     |
|        pypdf | 2025-06-29        |                   BSD 3-Clause |    5.7.0 |                                                           |
|         Tika | 2025-03-26        |                      Apache v2 |    3.1.0 | Apache Tika                                               |
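As a rough illustration of how a per-document extraction time for one of the libraries above could be measured, here is a minimal sketch. It is not the harness this benchmark actually uses: the helper names `time_extraction`, `pypdf_extract`, and `mean_seconds` are hypothetical, and only the pypdf calls (`PdfReader`, `page.extract_text()`) are real library API. pypdf is imported lazily, so the timing helper itself has no third-party dependency.

```python
import time
from io import BytesIO
from typing import Callable, Sequence, Tuple


def time_extraction(extract: Callable[[bytes], str], data: bytes) -> Tuple[float, str]:
    """Run `extract` once on `data` and return (elapsed seconds, extracted text)."""
    start = time.perf_counter()
    text = extract(data)
    elapsed = time.perf_counter() - start
    return elapsed, text


def pypdf_extract(data: bytes) -> str:
    """Extract the text of every page with pypdf (hypothetical wrapper)."""
    from pypdf import PdfReader  # lazy import: only needed when this is called

    reader = PdfReader(BytesIO(data))
    # extract_text() may yield an empty result for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def mean_seconds(times: Sequence[float]) -> float:
    """Arithmetic mean of per-document times (assumed meaning of an 'Average')."""
    return sum(times) / len(times)
```

With the bytes of one input document in hand, `time_extraction(pypdf_extract, pdf_bytes)` returns the wall-clock seconds and the extracted text. Wrapping another library only requires a different `extract` callable with the same `bytes -> str` shape, which keeps the timing logic identical across libraries.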
Text Extraction Speed
| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| :- | :------ | :------ | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| 1 | PyMuPDF | 0.1s | 0.4s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 2 | pypdfium2 | 0.1s | 0.5s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
| 3 | Tika | 0.2s | 0.8s | 0.5s | 0.3s | 0.3s | 0.1s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 4 | pdftotext | 0.3s | 0.7s | 0.9s | 0.2s | 0.8s | 0.1s | 0.3s | 0.4s | 0.1s | 0.1s | 0.2s | 0.1s | 0.1s | 0.0s | 0.0s |
| 5 | pypdf | 3.5s | 26.2s | 6.4s | 6.8s | 3.3s | 0.9s | 1.6s | 0.6s | 0.6s | 0.5s | 0.8s | 0.6s | 0.6s | 0.5s | 0.3s |
| 6 | pdfminer.six | 5.8s | 35.1s | 16.6s | 10.2s | 5.5s | 1.5s | 2.5s | 1.1s | 1.6s | 1.1s
