Doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Optical Character Recognition made seamless & accessible to anyone, powered by PyTorch
What you can expect from this repository:
- efficient ways to parse textual information (localize and identify each word) from your documents
- guidance on how to integrate this in your current architecture

Quick Tour
Getting your pretrained model
End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.
from doctr.models import ocr_predictor
model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
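To make the two-stage idea concrete, here is a minimal conceptual sketch in plain Python. It is not docTR's implementation: `two_stage_ocr`, `toy_detect` and `toy_recognize` are hypothetical names, and the "page" is just a string so the example runs without any model. Detection returns word spans, and recognition reads the characters inside each span.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy "box": a (start, end) span on a 1-D toy page (hypothetical, for illustration).
Box = Tuple[int, int]


@dataclass
class Word:
    value: str
    box: Box


def two_stage_ocr(
    page: str,
    detect: Callable[[str], List[Box]],
    recognize: Callable[[str, Box], str],
) -> List[Word]:
    """Stage 1 localizes words once per page; stage 2 reads each localized word."""
    boxes = detect(page)
    return [Word(recognize(page, box), box) for box in boxes]


def toy_detect(page: str) -> List[Box]:
    """Stand-in detector: returns the spans of whitespace-separated words."""
    spans: List[Box] = []
    start = None
    for i, ch in enumerate(page):
        if ch != " " and start is None:
            start = i
        elif ch == " " and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(page)))
    return spans


def toy_recognize(page: str, box: Box) -> str:
    """Stand-in recognizer: reads the characters inside a span."""
    start, end = box
    return page[start:end]


words = two_stage_ocr("docTR reads  text", toy_detect, toy_recognize)
print([w.value for w in words])  # ['docTR', 'reads', 'text']
```

Because the two stages only communicate through boxes, either one can be swapped independently, which mirrors how `det_arch` and `reco_arch` are chosen separately above.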
Reading files
Documents can be interpreted from PDF or images:
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
Putting it together
Let's use the default pretrained model for an example:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
Dealing with rotated documents
Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it:
- If you only use straight document pages with straight words (horizontal, same reading direction), consider passing `assume_straight_pages=True` to the ocr_predictor. It will directly fit straight boxes on your page and return straight boxes, which makes it the fastest option.
- If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations will be converted to straight boxes), you need to pass `export_as_straight_boxes=True` in the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).
If both options are set to False, the predictor will always fit and return rotated boxes.
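Conceptually, one way to straighten a rotated box is to take the smallest axis-aligned box that encloses its four corners. The sketch below assumes a rotated box is given as four (x, y) corner points in relative coordinates (0 to 1) and a straight box as `((xmin, ymin), (xmax, ymax))`; `to_straight_box` is a hypothetical helper, and docTR's own conversion may differ in detail.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
StraightBox = Tuple[Point, Point]


def _clamp(v: float) -> float:
    """Keep a relative coordinate inside [0, 1] in case a box extends past the page edge."""
    return min(max(v, 0.0), 1.0)


def to_straight_box(corners: Sequence[Point]) -> StraightBox:
    """Return the smallest axis-aligned box enclosing a rotated box's corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return ((_clamp(min(xs)), _clamp(min(ys))), (_clamp(max(xs)), _clamp(max(ys))))


rotated = [(0.2, 0.1), (0.6, 0.2), (0.55, 0.4), (0.15, 0.3)]
print(to_straight_box(rotated))  # ((0.15, 0.1), (0.6, 0.4))
```

The enclosing box is never smaller than the rotated one, so it can include extra background for strongly tilted words; that trade-off is why rotated boxes remain available.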
To interpret your model's predictions, you can visualize them interactively as follows:
# Display the result (requires matplotlib & mplcursors to be installed)
result.show()

Or even rebuild the original document from its predictions:
import matplotlib.pyplot as plt
synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

The ocr_predictor returns a Document object with a nested structure (with Page, Block, Line, Word, Artefact).
To get a better understanding of our document model, check our documentation.
You can also export the result as a nested dict, more appropriate for JSON format:
json_output = result.export()
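As an illustration of the nested structure, the sketch below walks an exported dict page by page to rebuild each line's text, then converts a word's relative geometry into pixel coordinates. It assumes the export uses the keys `pages`, `blocks`, `lines`, `words`, `value`, `geometry` and `dimensions` (page size as (height, width)), and that straight-box geometries are `((xmin, ymin), (xmax, ymax))` in relative coordinates; check these against the export of your installed version. `page_lines` and `to_pixels` are hypothetical helpers, and the `sample` dict is hand-written and much smaller than a real export.

```python
from typing import Any, Dict, List, Tuple


def page_lines(export: Dict[str, Any]) -> List[List[str]]:
    """Return, for each page, its text lines with words joined by spaces."""
    pages = []
    for page in export["pages"]:
        lines = []
        for block in page["blocks"]:
            for line in block["lines"]:
                lines.append(" ".join(word["value"] for word in line["words"]))
        pages.append(lines)
    return pages


def to_pixels(geometry, dimensions) -> Tuple[int, int, int, int]:
    """Convert a relative ((xmin, ymin), (xmax, ymax)) box to pixel (x0, y0, x1, y1)."""
    (xmin, ymin), (xmax, ymax) = geometry
    height, width = dimensions  # assumed (height, width) order
    return (round(xmin * width), round(ymin * height), round(xmax * width), round(ymax * height))


# Hand-written toy export with the assumed layout.
sample = {
    "pages": [
        {
            "dimensions": (1000, 800),
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Total:", "geometry": ((0.1, 0.5), (0.25, 0.52))},
                                {"value": "12.00", "geometry": ((0.3, 0.5), (0.4, 0.52))},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

print(page_lines(sample))  # [['Total: 12.00']]
first_word = sample["pages"][0]["blocks"][0]["lines"][0]["words"][0]
print(to_pixels(first_word["geometry"], sample["pages"][0]["dimensions"]))  # (80, 500, 200, 520)
```

With a real result, `page_lines(result.export())` would be the entry point, under the same key assumptions.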
Use the KIE predictor
The KIE predictor is more flexible than the OCR predictor, as your detection model can detect multiple classes in a document. For example, you can have a detection model that detects only dates and addresses in a document.
The KIE predictor makes it possible to use a detector with multiple classes together with a recognition model, with the whole pipeline already set up for you.
from doctr.io import DocumentFile
from doctr.models import kie_predictor
# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
predictions = result.pages[0].predictions
for class_name, list_predictions in predictions.items():
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
The KIE predictor results per page are in a dictionary format, with each key representing a class name and its value being the list of predictions for that class.
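Building on the loop above, a common follow-up is to keep a single value per class. The sketch below does that with a confidence threshold. `StubPrediction`, `best_per_class` and `min_conf` are hypothetical, and it assumes each prediction exposes `value` and `confidence` attributes, as docTR's prediction objects appear to in recent releases; verify with your version. The class names `date` and `address` echo the example above and are not the classes of any pretrained model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubPrediction:
    # Hypothetical stand-in for a docTR KIE prediction (assumed attributes).
    value: str
    confidence: float


def best_per_class(
    predictions: Dict[str, List[StubPrediction]], min_conf: float = 0.5
) -> Dict[str, Optional[str]]:
    """Keep the most confident value per class, or None if nothing reaches min_conf."""
    best: Dict[str, Optional[str]] = {}
    for class_name, preds in predictions.items():
        kept = [p for p in preds if p.confidence >= min_conf]
        best[class_name] = max(kept, key=lambda p: p.confidence).value if kept else None
    return best


predictions = {
    "date": [StubPrediction("2024-01-31", 0.93), StubPrediction("31/01", 0.41)],
    "address": [StubPrediction("1 Main St", 0.32)],
}
print(best_per_class(predictions))  # {'date': '2024-01-31', 'address': None}
```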
Installation
Prerequisites
Python 3.10 (or higher) and pip are required to install docTR.
Latest release
You can then install the latest release of the package from PyPI as follows:
pip install python-doctr
We try to keep extra dependencies to a minimum. You can install specific builds as follows:
# standard build
pip install python-doctr
# optional dependencies for visualization, html, and contrib modules can be installed as follows:
pip install "python-doctr[viz,html,contrib]"
Developer mode
Alternatively, you can install it from source, which will require you to install Git. First clone the project repository:
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
Models architectures
Credit where it's due: this repository implements, among others, architectures from published research papers.
Text Detection
- DBNet: Real-time Scene Text Detection with Differentiable Binarization.
- LinkNet: LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
- FAST: FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
Text Recognition
- CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.
- SAR: Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition.
- MASTER: MASTER: Multi-Aspect Non-local Network for Scene Text Recognition.
- ViTSTR: Vision Transformer for Fast and Efficient Scene Text Recognition.
- PARSeq: Scene Text Recognition with Permuted Autoregressive Sequence Models.
- VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition.
More goodies
Documentation
The full package documentation is available online for detailed specifications.
Demo app
A minimal demo app is provided for you to play with our end-to-end OCR models!

Live demo
Courtesy of :hugs: Hugging Face :hugs:, docTR now has a fully deployed version available on Spaces!
Running it locally
If you prefer to use it locally, there is an extra dependency (Streamlit) that is required.
pip install -r demo/pt-requirements.txt
Then run your app in your def