Doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Generate Convert Improve

Install / Use

/learn @mindee/Doctr

About this skill

Quality Score

0/100

README

Optical Character Recognition made seamless & accessible to anyone, powered by PyTorch

What you can expect from this repository:

efficient ways to parse textual information (localize and identify each word) from your documents
guidance on how to integrate this in your current architecture

OCR_example

Quick Tour

Getting your pretrained model

End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word). As such, you can select the architecture used for text detection, and the one for text recognition from the list of available implementations.

from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)

Reading files

Documents can be interpreted from PDF or images:

from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])

Putting it together

Let's use the default pretrained model for an example:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

Dealing with rotated documents

Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it:

If you only use straight document pages with straight words (horizontal, same reading direction), consider passing assume_straight_pages=True to the ocr_predictor. It will directly fit straight boxes on your page and return straight boxes, which makes it the fastest option.
If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations will be converted to straight boxes), you need to pass export_as_straight_boxes=True in the predictor. Otherwise, if assume_straight_pages=False, it will return rotated bounding boxes (potentially with an angle of 0°).

If both options are set to False, the predictor will always fit and return rotated boxes.

To interpret your model's predictions, you can visualize them interactively as follows:

# Display the result (requires matplotlib & mplcursors to be installed)
result.show()

Visualization sample

Or even rebuild the original document from its predictions:

import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

Synthesis sample

The ocr_predictor returns a Document object with a nested structure (with Page, Block, Line, Word, Artefact). To get a better understanding of our document model, check our documentation:

You can also export them as a nested dict, more appropriate for JSON format:

json_output = result.export()

Use the KIE predictor

The KIE predictor is a more flexible predictor compared to OCR as your detection model can detect multiple classes in a document. For example, you can have a detection model to detect just dates and addresses in a document.

The KIE predictor makes it possible to use detector with multiple classes with a recognition model and to have the whole pipeline already setup for you.

from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")

The KIE predictor results per page are in a dictionary format with each key representing a class name and it's value are the predictions for that class.

If you are looking for support from the Mindee team

Installation

Prerequisites

Python 3.10 (or higher) and pip are required to install docTR.

Latest release

You can then install the latest release of the package using pypi as follows:

pip install python-doctr

We try to keep extra dependencies to a minimum. You can install specific builds as follows:

# standard build
pip install python-doctr
# optional dependencies for visualization, html, and contrib modules can be installed as follows:
pip install "python-doctr[viz,html,contrib]"

Developer mode

Alternatively, you can install it from source, which will require you to install Git. First clone the project repository:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Again, if you prefer to avoid the risk of missing dependencies, you can install the build:

pip install -e doctr/.

Models architectures

Credits where it's due: this repository is implementing, among others, architectures from published research papers.

Text Detection

Text Recognition

More goodies

Documentation

The full package documentation is available here for detailed specifications.

Demo app

A minimal demo app is provided for you to play with our end-to-end OCR models!

Demo app

Live demo

Courtesy of :hugs: Hugging Face :hugs:, docTR has now a fully deployed version available on Spaces! Check it out

Running it locally

If you prefer to use it locally, there is an extra dependency (Streamlit) that is required.

pip install -r demo/pt-requirements.txt

Then run your app in your def

Related Skills

best-practices-researcher

The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app

groundhog

398

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

isf-agent

a repo for an agent that helps researchers apply for isf funding

last30days-skill

17.2k

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

mindee

View profile

View on GitHub

GitHub Stars6.0k

CategoryEducation

Updated3h ago

Forks633

mindee/doctr

Languages

Python

Security Score

100/100

Audited on Apr 1, 2026

No findings