Pix2Text

An open-source Python 3 tool that uses small models to recognize layouts, tables, math formulas (LaTeX), and text in images and convert them into Markdown. A free alternative to Mathpix. 80+ languages are supported.



📖 Doc | 👩🏻‍💻 Online Service | 👨🏻‍💻 Demo | 💬 Contact

</div> <div align="center">

中文 | English

</div>


Update 2025.07.25: V1.1.4 Released

Major Changes:

Upgraded the Mathematical Formula Detection (MFD) and Mathematical Formula Recognition (MFR) models to version 1.5. All default configurations, documentation, and examples now use mfd-1.5 and mfr-1.5 as the standard models.

Update 2025.04.15: V1.1.3 Released

Major Changes:

Added support for the VlmTableOCR and VlmTextFormulaOCR models, which are built on the VLM interface (see the LiteLLM documentation) and allow the use of closed-source VLM models. Installation command: pip install pix2text[vlm].
Usage examples can be found in tests/test_vlm.py and tests/test_pix2text.py.

Update 2024.11.17: V1.1.2 Released

Major Changes:

A new layout analysis model DocLayout-YOLO has been integrated, improving the accuracy of layout analysis.

Update 2024.06.18: V1.1.1 Released

Major Changes:

Added support for the new Mathematical Formula Detection (MFD) model breezedeus/pix2text-mfd (Mirror), which significantly improves the accuracy of formula detection.

See details: Pix2Text V1.1.1 Released, Bringing Better Mathematical Formula Detection Models | Breezedeus.com.

Update 2024.04.28: V1.1 Released

Major Changes:

Added layout analysis and table recognition models, supporting the conversion of images with complex layouts into Markdown format. See examples: Pix2Text Online Documentation / Examples.
Added support for converting entire PDF files to Markdown format. See examples: Pix2Text Online Documentation / Examples.
Enhanced the interface with more features, including adjustments to existing interface parameters.
Launched the Pix2Text Online Documentation.

Update 2024.02.26: V1.0 Released

Major Changes:

The Mathematical Formula Recognition (MFR) model employs a new architecture and has been trained on a new dataset, achieving state-of-the-art (SOTA) accuracy. For detailed information, please see: Pix2Text V1.0 New Release: The Best Open-Source Formula Recognition Model | Breezedeus.com.

See more at: RELEASE.md.

<br/>

Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it already covers Mathpix's core functionality. P2T can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of this content into Markdown. It can also convert an entire PDF file (whether it contains scanned images or any other content) into Markdown.
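The conversion described above can be sketched in a few lines of Python. This is a minimal sketch, not the project's reference usage: it assumes the `Pix2Text.from_config()`, `recognize_pdf()`, `recognize_page()`, and `to_markdown()` calls behave as in the project's published examples, and the wrapper name `convert_to_markdown` and its parameters are hypothetical. Check the Pix2Text Online Documentation for the exact signatures of your installed version.

```python
from pathlib import Path


def convert_to_markdown(input_path, output_dir, page_numbers=None):
    """Convert a PDF or a single page image to Markdown (hypothetical wrapper)."""
    src = Path(input_path)
    if not src.is_file():
        raise FileNotFoundError(f"input file not found: {src}")

    # Imported lazily so this module still loads where pix2text is not installed.
    from pix2text import Pix2Text

    # Assumption: from_config() with no arguments loads the default models.
    p2t = Pix2Text.from_config()

    if src.suffix.lower() == ".pdf":
        # Assumption: page_numbers is an optional list of page indices;
        # it is only passed along when the caller supplies one.
        kwargs = {} if page_numbers is None else {"page_numbers": page_numbers}
        result = p2t.recognize_pdf(str(src), **kwargs)
    else:
        # Any non-PDF input is treated as an image of one page.
        result = p2t.recognize_page(str(src))

    # Assumption: to_markdown() writes the Markdown output into output_dir.
    result.to_markdown(output_dir)
    return result
```

A call such as `convert_to_markdown("paper.pdf", "output-md", page_numbers=[0, 1])` would then process only the listed pages, while an image path would go through the single-page branch.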

Pix2Text (P2T) integrates the following models:

Layout Analysis Model: breezedeus/pix2text-layout (Mirror).
Table Recognition Model: breezedeus/pix2text-table-rec (Mirror).
Text Recognition Engine: Supports 80+ languages such as English, Simplified Chinese, Traditional Chinese, Vietnamese, etc. For English and Simplified Chinese recognition, it uses the open-source OCR tool CnOCR, while for other languages, it uses the open-source OCR tool EasyOCR.
Mathematical Formula Detection Model (MFD): breezedeus/pix2text-mfd-1.5 (Mirror). Implemented based on CnSTD.
Mathematical Formula Recognition Model (MFR): breezedeus/pix2text-mfr-1.5 (Mirror).

Several models are contributed by other open-source authors, and their contributions are highly appreciated.

For detailed explanations, please refer to the Pix2Text Online Documentation/Models.

<br/>

As a Python3 toolkit, P2T may not be very user-friendly for those who are not familiar with Python. Therefore, we also provide a free-to-use P2T Online Web, where you can directly upload images and get P2T parsing results. The web version uses the latest models, resulting in better performance compared to the open-source models.

If you're interested, feel free to add the assistant as a friend by scanning the QR code and mentioning p2t. The assistant will regularly invite everyone to join the group where the latest developments related to P2T tools will be announced.

The author also maintains a Knowledge Planet P2T/CnOCR/CnSTD Private Group, where questions are answered promptly. You're welcome to join. The knowledge planet private group will also gradually release some private materials related to P2T/CnOCR/CnSTD, including some unreleased models, discounts on purchasing premium models, code snippets for different application scenarios, and answers to difficult problems encountered during use. The planet will also publish the latest research materials related to P2T/OCR/STD.

For more contact methods, please refer to Contact.

List of Supported Languages

The text recognition engine of Pix2Text supports 80+ languages, including English, Simplified Chinese, Traditional Chinese, Vietnamese, etc. Among these, English and Simplified Chinese recognition utilize the open-source OCR tool CnOCR, while recognition for other languages employs the open-source OCR tool EasyOCR. Special thanks to the respective authors.
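The engine split described above can be illustrated with a small routing function. This is only an illustration of the stated rule, not Pix2Text's internal code: the names `CNOCR_LANGUAGES` and `pick_text_engine` are hypothetical, and sending a mixed request (for example English plus Vietnamese) to EasyOCR is an assumption made for this sketch.

```python
# Language codes that the README says are handled by CnOCR.
CNOCR_LANGUAGES = {"en", "ch_sim"}


def pick_text_engine(languages):
    """Return the name of the OCR engine implied by the requested language codes."""
    requested = set(languages)
    if not requested:
        raise ValueError("at least one language code is required")
    # Only English and/or Simplified Chinese: CnOCR covers the request.
    if requested <= CNOCR_LANGUAGES:
        return "CnOCR"
    # Assumption: any other language in the request routes it to EasyOCR.
    return "EasyOCR"


if __name__ == "__main__":
    for codes in (["en"], ["en", "ch_sim"], ["vi"], ["en", "vi"]):
        print(codes, "->", pick_text_engine(codes))
```

The codes passed in are the ones from the Code Name column of the language table.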

The supported languages and their language codes are listed below:

<details> <summary>↓↓↓ Click to show details ↓↓↓</summary>

| Language            | Code Name   |
| ------------------- | ----------- |
| Abaza               | abq         |
| Adyghe              | ady         |
| Afrikaans           | af          |
| Angika              | ang         |
| Arabic              | ar          |
| Assamese            | as          |
| Avar                | ava         |
| Azerbaijani         | az          |
| Belarusian          | be          |
| Bulgarian           | bg          |
| Bihari              | bh          |
| Bhojpuri            | bho         |
| Bengali             | bn          |
| Bosnian             | bs          |
| Simplified Chinese  | ch_sim      |
| Traditional Chinese | ch_tra      |
| Chechen             | che         |
| Czech               | cs          |
| Welsh               | cy          |
| Danish              | da          |
| Dargwa              | dar         |
| German              | de          |
| English             | en          |
| Spanish             | es          |
| Estonian            | et          |
| Persian (Farsi)     | fa          |
| French              | fr          |
| Irish               | ga          |
| Goan Konkani        | gom         |
| Hindi               | hi          |
| Croatian            | hr          |
| Hungarian           | hu          |
| Indonesian          | id          |
| Ingush              | inh         |
| Icelandic           | is          |
| Italian             | it          |
| Japanese            | ja          |
| Kabardian           | kbd         |
| Kannada             | kn          |
| Korean              | ko          |
| Kurdish             | ku          |
| Latin               | la          |
| Lak | lb
