dots.ocr
Multilingual Document Layout Parsing in a Single Vision-Language Model
<div align="center">
<p align="center">
<img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/>
</p>
<h1 align="center">
dots.ocr
</h1>
<div align="center">
<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
<a href="assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
</div>
</div>
## Introduction
dots.ocr is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screenshots, and spotting scene text.
## News
- **2026.03.19** We have rebranded dots.ocr-1.5 as dots.mocr. For technical details, please refer to our paper. The model weights are available on Hugging Face: dots.mocr and dots.mocr-svg.
- **2025.10.31** 🚀 We release dots.ocr.base, a foundation VLM focused on OCR tasks and the base model of dots.ocr. Try it out!
- **2025.07.30** 🚀 We release dots.ocr, a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.
## Evaluation
### 1. Document Parsing
### 1.1 Elo scores of recent models on different benchmarks
<table>
  <thead>
    <tr> <th>Model</th> <th>olmOCR-Bench</th> <th>OmniDocBench (v1.5)</th> <th>XDocParse</th> <th>Average</th> </tr>
  </thead>
  <tbody>
    <tr> <td>MonkeyOCR-pro-3B</td> <td>895.0</td> <td>811.3</td> <td>637.1</td> <td>781.1</td> </tr>
    <tr> <td>GLM-OCR</td> <td>884.2</td> <td>972.6</td> <td>820.7</td> <td>892.5</td> </tr>
    <tr> <td>PaddleOCR-VL-1.5</td> <td>897.3</td> <td>997.9</td> <td>866.4</td> <td>920.5</td> </tr>
    <tr> <td>HuanyuanOCR</td> <td>997.6</td> <td>1003.9</td> <td>951.1</td> <td>984.2</td> </tr>
    <tr> <td>dots.ocr</td> <td>1041.1</td> <td>1027.2</td> <td>1190.3</td> <td>1086.2</td> </tr>
    <!-- Highlighting dots.mocr row with bold tags -->
    <tr> <td><strong>dots.mocr</strong></td> <td><strong>1104.4</strong></td> <td><strong>1059.0</strong></td> <td><strong>1210.7</strong></td> <td><strong>1124.7</strong></td> </tr>
    <tr> <td>Gemini 3 Pro</td> <td>1180.4</td> <td>1128.0</td> <td>1323.7</td> <td>1210.7</td> </tr>
  </tbody>
</table>

Notes:
- Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HuanyuanOCR results were generated using local inference.
- The Elo score evaluation was conducted using Gemini 3 Flash. The prompt can be found at: Elo Score Prompt. These results are consistent with the findings on ocrarena.
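The Elo scores above are derived from pairwise judge verdicts. A minimal sketch of the standard Elo update rule, assuming the judge emits one win/loss verdict per compared document pair (the model names, K-factor, and match data below are illustrative, not the actual evaluation setup):

```python
def expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 16.0) -> None:
    """Apply one judge verdict: `winner` beat `loser` on a document."""
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_w)
    ratings[loser] -= k * (1.0 - e_w)

# Illustrative verdicts: (winner, loser) per judged document pair.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for winner, loser in [("model_a", "model_b")] * 3 + [("model_b", "model_a")]:
    update(ratings, winner, loser)
```

Because each update transfers the same amount of rating from loser to winner, the total rating mass stays constant; only the ordering and gaps between models carry meaning.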
### 1.2 olmOCR-bench
<table>
  <thead>
    <tr> <th>Model</th> <th>ArXiv</th> <th>Old scans math</th> <th>Tables</th> <th>Old scans</th> <th>Headers & footers</th> <th>Multi column</th> <th>Long tiny text</th> <th>Base</th> <th>Overall</th> </tr>
  </thead>
  <tbody>
    <tr> <td>Mistral OCR API</td> <td>77.2</td> <td>67.5</td> <td>60.6</td> <td>29.3</td> <td>93.6</td> <td>71.3</td> <td>77.1</td> <td>99.4</td> <td>72.0±1.1</td> </tr>
    <tr> <td>Marker 1.10.1</td> <td>83.8</td> <td>66.8</td> <td>72.9</td> <td>33.5</td> <td>86.6</td> <td>80.0</td> <td>85.7</td> <td>99.3</td> <td>76.1±1.1</td> </tr>
    <tr> <td>MinerU 2.5.4*</td> <td>76.6</td> <td>54.6</td> <td>84.9</td> <td>33.7</td> <td>96.6</td> <td>78.2</td> <td>83.5</td> <td>93.7</td> <td>75.2±1.1</td> </tr>
    <tr> <td>DeepSeek-OCR</td> <td>77.2</td> <td>73.6</td> <td>80.2</td> <td>33.3</td> <td>96.1</td> <td>66.4</td> <td>79.4</td> <td>99.8</td> <td>75.7±1.0</td> </tr>
    <tr> <td>Nanonets-OCR2-3B</td> <td>75.4</td> <td>46.1</td> <td>86.8</td> <td>40.9</td> <td>32.1</td> <td>81.9</td> <td>93.0</td> <td>99.6</td> <td>69.5±1.1</td> </tr>
    <tr> <td>PaddleOCR-VL*</td> <td>85.7</td> <td>71.0</td> <td>84.1</td> <td>37.8</td> <td>97.0</td> <td>79.9</td> <td>85.7</td> <td>98.5</td> <td>80.0±1.0</td> </tr>
    <tr> <td>Infinity-Parser 7B*</td> <td>84.4</td> <td>83.8</td> <td>85.0</td> <td>47.9</td> <td>88.7</td> <td>84.2</td> <td>86.4</td> <td>99.8</td> <td>82.5±?</td> </tr>
    <tr> <td>olmOCR v0.4.0</td> <td>83.0</td> <td>82.3</td> <td>84.9</td> <td>47.7</td> <td>96.1</td> <td>83.7</td> <td>81.9</td> <td>99.7</td> <td>82.4±1.1</td> </tr>
    <tr> <td>Chandra OCR 0.1.0*</td> <td>82.2</td> <td>80.3</td> <td>88.0</td> <td>50.4</td> <td>90.8</td> <td>81.2</td> <td>92.3</td> <td>99.9</td> <td>83.1±0.9</td> </tr>
    <tr> <td>dots.ocr</td> <td>82.1</td> <td>64.2</td> <td>88.3</td> <td>40.9</td> <td>94.1</td> <td>82.4</td> <td>81.2</td> <td>99.5</td> <td>79.1±1.0</td> </tr>
    <tr> <td><strong>dots.mocr</strong></td> <td><strong>85.9</strong></td> <td><strong>85.5</strong></td> <td><strong>90.7</strong></td> <td>48.2</td> <td>94.0</td> <td><strong>85.3</strong></td> <td>81.6</td> <td>99.7</td> <td><strong>83.9±0.9</strong></td> </tr>
  </tbody>
</table>

Notes:
- The metrics are from olmOCR and our own internal evaluations.
- We remove the Page-header and Page-footer cells from the result markdown.
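Dropping Page-header and Page-footer cells before scoring is a simple post-processing step. A minimal sketch, assuming the parser emits a list of layout cells each carrying a `category` label and its recognized text (the cell schema and sample data below are illustrative, not dots.ocr's actual output format):

```python
# Illustrative layout output: one dict per detected cell.
cells = [
    {"category": "Page-header", "text": "Journal of Examples, Vol. 1"},
    {"category": "Title", "text": "A Sample Paper"},
    {"category": "Text", "text": "Body paragraph..."},
    {"category": "Page-footer", "text": "Page 3"},
]

# Categories excluded from the result markdown before evaluation.
DROP = {"Page-header", "Page-footer"}

def to_markdown(cells: list[dict]) -> str:
    """Concatenate cell text into markdown, skipping header/footer cells."""
    return "\n\n".join(c["text"] for c in cells if c["category"] not in DROP)

md = to_markdown(cells)
```

Filtering by category keeps the comparison focused on body content, so repeated running heads and page numbers do not count against (or for) any model.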
