
Dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

Install / Use

/learn @rednote-hilab/Dots.ocr
About this skill

Quality Score: 0/100
Supported Platforms: Universal

README

<div align="center"> <p align="center"> <img src="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/logo.png" width="300"/> </p> <h1 align="center"> dots.ocr </h1>

Hugging Face | arXiv

<div align="center"> <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> | <a href="assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> | <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> | <a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a> </div> </div>

Introduction

dots.ocr is designed for universal accessibility and can recognize virtually any human script. Beyond achieving state-of-the-art (SOTA) performance in standard multilingual document parsing among models of comparable size, dots.ocr-1.5 excels at converting structured graphics (e.g., charts and diagrams) directly into SVG code, parsing web screenshots, and spotting scene text.
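Document parsers like this typically emit a list of layout elements that downstream tools reassemble into reading order. The sketch below illustrates that post-processing step on a hypothetical JSON layout result; the `category` / `bbox` / `text` schema is an assumption for illustration, not the model's documented output format.

```python
import json

# Hypothetical post-processing of a layout-parsing result.
# The JSON schema (category / bbox / text) is assumed for illustration,
# not the model's documented output format.
def layout_to_markdown(layout_json: str) -> str:
    """Render layout elements to Markdown, sorted top-to-bottom, left-to-right."""
    elements = json.loads(layout_json)
    # Sort by the top-left corner of each bounding box (y first, then x).
    elements.sort(key=lambda e: (e["bbox"][1], e["bbox"][0]))
    lines = []
    for el in elements:
        if el["category"] == "Title":
            lines.append(f"# {el['text']}")
        elif el["category"] == "Section-header":
            lines.append(f"## {el['text']}")
        else:
            lines.append(el["text"])
    return "\n\n".join(lines)

sample = json.dumps([
    {"category": "Text", "bbox": [50, 120, 500, 160], "text": "Body paragraph."},
    {"category": "Title", "bbox": [50, 40, 500, 80], "text": "A Document"},
])
print(layout_to_markdown(sample))  # title first, then the paragraph
```

Sorting by bounding-box position is a deliberate simplification: real reading-order recovery (multi-column pages, floats) is exactly what the model itself handles.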

News

  • 2026.03.19 We have rebranded dots.ocr-1.5 as dots.mocr. For technical details, please refer to our paper. The model weights are available on Hugging Face: dots.mocr and dots.mocr-svg.
  • 2025.10.31 🚀 We release dots.ocr.base, a foundation VLM focused on OCR tasks and the base model of dots.ocr. Try it out!
  • 2025.07.30 🚀 We release dots.ocr, a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.

Evaluation

1. Document Parsing

1.1 Elo Scores of Recent Models Across Benchmarks

<table> <thead> <tr> <th>Models</th> <th>olmOCR-Bench</th> <th>OmniDocBench (v1.5)</th> <th>XDocParse</th> <th>Average</th> </tr> </thead> <tbody> <tr> <td>MonkeyOCR-pro-3B</td> <td>895.0</td> <td>811.3</td> <td>637.1</td> <td>781.1</td> </tr> <tr> <td>GLM-OCR</td> <td>884.2</td> <td>972.6</td> <td>820.7</td> <td>892.5</td> </tr> <tr> <td>PaddleOCR-VL-1.5</td> <td>897.3</td> <td>997.9</td> <td>866.4</td> <td>920.5</td> </tr> <tr> <td>HunyuanOCR</td> <td>997.6</td> <td>1003.9</td> <td>951.1</td> <td>984.2</td> </tr> <tr> <td>dots.ocr</td> <td>1041.1</td> <td>1027.2</td> <td>1190.3</td> <td>1086.2</td> </tr> <tr> <td><strong>dots.mocr</strong></td> <td><strong>1104.4</strong></td> <td><strong>1059.0</strong></td> <td><strong>1210.7</strong></td> <td><strong>1124.7</strong></td> </tr> <tr> <td>Gemini 3 Pro</td> <td>1180.4</td> <td>1128.0</td> <td>1323.7</td> <td>1210.7</td> </tr> </tbody> </table>

Notes:

  • Results for Gemini 3 Pro, PaddleOCR-VL-1.5, and GLM-OCR were obtained via APIs, while HunyuanOCR results were generated using local inference.
  • The Elo score evaluation was conducted using Gemini 3 Flash; the prompt can be found at: Elo Score Prompt. These results are consistent with the findings on OCR Arena.
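Elo scoring of this kind aggregates pairwise judgments (here, Gemini 3 Flash choosing the better parse) into per-model ratings. A minimal sketch of the standard Elo update rule follows; the `k=32` factor and the starting ratings are illustrative assumptions, not the values used in this evaluation.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo update from a pairwise comparison.

    score_a is 1.0 if model A's output is judged better, 0.5 for a tie,
    0.0 if model B wins. Returns the pair of updated ratings.
    """
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two models start at 1000; the first wins one judged comparison.
a, b = elo_update(1000.0, 1000.0, 1.0)
print(round(a), round(b))  # 1016 984
```

Because each update is zero-sum, the rating pool's total stays constant; only the ordering and gaps between models change as comparisons accumulate.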

1.2 olmOCR-bench

<table> <thead> <tr> <th>Model</th> <th>ArXiv</th> <th>Old scans math</th> <th>Tables</th> <th>Old scans</th> <th>Headers & footers</th> <th>Multi column</th> <th>Long tiny text</th> <th>Base</th> <th>Overall</th> </tr> </thead> <tbody> <tr> <td>Mistral OCR API</td> <td>77.2</td> <td>67.5</td> <td>60.6</td> <td>29.3</td> <td>93.6</td> <td>71.3</td> <td>77.1</td> <td>99.4</td> <td>72.0±1.1</td> </tr> <tr> <td>Marker 1.10.1</td> <td>83.8</td> <td>66.8</td> <td>72.9</td> <td>33.5</td> <td>86.6</td> <td>80.0</td> <td>85.7</td> <td>99.3</td> <td>76.1±1.1</td> </tr> <tr> <td>MinerU 2.5.4*</td> <td>76.6</td> <td>54.6</td> <td>84.9</td> <td>33.7</td> <td>96.6</td> <td>78.2</td> <td>83.5</td> <td>93.7</td> <td>75.2±1.1</td> </tr> <tr> <td>DeepSeek-OCR</td> <td>77.2</td> <td>73.6</td> <td>80.2</td> <td>33.3</td> <td>96.1</td> <td>66.4</td> <td>79.4</td> <td>99.8</td> <td>75.7±1.0</td> </tr> <tr> <td>Nanonets-OCR2-3B</td> <td>75.4</td> <td>46.1</td> <td>86.8</td> <td>40.9</td> <td>32.1</td> <td>81.9</td> <td>93.0</td> <td>99.6</td> <td>69.5±1.1</td> </tr> <tr> <td>PaddleOCR-VL*</td> <td>85.7</td> <td>71.0</td> <td>84.1</td> <td>37.8</td> <td>97.0</td> <td>79.9</td> <td>85.7</td> <td>98.5</td> <td>80.0±1.0</td> </tr> <tr> <td>Infinity-Parser 7B*</td> <td>84.4</td> <td>83.8</td> <td>85.0</td> <td>47.9</td> <td>88.7</td> <td>84.2</td> <td>86.4</td> <td>99.8</td> <td>82.5±?</td> </tr> <tr> <td>olmOCR v0.4.0</td> <td>83.0</td> <td>82.3</td> <td>84.9</td> <td>47.7</td> <td>96.1</td> <td>83.7</td> <td>81.9</td> <td>99.7</td> <td>82.4±1.1</td> </tr> <tr> <td>Chandra OCR 0.1.0*</td> <td>82.2</td> <td>80.3</td> <td>88.0</td> <td>50.4</td> <td>90.8</td> <td>81.2</td> <td>92.3</td> <td>99.9</td> <td>83.1±0.9</td> </tr> <tr> <td>dots.ocr</td> <td>82.1</td> <td>64.2</td> <td>88.3</td> <td>40.9</td> <td>94.1</td> <td>82.4</td> <td>81.2</td> <td>99.5</td> <td>79.1±1.0</td> </tr> <tr> <td><strong>dots.mocr</strong></td> <td><strong>85.9</strong></td> <td><strong>85.5</strong></td> 
<td><strong>90.7</strong></td> <td>48.2</td> <td>94.0</td> <td><strong>85.3</strong></td> <td>81.6</td> <td>99.7</td> <td><strong>83.9±0.9</strong></td> </tr> </tbody> </table>

Note:

  • The metrics are from olmOCR and our own internal evaluations.
  • We remove the Page-header and Page-footer cells from the result markdown.
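Removing those cells can be sketched as a filter over labeled layout elements. Only the `Page-header` / `Page-footer` category names come from the note above; the element representation is an assumption for illustration.

```python
# Sketch of dropping page chrome before scoring. The element representation
# is assumed for illustration; only the Page-header / Page-footer category
# names come from the evaluation note.
DROP_CATEGORIES = {"Page-header", "Page-footer"}

def strip_page_chrome(elements):
    """Keep every layout element except page headers and footers."""
    return [el for el in elements if el["category"] not in DROP_CATEGORIES]

page = [
    {"category": "Page-header", "text": "Journal of X, Vol. 3"},
    {"category": "Text", "text": "Main content."},
    {"category": "Page-footer", "text": "Page 7"},
]
print([el["text"] for el in strip_page_chrome(page)])  # ['Main content.']
```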

1.3 Other Benchmarks

<table> <thead> <tr> <th>Model Type</th> <th>Methods</th> <th>Size</th> <th>OmniDocBench (v1.5)<br>Text Edit↓</th> <th>OmniDocBench (v1.5)<br>Reading Order Edit↓</th> <th>pdf-parse-bench</th> </tr> </thead> <tbody> <tr> <td rowspan="3"><strong>GeneralVLMs</strong></td> <td>Gemini-2.5 Pro</td> <td>-</td> <td>0.075</td> <td>0.097</td> <td>9.06</td> </tr> <tr> <td>Qwen3-VL-235B-A22B-Instruct</td> <td>235B</td> <td>0.069</td> <td>0.068</td> <td><strong>9.71</strong></td> </tr> <tr> <td>Gemini 3 Pro</td> <td>-</td> <td>0.066</td> <td>0.079</td> <td>9.68</td> </tr> <tr> <td rowspan="12"><strong>SpecializedVLMs</strong></td> <td>Mistral OCR</td> <td>-</td> <td>0.164</td> <td>0.144</td> <td>8.84</td> </tr> <tr> <td>Deepseek-OCR</td> <td>3B</td> <td>0.073</td> <td>0.086</td> <td>8.26</td> </tr> <tr> <td>MonkeyOCR-3B</td> <td>3B</td> <td>0.075</td> <td>0.129</td> <td>9.27</td> </tr> <tr> <td>OCRVerse</td> <td>4B</td> <td>0.058</td> <td>0.071</td> <td>-</td> </tr> <tr> <td>MonkeyOCR-pro-3B</td> <td>3B</td> <td>0.075</td> <td>0.128</td> <td>-</td> </tr> <tr> <td>MinerU2.5</td> <td>1.2B</td> <td>0.047</td> <td>0.044</td> <td>-</td> </tr> <tr> <td>PaddleOCR-VL</td> <td>0.9B</td> <td>0.035</td> <td>0.043</td> <td>9.51</td> </tr> <tr> <td>HunyuanOCR</td> <td>0.9B</td> <td>0.042</td> <td>-</td> <td>-</td> </tr> <tr> <td>PaddleOCR-VL1.5</td> <td>0.9B</td> </tr> </tbody> </table>
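The Edit↓ columns above are edit-distance metrics, where lower is better. As a rough sketch of how such a score can be computed, the code below uses Levenshtein distance normalized by the longer string's length; this is a common convention and an assumption here, not the benchmark's exact protocol.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                # deletion
                cur[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),   # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]

def normalized_edit(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means a perfect match."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))

print(normalized_edit("kitten", "sitting"))  # 3/7 ≈ 0.4286
```

Averaging this quantity over a benchmark's pages yields a TextEdit-style score: a model that reproduces every reference transcription exactly scores 0.0.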
View on GitHub

GitHub Stars: 8.1k
Forks: 727
Category: Development
Updated: 2h ago
Languages: Python

Security Score: 95/100 (audited on Mar 27, 2026; no findings)