
Donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022


<div align="center">

Donut 🍩 : Document Understanding Transformer


Official Implementation of Donut and SynthDoG | Paper | Slide | Poster

</div>

Introduction

Donut 🍩, Document understanding transformer, is a new method of document understanding that uses an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performance on various visual document understanding tasks, such as visual document classification and information extraction (a.k.a. document parsing). In addition, we present SynthDoG 🐶, Synthetic Document Generator, which makes model pre-training flexible across languages and domains.

Our academic paper, which describes our method in detail and provides full experimental results and analyses, can be found here:<br>

OCR-free Document Understanding Transformer.<br> Geewook Kim, Teakgyu Hong, Moonbin Yim, JeongYeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. In ECCV 2022.

<img width="946" alt="image" src="misc/overview.png">

Pre-trained Models and Web Demos

Gradio web demos are available!

  • You can run the demo with the ./app.py file.
  • Sample images are available in ./misc, and more receipt images are available at the CORD dataset link.
  • Web demos are available from the links in the following table.
  • Note: We have updated the Google Colab demos (as of June 15, 2023) so that they work properly.

|Task|Sec/Img|Score|Trained Model|<div id="demo">Demo</div>|
|---|---|---|---|---|
| CORD (Document Parsing) | 0.7 /<br> 0.7 /<br> 1.2 | 91.3 /<br> 91.1 /<br> 90.9 | donut-base-finetuned-cord-v2 (1280) /<br> donut-base-finetuned-cord-v1 (1280) /<br> donut-base-finetuned-cord-v1-2560 | gradio space web demo,<br>google colab demo (updated at 23.06.15) |
| Train Ticket (Document Parsing) | 0.6 | 98.7 | donut-base-finetuned-zhtrainticket | google colab demo (updated at 23.06.15) |
| RVL-CDIP (Document Classification) | 0.75 | 95.3 | donut-base-finetuned-rvlcdip | gradio space web demo,<br>google colab demo (updated at 23.06.15) |
| DocVQA Task1 (Document VQA) | 0.78 | 67.5 | donut-base-finetuned-docvqa | gradio space web demo,<br>google colab demo (updated at 23.06.15) |

The links to the pre-trained backbones are here:

  • donut-base: trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, swin window size 10, IIT-CDIP (11M) and SynthDoG (English, Chinese, Japanese, Korean, 0.5M x 4).
  • donut-proto: (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, swin window size 8, and SynthDoG (English, Japanese, Korean, 0.4M x 3).

Please see our paper for more details.

SynthDoG datasets


The links to the SynthDoG-generated datasets are here:

To generate synthetic datasets with our SynthDoG, please see ./synthdog/README.md and our paper for details.

Updates

2023-06-15 We have updated all Google Colab demos so that they work properly.<br>
2022-11-14 New version 1.0.9 is released (pip install donut-python --upgrade). See 1.0.9 Release Notes.<br>
2022-08-12 Donut 🍩 is also available at huggingface/transformers 🤗 (contributed by @NielsRogge). donut-python loads the pre-trained weights from the official branch of the model repositories. See 1.0.5 Release Notes.<br>
2022-08-05 A well-executed hands-on tutorial on donut 🍩 is published at Towards Data Science (written by @estaudere).<br>
2022-07-20 First commit. We release our code, model weights, synthetic data and generator.

Software installation


pip install donut-python

or clone this repository and install the dependencies:

git clone https://github.com/clovaai/donut.git
cd donut/
conda create -n donut_official python=3.7
conda activate donut_official
pip install .

We tested donut-python == 1.0.1 with:

Note: Several reported issues indicate that recent updates to key dependency libraries have made it harder to configure a working environment for donut-python. While we are actively working on a solution, we have updated the Google Colab demos (as of June 15, 2023) so that they work properly. For assistance, please refer to the following demo links: CORD Colab Demo, Train Ticket Colab Demo, RVL-CDIP Colab Demo, DocVQA Colab Demo.

Getting Started

Data

This repository assumes the following dataset structure:

> tree dataset_name
dataset_name
├── test
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
├── train
│   ├── metadata.jsonl
│   ├── {image_path0}
│   ├── {image_path1}
│             .
│             .
└── validation
    ├── metadata.jsonl
    ├── {image_path0}
    ├── {image_path1}
              .
              .

> cat dataset_name/test/metadata.jsonl
{"file_name": {image_path0}, "ground_truth": "{\"gt_parse\": {ground_truth_parse}, ... {other_metadata_not_used} ... }"}
{"file_name": {image_path1}, "ground_truth": "{\"gt_parse\": {groun
