Thulium

Thulium is a production-ready Python library for offline handwritten text recognition (HTR) supporting 52+ languages across Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean, and Georgian scripts.

State-of-the-art multilingual handwritten text recognition.

Features

  • 52+ Languages — Comprehensive multilingual support with script-aware processing
  • Production Ready — Optimized inference with ONNX export and mixed precision
  • State-of-the-Art — CNN/ViT backbones with Transformer/LSTM sequence heads
  • Explainable AI — Attention visualization, saliency maps, and confidence analysis
  • Flexible Decoding — CTC beam search with n-gram and neural language models
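
To make the decoding bullet concrete, here is a minimal sketch of CTC prefix beam search in plain Python. It is not Thulium's decoder: the function name `ctc_prefix_beam_search`, its parameters, and the toy alphabet are assumptions made for illustration, and language-model fusion is left out.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_sum_exp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf inputs."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ctc_prefix_beam_search(log_probs, alphabet, beam_width=4, blank=0):
    """Return (text, log_prob) of the most probable collapsed labelling.

    log_probs holds one list of per-symbol log-probabilities per time step.
    """
    # Per prefix: (log P of paths ending in blank, log P of paths ending in a symbol).
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        candidates = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_blank, p_symbol) in beams.items():
            for index, lp in enumerate(frame):
                if index == blank:
                    # A blank leaves the prefix unchanged.
                    b, s = candidates[prefix]
                    candidates[prefix] = (log_sum_exp(b, p_blank + lp, p_symbol + lp), s)
                    continue
                extended = prefix + (index,)
                if prefix and prefix[-1] == index:
                    # A repeated symbol extends the prefix only after a blank...
                    b, s = candidates[extended]
                    candidates[extended] = (b, log_sum_exp(s, p_blank + lp))
                    # ...otherwise it collapses into the existing prefix.
                    b, s = candidates[prefix]
                    candidates[prefix] = (b, log_sum_exp(s, p_symbol + lp))
                else:
                    b, s = candidates[extended]
                    candidates[extended] = (b, log_sum_exp(s, p_blank + lp, p_symbol + lp))
        ranked = sorted(candidates.items(), key=lambda kv: log_sum_exp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, scores = max(beams.items(), key=lambda kv: log_sum_exp(*kv[1]))
    return "".join(alphabet[i] for i in best_prefix), log_sum_exp(*scores)


# Toy input: two time steps over the alphabet (blank, "a", "b").
alphabet = ["", "a", "b"]
frames = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1]]
log_probs = [[math.log(p) for p in frame] for frame in frames]
text, log_prob = ctc_prefix_beam_search(log_probs, alphabet)
print(text, round(math.exp(log_prob), 2))  # a 0.45
```

Greedy best-path decoding would pick the blank at both steps and return an empty string (probability 0.36). Summing over the alignments `a-`, `-a`, and `aa` gives "a" a total probability of 0.45, which is why a prefix search can beat the per-step argmax.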

Installation

pip install thulium-htr

For GPU acceleration:

pip install "thulium-htr[gpu]"
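
After installing, a quick sanity check can confirm what is present. The sketch below uses only the standard library plus an optional PyTorch import. It assumes the distribution is named `thulium-htr`, as in the commands above, and that the GPU extra relies on a CUDA-enabled PyTorch build, which this README does not state explicitly. The helper names are illustrative.

```python
from importlib import metadata, util


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def cuda_available():
    """Report whether PyTorch is importable and sees a CUDA device."""
    if util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch

    return torch.cuda.is_available()


print("thulium-htr:", installed_version("thulium-htr") or "not installed")
print("CUDA available:", cuda_available())
```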

Quick Start

from thulium import recognize_image

# Single image recognition
result = recognize_image("document.png", language="en")
print(result.text)

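# Batch recognition with confidence scores
from thulium import HTRPipeline

# Example inputs (hypothetical paths); substitute your own images
images = ["page_en.png", "page_de.png", "page_fr.png"]

pipeline = HTRPipeline.from_pretrained("thulium-base-multilingual")
results = pipeline.recognize_batch(images, languages=["en", "de", "fr"])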

for r in results:
    print(f"{r.text} (confidence: {r.confidence:.2%})")
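
Confidence scores are often used to decide which transcriptions need a human check. The following is a sketch of that triage step, not part of Thulium: `RecognitionResult` is a hypothetical stand-in that mirrors only the `text` and `confidence` attributes used above, and the 0.90 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical stand-in exposing the two attributes used in Quick Start."""
    text: str
    confidence: float


def split_by_confidence(results, threshold=0.90):
    """Partition results into (accepted, needs_review) around a threshold."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review


results = [
    RecognitionResult("Dear Sir", 0.97),
    RecognitionResult("yours faithfu1ly", 0.62),
]
accepted, needs_review = split_by_confidence(results)
print([r.text for r in accepted])      # ['Dear Sir']
print([r.text for r in needs_review])  # ['yours faithfu1ly']
```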

Supported Languages

<details> <summary>52+ languages across 10 scripts (click to expand)</summary>

| Region | Languages |
|--------|-----------|
| Western Europe | English, German, French, Spanish, Italian, Portuguese, Dutch |
| Scandinavia | Swedish, Norwegian, Danish, Finnish, Icelandic |
| Eastern Europe | Polish, Czech, Hungarian, Romanian, Bulgarian, Ukrainian, Russian |
| Baltic | Lithuanian, Latvian, Estonian |
| Caucasus | Georgian, Armenian, Azerbaijani |
| Middle East | Arabic, Hebrew, Persian, Turkish |
| South Asia | Hindi, Bengali, Tamil, Telugu, Urdu |
| East Asia | Chinese, Japanese, Korean |

</details>
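
When working across this many scripts, it can help to check that a transcript's script matches the language code that was requested. The sketch below is a heuristic built on the standard `unicodedata` module, not Thulium's script handling: it takes the first word of each letter's Unicode character name as the script label, so Chinese text reports `CJK` and Japanese kana report `HIRAGANA` or `KATAKANA`. The function name is an assumption.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script label among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


for sample in ["Hello", "Привет", "שלום", "한국어"]:
    print(sample, dominant_script(sample))
```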

Documentation

| Guide | Description |
|-------|-------------|
| Getting Started | Installation and first steps |
| API Reference | Complete API documentation |
| Model Zoo | Pretrained model catalog |
| Training Guide | Train custom models |
| Architecture | System design overview |

Performance

Benchmarks on the IAM Handwriting Database, reporting character error rate (CER) and word error rate (WER):

| Model | CER | WER | Latency |
|-------|-----|-----|---------|
| thulium-tiny | 5.2% | 14.1% | 12 ms |
| thulium-base | 3.8% | 10.2% | 28 ms |
| thulium-large | 2.9% | 7.8% | 65 ms |

Measured on NVIDIA A100, batch size 1, PyTorch 2.0+
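
For readers unfamiliar with the metrics, CER and WER are conventionally defined as the Levenshtein edit distance between the reference and the predicted transcription, divided by the reference length in characters or words. The sketch below shows that standard definition in plain Python. It is not the evaluation script behind the numbers above, and the function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_item != hyp_item),  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate; the reference must be non-empty."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "the quick brown fox"
hypothesis = "the quick brwn fox"
print(f"CER {cer(reference, hypothesis):.1%}, WER {wer(reference, hypothesis):.1%}")
# CER 5.3%, WER 25.0%
```

One dropped character costs 1 of 19 reference characters but 1 of 4 reference words, which is why WER is typically several times higher than CER for the same model.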

Citation

@software{thulium2025,
  title={Thulium: Multilingual Handwriting Recognition},
  author={Thulium Authors},
  year={2025},
  url={https://github.com/thulium-dev/thulium}
}

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

Apache 2.0 — see LICENSE for details.
