KhmerOCR
A Fast Khmer Optical Character Recognition (KhmerOCR)
A high-performance Khmer Optical Character Recognition (OCR) engine tailored for documents. The model was trained on 3 million text lines using more than 800 Khmer fonts to ensure robust recognition across various styles and weights.
[!IMPORTANT] Update: The library now supports full document processing, layout detection, and multi-format exports (PDF, DOCX, HTML, Markdown).
Features
- Fast: Optimized for Khmer script, with inference running on ONNX Runtime
- Native C++ Engine: High-performance C/C++ implementation with C API for FFI bindings
- Font Detection: Automatically identifies and preserves Moul vs. Regular font styles
- Multi-format Export: Convert images or PDFs into editable .docx, .md, .html, or .txt files
- PDF Support: High-resolution PDF rendering and processing via PyMuPDF
- Cross-Platform: Supports macOS, Linux, Windows, iOS, and Android
Installation
Python
pip install git+https://github.com/seanghay/KhmerOCR
C++ Library
See cpp/README.md for build instructions.
cd cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
Usage
Python CLI
For single images or documents, run:
khmerocr document.jpg --format docx
Python CLI Options
| Option | Shortcut | Description | Default |
| ---------- | -------- | ------------------------------------------ | ------------------------- |
| --output | -o | Custom output path | input_filename.{format} |
| --format | -f | Output format: txt, html, docx, md | txt |
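To apply the options above to many files, the CLI can be driven from a short script. The following is a minimal sketch, not part of the project: `build_command` and `convert_folder` are hypothetical helper names, the `ocr_output` directory is an arbitrary choice, and it assumes the `khmerocr` executable is on your `PATH` and accepts `.jpg` and `.png` inputs as in the example command.

```python
import subprocess
from pathlib import Path

# Formats listed in the options table above.
SUPPORTED_FORMATS = {"txt", "html", "docx", "md"}


def build_command(input_path, fmt="txt", output=None):
    """Return the argv list for one `khmerocr` invocation (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = ["khmerocr", str(input_path), "--format", fmt]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def convert_folder(folder, fmt="docx", out_dir="ocr_output"):
    """Run the CLI once per image in `folder` (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(folder).iterdir()):
        # Assumption: only these image extensions are of interest here.
        if image.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        target = out_dir / f"{image.stem}.{fmt}"
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_command(image, fmt, target), check=True)
```

Calling `convert_folder("scans", fmt="md")` would then write one Markdown file per image into `ocr_output/`. Keeping the argument list in its own function makes it easy to inspect the exact command before anything is executed.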
C++ CLI
The C++ CLI is a lightweight inference tool focused on text extraction. For document formatting (DOCX, HTML, etc.), use the Python CLI.
# Full OCR (detect + recognize)
./cpp/build/khmerocr image.png
# JSON output
./cpp/build/khmerocr -j image.png
# Detection only
./cpp/build/khmerocr -d image.png
# Recognition only (for pre-cropped text images)
./cpp/build/khmerocr -r cropped_text.png
# Verbose output with confidence scores
./cpp/build/khmerocr -v image.png
| Option | Shortcut | Description |
|--------|----------|-------------|
| --json | -j | Output results in JSON format |
| --detect-only | -d | Only detect text regions, skip recognition |
| --recognize-only | -r | Only recognize (skip detection) |
| --verbose | -v | Show confidence scores |
| --model-dir | -m | Custom model directory path |
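The flags in this table can also be combined from Python when the C++ binary is used as a backend. The sketch below is illustrative only: `build_cpp_command` and `run_cpp_ocr` are hypothetical names, and it assumes that the JSON is written to standard output, that `--model-dir` takes the directory as the following argument, and that `--detect-only` and `--recognize-only` are not meant to be used together. None of these points is confirmed by the text above, so check them against `cpp/README.md`.

```python
import json
import subprocess

# Binary path taken from the examples above; adjust to your build directory.
KHMEROCR_BIN = "./cpp/build/khmerocr"


def build_cpp_command(image_path, detect_only=False, recognize_only=False,
                      model_dir=None, binary=KHMEROCR_BIN):
    """Build argv for the C++ CLI, always requesting JSON output."""
    # Assumption: the two "only" modes are mutually exclusive.
    if detect_only and recognize_only:
        raise ValueError("choose at most one of detect_only / recognize_only")
    cmd = [binary, "--json"]
    if detect_only:
        cmd.append("--detect-only")
    if recognize_only:
        cmd.append("--recognize-only")
    if model_dir is not None:
        cmd += ["--model-dir", str(model_dir)]
    cmd.append(str(image_path))
    return cmd


def run_cpp_ocr(image_path, **options):
    """Run the CLI and return the decoded JSON without assuming its schema."""
    result = subprocess.run(build_cpp_command(image_path, **options),
                            capture_output=True, text=True, check=True)
    # Assumption: the JSON document is the only content on stdout.
    return json.loads(result.stdout)
```

For pre-cropped line images, `run_cpp_ocr("cropped_text.png", recognize_only=True)` would correspond to `./cpp/build/khmerocr --json --recognize-only cropped_text.png`.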
Example Output
When processing a line, the engine provides rich metadata:
{
"text": "លទ្ធផលនៃការធ្វើកំណែទប្រង់លើទូរគមនាគមន៍កម្ពុជា",
"text_confidence": 0.9804,
"font": "Moul",
"font_confidence": 0.9999
}
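The two confidence values make it possible to route uncertain lines to manual review. Here is a small sketch of that idea, using only the four fields shown in the record above. The `LineResult` class, the `needs_review` helper, and the 0.90 thresholds are illustrative assumptions, not part of the library.

```python
import json
from dataclasses import dataclass


@dataclass
class LineResult:
    text: str
    text_confidence: float
    font: str
    font_confidence: float


def parse_line(raw):
    """Decode one JSON record with the four fields shown above."""
    data = json.loads(raw)
    return LineResult(
        text=data["text"],
        text_confidence=float(data["text_confidence"]),
        font=data["font"],
        font_confidence=float(data["font_confidence"]),
    )


def needs_review(line, min_text_conf=0.90, min_font_conf=0.90):
    """Flag a line whose text or font confidence is under a threshold.

    The 0.90 defaults are illustrative, not project recommendations.
    """
    return (line.text_confidence < min_text_conf
            or line.font_confidence < min_font_conf)


# The record from the example above.
sample = """{
  "text": "លទ្ធផលនៃការធ្វើកំណែទប្រង់លើទូរគមនាគមន៍កម្ពុជា",
  "text_confidence": 0.9804,
  "font": "Moul",
  "font_confidence": 0.9999
}"""

line = parse_line(sample)
print(line.font, needs_review(line))  # Moul False
```

Raising `min_text_conf` to 0.99 would flag this same line, since its text confidence is 0.9804.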
Examples
| Input | Detected Text | Font Style |
| ---------- | ---------------- | ---------- |
| [Line 1] | យេម៉ែនលង់ក្នុងសង្គ្រាម... | Bold |
| [Line 2] | ក្រសួងមហាផ្ទៃឱ្យត្រៀម... | Bold |
| [Line 3] | លទ្ធផលនៃការធ្វើកំណែ... | Moul |
Milestones
- [x] Basic Font Style Detection
- [x] Multi-line Document Support
- [x] Export to DOCX/HTML/Markdown
- [ ] Add English & Symbol support
- [x] Add ONNXRuntime for faster inference
- [x] Add C/C++ Inference Engine
License
Distributed under the MIT License. See the LICENSE file for more information.
Contact
Seanghay Yath
- Email: seanghay.dev@gmail.com
- Telegram: @seanghay_yath
<div align="center"> <a href="https://khmerscan.com/"> <img width="80" src="https://khmerscan.com/favicon.svg" alt="KhmerScan Logo"> </a> <p>Sponsored by <a href="https://khmerscan.com/">KhmerScan</a>
(បម្លែងរូបភាពទៅជាអត្ថបទខ្មែរ)</p>
</div>
