KhmerOCR

A fast Khmer Optical Character Recognition engine (KhmerOCR)

A high-performance Khmer Optical Character Recognition (OCR) engine tailored for documents. The model was trained on 3 million text lines using more than 800 Khmer fonts to ensure robust recognition across various styles and weights.

[!IMPORTANT] Update: The library now supports full document processing, layout detection, and multi-format exports (PDF, DOCX, HTML, Markdown).

Features

Fast: Optimized for Khmer script, with inference running on ONNX Runtime
Native C++ Engine: High-performance C/C++ implementation with C API for FFI bindings
Font Detection: Automatically identifies and preserves Moul vs. Regular font styles
Multi-format Export: Convert images or PDFs into editable .docx, .md, .html, or .txt files
PDF Support: High-resolution PDF rendering and processing via PyMuPDF
Cross-Platform: Supports macOS, Linux, Windows, iOS, and Android

Installation

Python

pip install git+https://github.com/seanghay/KhmerOCR

C++ Library

See cpp/README.md for build instructions.

cd cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)  # on macOS, use: make -j$(sysctl -n hw.ncpu)

Usage

Python CLI

For single images or documents, run:

khmerocr document.jpg --format docx

Python CLI Options

| Option | Shortcut | Description | Default |
| ---------- | -------- | ---------------------------------- | ------------------------- |
| --output | -o | Custom output path | input_filename.{format} |
| --format | -f | Output format: txt, html, docx, md | txt |
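
For scripting, the options above can be assembled into a command line from Python. The following is a minimal sketch and not part of KhmerOCR itself: the helper names `build_khmerocr_command` and `convert` are hypothetical, and it assumes the `khmerocr` executable installed by pip is on the PATH. Only the flags documented in the table are used.

```python
import subprocess
from pathlib import Path

# Output formats listed in the options table above.
SUPPORTED_FORMATS = ("txt", "html", "docx", "md")


def build_khmerocr_command(input_path, fmt="txt", output=None):
    """Return the argv list for one `khmerocr` invocation (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    command = ["khmerocr", str(Path(input_path)), "--format", fmt]
    if output is not None:
        # Omitting --output leaves the CLI to pick its documented default path.
        command += ["--output", str(Path(output))]
    return command


def convert(input_path, fmt="txt", output=None):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    # Assumption: the `khmerocr` entry point is available on PATH.
    subprocess.run(build_khmerocr_command(input_path, fmt, output), check=True)
```

Calling `convert("document.jpg", fmt="docx")` would run the same command as the CLI example above. Keeping the argv construction separate from the `subprocess.run` call makes the command easy to inspect or log before anything is executed.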

C++ CLI

The C++ CLI is a lightweight inference tool focused on text extraction. For document formatting (DOCX, HTML, etc.), use the Python CLI.

# Full OCR (detect + recognize)
./cpp/build/khmerocr image.png

# JSON output
./cpp/build/khmerocr -j image.png

# Detection only
./cpp/build/khmerocr -d image.png

# Recognition only (for pre-cropped text images)
./cpp/build/khmerocr -r cropped_text.png

# Verbose output with confidence scores
./cpp/build/khmerocr -v image.png

| Option | Shortcut | Description |
|--------|----------|-------------|
| --json | -j | Output results in JSON format |
| --detect-only | -d | Only detect text regions, skip recognition |
| --recognize-only | -r | Only recognize (skip detection) |
| --verbose | -v | Show confidence scores |
| --model-dir | -m | Custom model directory path |
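
When the C++ binary is driven from another program, the `--json` flag is the natural choice. The sketch below is an illustration under stated assumptions rather than a documented API: `build_cpp_command` and `run_ocr_json` are hypothetical names, the binary path is the one used in the examples above, and the tool is assumed to print UTF-8 JSON on stdout. The JSON structure itself is not assumed, so the parsed value is returned as-is. Treating `--detect-only` and `--recognize-only` as mutually exclusive is an interpretation of their descriptions.

```python
import json
import subprocess

DEFAULT_BINARY = "./cpp/build/khmerocr"  # path used in the examples above


def build_cpp_command(image, mode="full", model_dir=None, binary=DEFAULT_BINARY):
    """Build argv for the C++ CLI with JSON output enabled (hypothetical helper)."""
    # Assumption: detection-only and recognition-only are alternative modes.
    mode_flags = {
        "full": [],
        "detect": ["--detect-only"],
        "recognize": ["--recognize-only"],
    }
    if mode not in mode_flags:
        raise ValueError(f"unknown mode: {mode!r}")
    command = [binary, "--json"] + mode_flags[mode]
    if model_dir is not None:
        command += ["--model-dir", str(model_dir)]
    command.append(str(image))
    return command


def run_ocr_json(image, **options):
    """Run the CLI and parse stdout as JSON; the schema is whatever the tool emits."""
    result = subprocess.run(
        build_cpp_command(image, **options),
        capture_output=True,
        encoding="utf-8",  # assumption: the tool writes UTF-8
        check=True,
    )
    return json.loads(result.stdout)
```

For example, `run_ocr_json("cropped_text.png", mode="recognize")` would correspond to `./cpp/build/khmerocr --json --recognize-only cropped_text.png`.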

Example Output

For each recognized line, the engine returns the text, the detected font style, and a confidence score for each:

{
  "text": "លទ្ធផលនៃការធ្វើកំណែទប្រង់លើទូរគមនាគមន៍កម្ពុជា",
  "text_confidence": 0.9804,
  "font": "Moul",
  "font_confidence": 0.9999
}
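
Downstream code will often want to act on these confidence values. As a minimal sketch, the snippet below parses a record shaped like the example above and flags low-confidence lines for manual review. The field names come from the example; the function name `needs_review` and the 0.90 thresholds are illustrative assumptions, not values recommended by the project.

```python
import json

# Sample record copied from the example output above.
SAMPLE = """
{
  "text": "លទ្ធផលនៃការធ្វើកំណែទប្រង់លើទូរគមនាគមន៍កម្ពុជា",
  "text_confidence": 0.9804,
  "font": "Moul",
  "font_confidence": 0.9999
}
"""


def needs_review(record, min_text_conf=0.90, min_font_conf=0.90):
    """Return True when either confidence falls below its threshold.

    The thresholds are placeholders; tune them on your own documents.
    """
    return (
        record["text_confidence"] < min_text_conf
        or record["font_confidence"] < min_font_conf
    )


record = json.loads(SAMPLE)
print(record["font"], needs_review(record))  # prints: Moul False
```

Checking the text and font confidences separately lets a pipeline accept the recognized text even when only the font style is uncertain, by passing a lower `min_font_conf`.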

Examples

| Input | Detected Text | Font Style |
| ---------- | ---------------- | ---------- |
| [Line 1] | យេម៉ែនលង់ក្នុងសង្គ្រាម... | Bold |
| [Line 2] | ក្រសួងមហាផ្ទៃឱ្យត្រៀម... | Bold |
| [Line 3] | លទ្ធផលនៃការធ្វើកំណែ... | Moul |

Milestones

[x] Basic Font Style Detection
[x] Multi-line Document Support
[x] Export to DOCX/HTML/Markdown
[ ] Add English & Symbol support
[x] Add ONNXRuntime for faster inference
[x] Add C/C++ Inference Engine

License

Distributed under the MIT License. See the LICENSE file for more information.

Contact

Seanghay Yath

Email: seanghay.dev@gmail.com
Telegram: @seanghay_yath

<div align="center"> <a href="https://khmerscan.com/"> <img width="80" src="https://khmerscan.com/favicon.svg" alt="KhmerScan Logo"> </a> <p>Sponsored by <a href="https://khmerscan.com/">KhmerScan</a>

(បម្លែងរូបភាពទៅជាអត្ថបទខ្មែរ)</p>

</div>
