# meikiocr

high-speed, high-accuracy, local ocr for japanese video games.
meikiocr is a python-based ocr pipeline that combines state-of-the-art detection and recognition models to provide an unparalleled open-source solution for extracting japanese text from video games and similar rendered content.
| original image | ocr result |
| :---: | :---: |
| *(example screenshot not shown)* | ナルホド<br>こ、こんなにドキドキするの、<br>小学校の学級裁判のとき以来です。 |
## live demo

the easiest way to see meikiocr in action is to try the live demo hosted on hugging face spaces. no installation required!

try the meikiocr live demo here
## core features

- **high accuracy:** purpose-built and trained on japanese video game text, `meikiocr` significantly outperforms general-purpose ocr tools like paddleocr or easyocr on this specific domain.
- **high speed:** the architecture is pareto-optimal, delivering exceptional performance on both cpu and gpu.
- **fully local & private:** unlike cloud-based services, `meikiocr` runs entirely on your machine, ensuring privacy and eliminating api costs or rate limits.
- **cross-platform:** it works wherever onnx runtime runs, providing a much-needed local ocr solution for linux users.
- **open & free:** both the code and the underlying models are freely available under permissive licenses.
## performance & benchmarks

meikiocr is built from two highly efficient models that establish a new pareto front for japanese text recognition: they offer a better accuracy/latency tradeoff than any other known open-weight model.

*(benchmark plots not shown: detection latency vs. accuracy on cpu and gpu, and recognition latency vs. accuracy on cpu and gpu.)*
## installation

```
pip install meikiocr
```

### for nvidia gpu users (recommended)

for a massive performance boost, you can install the gpu-enabled version of the onnx runtime. it will be detected and used automatically.

```
pip install meikiocr
pip uninstall onnxruntime
pip install onnxruntime-gpu
```
## usage - cli

after installation, you can use the `meikiocr` tool directly from your terminal.

```
meikiocr image.png
```

### options

- **save visualization:** draw bounding boxes and save the result to a file.
  ```
  meikiocr image.png --output result.jpg
  ```
- **json output:** get detailed results (coordinates, confidence scores) for integration with other scripts.
  ```
  meikiocr image.png --json
  ```
- **adjust thresholds:** fine-tune detection and recognition sensitivity.
  ```
  meikiocr image.png --det-threshold 0.6 --rec-threshold 0.2
  ```

run `meikiocr --help` for a full list of available options.
## usage - python

the example below shows how meikiocr can be called from python. you can also run `demo.py` for additional visual output.

```python
import cv2
import numpy as np
from urllib.request import urlopen

from meikiocr import MeikiOCR

IMAGE_URL = "https://huggingface.co/spaces/rtr46/meikiocr/resolve/main/example.jpg"

# download the example image and decode it into a BGR numpy array
with urlopen(IMAGE_URL) as resp:
    image = cv2.imdecode(np.asarray(bytearray(resp.read()), dtype="uint8"), cv2.IMREAD_COLOR)

ocr = MeikiOCR()              # initialize the ocr pipeline
results = ocr.run_ocr(image)  # run the full ocr pipeline
print('\n'.join(line['text'] for line in results if line['text']))
```
### adjusting thresholds

you can adjust the confidence thresholds for both the text line detection and the character recognition models. lowering the thresholds returns more detected text lines and characters, while raising them prevents false positives.

```python
# higher thresholds: fewer, but more confident, text boxes and characters
MeikiOCR().run_ocr(image, det_threshold=0.8, rec_threshold=0.2)
```
### running dedicated detection

if you only care about the position of the text and not its content, you can run detection by itself, which is faster than running the whole ocr pipeline:

```python
# returns only text line coordinates
MeikiOCR().run_detection(image, det_threshold=0.8)
```

in the same way, you can also call `run_recognition` by itself on images of pre-cropped text lines.
## how it works

meikiocr is a two-stage pipeline:

1. **text detection:** the meiki.text.detect.v0 model first identifies the bounding boxes of all text lines in the image.
2. **text recognition:** each detected text line is then cropped and processed in a batch by the meiki.text.recognition.v0 model, which recognizes the individual characters within it.
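the hand-off between the two stages amounts to cropping each detected box out of the frame before batching the crops into the recognizer. a minimal sketch of that step, assuming `(x1, y1, x2, y2)` pixel boxes (the detector's actual output format may differ):

```python
import numpy as np

def crop_text_lines(image, boxes):
    """crop each detected text-line region out of a frame.

    boxes are assumed to be (x1, y1, x2, y2) pixel coordinates;
    the real format returned by the detection stage may differ.
    """
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# stand-in for a 1280x720 game screenshot
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
boxes = [(100, 600, 900, 650), (100, 660, 700, 710)]
crops = crop_text_lines(frame, boxes)
print([c.shape for c in crops])  # [(50, 800, 3), (50, 600, 3)]
```

each crop keeps the full channel depth, so the list can be resized and stacked into the fixed-size batch the recognition model expects.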
## limitations

while meikiocr is state-of-the-art for its niche, it's important to understand its design constraints:

- **domain specific:** it is highly optimized for rendered text from video games and may not perform well on handwritten or complex real-world scene text.
- **architectural limits:** the detection model is capped at 64 text boxes per image, and the recognition model can process up to 48 characters per line. these limits are sufficient for over 99% of video game scenarios but may be a constraint for other use cases.
- **vertical text:** vertical text line support is in beta. it should work, but don't expect the same level of accuracy as on horizontal lines.
## advanced usage & potential

the `meiki_ocr.py` script provides a straightforward implementation of a post-processing pipeline that selects the most confident prediction for each character. however, the raw output from the recognition model is richer and can be used for more advanced applications. for example, one could build a language-aware post-processing step using n-grams to correct ocr mistakes by considering alternative character predictions.

this opens the door for meikiocr to be integrated into a variety of projects.
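one way to sketch such a language-aware step: viterbi-decode over per-character candidates, combining each candidate's confidence with a bigram score. the `candidates` structure below is a hypothetical stand-in for the recognition model's raw per-character scores, not its actual output format.

```python
import math

def rescore_with_bigrams(candidates, bigram_logp, default=math.log(0.01)):
    """pick the character sequence maximizing ocr confidence + bigram score.

    candidates: one list per character position of (char, log_confidence)
    pairs. bigram_logp maps (prev_char, char) to a log-probability;
    unseen bigrams fall back to `default`.
    """
    # dp maps last character -> (best total score, decoded text so far)
    dp = {char: (logp, char) for char, logp in candidates[0]}
    for options in candidates[1:]:
        nxt = {}
        for char, logp in options:
            nxt[char] = max(
                (score + bigram_logp.get((prev, char), default) + logp, text + char)
                for prev, (score, text) in dp.items()
            )
        dp = nxt
    return max(dp.values())[1]

# toy example: the model slightly prefers "ト" over "ド" for the last
# character, but a bigram preference for ホ→ド corrects the mistake.
candidates = [
    [("ナ", math.log(0.9))],
    [("ル", math.log(0.9))],
    [("ホ", math.log(0.9))],
    [("ド", math.log(0.45)), ("ト", math.log(0.55))],
]
bigram_logp = {("ホ", "ド"): math.log(0.8), ("ホ", "ト"): math.log(0.05)}
print(rescore_with_bigrams(candidates, bigram_logp))  # ナルホド
```

a real integration would replace the toy bigram table with counts from a japanese corpus, but the lattice-decoding shape stays the same.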
## license

this project is licensed under the apache 2.0 license. see the license file for details.