Koharu
ML-powered manga translator, written in Rust.
Koharu introduces a local-first workflow for manga translation, using ML to automate the process. It combines object detection, OCR, inpainting, and LLMs into a seamless translation experience.
Under the hood, Koharu uses candle and llama.cpp for high-performance inference, with Tauri for the desktop app. All components are written in Rust, ensuring safety and speed.
[!NOTE] Koharu runs its vision models and LLMs locally on your machine to keep your data private and secure.

[!NOTE] Support and discussion are available on the Discord server.
Features
- Automatic detection of text regions, speech bubbles, and cleanup masks
- OCR for manga dialogue, captions, and other page text
- Inpainting to remove source lettering from the page
- Translation with local or remote LLM backends
- Advanced text rendering with vertical CJK and RTL support
- Layered PSD export with editable text
- Local HTTP API and MCP server for automation
For installation and first-run guidance, see Install Koharu and Translate Your First Page.
Usage
Hotkeys
- <kbd>Ctrl</kbd> + Mouse Wheel: Zoom in/out
- <kbd>Ctrl</kbd> + Drag: Pan the canvas
- <kbd>Del</kbd>: Delete selected text block
Export
Koharu can export the current page either as a flattened rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which is useful for downstream cleanup and manual refinement.
For export behavior, PSD contents, and file naming, see Export Pages and Manage Projects.
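Conceptually, the exported layer stack looks something like the following sketch, bottom to top. The layer names and ordering here are assumptions for illustration, not the exact layers Koharu writes:

```rust
// Illustrative PSD layer stack: raster helper layers at the bottom,
// one editable text layer per translated block on top.
#[derive(Debug, PartialEq)]
enum Layer {
    Raster(&'static str), // flattened pixel layer (names are assumed)
    Text(String),         // editable text layer with translated content
}

fn export_layers(translations: &[&str]) -> Vec<Layer> {
    let mut layers = vec![
        Layer::Raster("original"),  // untouched source page (helper layer)
        Layer::Raster("inpainted"), // cleaned background after lettering removal
    ];
    // Translated text stays editable for downstream cleanup.
    layers.extend(translations.iter().map(|t| Layer::Text(t.to_string())));
    layers
}

fn main() {
    let layers = export_layers(&["Hello!", "Look out!"]);
    assert_eq!(layers.len(), 4);
    assert_eq!(layers[0], Layer::Raster("original"));
    assert!(matches!(layers.last(), Some(Layer::Text(_))));
}
```

Keeping the helper raster layers under the text layers is what makes manual refinement cheap: a letterer can retouch the background without re-running the pipeline.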
MCP Server
Koharu includes a built-in MCP server for local agent integrations. By default it listens on a random local port, but you can pin it with --port.
# macOS / Linux
koharu --port 9999
# Windows
koharu.exe --port 9999
Then point your client at http://localhost:9999/mcp.
For local setup and the available tools, see Run GUI, Headless, and MCP Modes, Configure MCP Clients, and MCP Tools Reference.
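For a sense of the wire format, an MCP client speaks JSON-RPC 2.0 over HTTP. A minimal `initialize` request body might be built like this; the `protocolVersion` string and client fields are illustrative, and a real client would use an MCP SDK rather than hand-built strings:

```rust
// Build a minimal MCP `initialize` request body by hand (sketch only).
// "initialize" is the standard MCP handshake method; the protocol version
// and clientInfo values below are example placeholders.
fn initialize_request(id: u64) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"initialize","params":{{"protocolVersion":"2025-03-26","capabilities":{{}},"clientInfo":{{"name":"example-client","version":"0.1"}}}}}}"#
    )
}

fn main() {
    let req = initialize_request(1);
    // POST this body to http://localhost:9999/mcp with
    // Content-Type: application/json to start a session.
    println!("{req}");
    assert!(req.contains(r#""method":"initialize""#));
}
```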
Headless Mode
Koharu can run without launching the desktop window.
# macOS / Linux
koharu --port 4000 --headless
# Windows
koharu.exe --port 4000 --headless
You can then connect to the web client at http://localhost:4000.
For runtime modes, ports, and local endpoints, see Run GUI, Headless, and MCP Modes.
Runtime Configuration
Koharu lets you configure the shared local data path, as well as the HTTP connect timeout, read timeout, and retry count used by downloads and provider requests.
These values are loaded once at startup, so changing them saves the configuration and restarts the app to apply the new settings.
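The retry behavior described above can be sketched as a small helper. This is hypothetical; Koharu's actual download code is internal:

```rust
use std::time::Duration;

// Retry a fallible operation up to `retries` extra times, sleeping between
// attempts. Illustrates the retry-count setting; a real client would also
// apply the connect and read timeouts to each attempt.
fn with_retries<T, E>(
    retries: u32,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= retries => return Err(e), // budget exhausted
            Err(_) => {
                attempt += 1;
                std::thread::sleep(backoff);
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulate a download that fails twice, then succeeds.
    let result: Result<&str, &str> = with_retries(3, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok("downloaded") }
    });
    assert_eq!(result, Ok("downloaded"));
    assert_eq!(calls, 3);
}
```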
Google Fonts
Koharu includes built-in Google Fonts support for translated text rendering, so you can use web fonts without managing font files by hand.
Google Fonts are fetched on demand from a bundled catalog. Koharu caches downloaded files under the app data directory and reuses them for later renders, so you usually only need an internet connection the first time a family is used on that machine.
The catalog includes a small set of comic-friendly recommended families. Once cached, a Google Font behaves like any other local render font.
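The caching behavior amounts to a cache-then-fetch lookup. In this sketch an in-memory map stands in for the on-disk cache, and "Comic Neue" is just an example family name:

```rust
use std::collections::HashMap;

// Download a font family at most once, then reuse the cached bytes.
// Sketch only; Koharu caches real font files under the app data directory.
fn font_bytes<'a>(
    cache: &'a mut HashMap<String, Vec<u8>>,
    family: &str,
    download: impl FnOnce(&str) -> Vec<u8>,
) -> &'a [u8] {
    cache
        .entry(family.to_string())
        .or_insert_with(|| download(family)) // network hit only on a miss
}

fn main() {
    let mut cache = HashMap::new();
    let mut downloads = 0;
    // First use of the family "hits the network" (simulated here)...
    font_bytes(&mut cache, "Comic Neue", |_| { downloads += 1; vec![0u8; 4] });
    // ...later renders reuse the cached copy with no fetch.
    font_bytes(&mut cache, "Comic Neue", |_| { downloads += 1; vec![0u8; 4] });
    assert_eq!(downloads, 1);
}
```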
Text Rendering
Koharu includes a dedicated text renderer tuned for manga lettering, using Unicode-aware OpenType shaping, script-aware line breaking, precise glyph metrics, and real glyph bounds instead of generic browser or OS text primitives.
It supports vertical CJK layout, right-to-left scripts, font fallback, vertical punctuation alignment, constrained-box fitting, and manga-oriented stroke and effect compositing so translated text reads naturally inside speech bubbles, captions, and other irregular page layouts.
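Constrained-box fitting boils down to shrinking the font size until the wrapped text fits the bubble. A simplified sketch, assuming a fixed average glyph advance instead of the real shaped metrics Koharu uses:

```rust
// Find the largest font size (stepping down by 0.5) at which `text`,
// wrapped to `box_w`, fits within `box_h`. The 0.6 advance ratio and the
// 1.2 line height are crude assumptions; a real renderer measures glyphs.
fn fit_font_size(text: &str, box_w: f32, box_h: f32, max_size: f32) -> f32 {
    let mut size = max_size;
    while size > 1.0 {
        let advance = size * 0.6;                       // assumed glyph width
        let chars_per_line = (box_w / advance).max(1.0).floor();
        let lines = (text.chars().count() as f32 / chars_per_line).ceil();
        if lines * size * 1.2 <= box_h {                // does it fit vertically?
            return size;
        }
        size -= 0.5;                                    // shrink and retry
    }
    1.0 // floor: never return an unreadable zero size
}

fn main() {
    let size = fit_font_size("WHAT ARE YOU DOING HERE?!", 120.0, 80.0, 32.0);
    assert!(size >= 1.0 && size <= 32.0);
    // The wrapped text must actually fit the box at the returned size.
    let chars_per_line = (120.0f32 / (size * 0.6)).max(1.0).floor();
    let lines = (25.0f32 / chars_per_line).ceil();
    assert!(lines * size * 1.2 <= 80.0);
}
```

Vertical CJK layout follows the same shape, with columns constrained by the box height instead of lines by its width.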
GPU Acceleration
Koharu supports CUDA, experimental ZLUDA, Metal, and Vulkan. CPU fallback is always available when the accelerated path is unavailable or not worth the setup cost on your system.
CUDA (NVIDIA GPUs on Windows)
On Windows, Koharu ships with CUDA support so it can use NVIDIA GPUs for the full local pipeline.
Koharu bundles CUDA Toolkit 13.0. The required DLLs are extracted to the application data directory on first run.
[!NOTE] Make sure you have current NVIDIA drivers installed. You can update them through NVIDIA App.
Supported NVIDIA GPUs
Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
For GPU compatibility references, see CUDA GPU Compute Capability.
ZLUDA (AMD GPUs on Windows, experimental)
Koharu supports experimental ZLUDA acceleration on Windows for AMD GPUs. ZLUDA is a CUDA compatibility layer that lets some CUDA workloads run on AMD GPUs.
To use it, install the AMD HIP SDK.
Metal (Apple Silicon on macOS)
Koharu supports Metal on Apple Silicon Macs. No extra runtime setup is required beyond a normal app install.
Vulkan (Windows and Linux)
Koharu also supports Vulkan on Windows and Linux. This backend is currently used primarily for OCR and local LLM inference.
Detection and inpainting still depend on CUDA, ZLUDA, or Metal, so Vulkan is useful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it.
CPU Fallback
You can always force Koharu to use CPU for inference:
# macOS / Linux
koharu --cpu
# Windows
koharu.exe --cpu
For backend selection, fallback behavior, and model runtime support, see Acceleration and Runtime.
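Backend selection with CPU fallback can be sketched as a preference-ordered probe. The exact priority order and the availability checks here are assumptions; a real build probes drivers and devices:

```rust
// Pick the first available accelerated backend, falling back to CPU.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend { Cuda, Zluda, Metal, Vulkan, Cpu }

fn select_backend(available: &[Backend], force_cpu: bool) -> Backend {
    if force_cpu {
        return Backend::Cpu; // `--cpu` always wins
    }
    // Assumed preference order for illustration only.
    for pref in [Backend::Cuda, Backend::Zluda, Backend::Metal, Backend::Vulkan] {
        if available.contains(&pref) {
            return pref;
        }
    }
    Backend::Cpu // CPU fallback is always available
}

fn main() {
    assert_eq!(select_backend(&[Backend::Vulkan], false), Backend::Vulkan);
    assert_eq!(select_backend(&[], false), Backend::Cpu);
    assert_eq!(select_backend(&[Backend::Cuda], true), Backend::Cpu);
}
```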
ML Models
Koharu uses a staged stack of vision and language models instead of trying to solve the entire page with a single network.
Computer Vision Models
Koharu uses multiple pretrained models, each tuned for a specific part of the page pipeline.
Detection and Layout
These models find text regions, speech bubbles, and page structure.
- comic-text-bubble-detector for joint text block and speech bubble detection
- comic-text-detector for text segmentation masks
- PP-DocLayoutV3 for document layout analysis
- speech-bubble-segmentation for dedicated speech bubble detection
OCR
These models recognize source text after detection.
- PaddleOCR-VL-1.5 for general text recognition
- Manga OCR for Japanese manga text
- MIT 48px OCR as an additional recognition model
Inpainting
These models remove source lettering before translated text is rendered back onto the page.
- aot-inpainting, an AOT-GAN-based inpainter
- lama-manga, a LaMa variant tuned for manga pages
Font Analysis
This model helps infer source font and color characteristics for rendering.
- YuzuMarker.FontDetection for font and color detection
The required models are downloaded automatically on first use.
Some models are consumed directly from their upstream Hugging Face repositories; where Koharu needs a converted bundle, Rust-friendly safetensors conversions are hosted on Hugging Face as well.
For a closer look at the pipeline, see Models and Providers and the Technical Deep Dive.
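The staged pipeline described above can be sketched as one stub per model family. The types and outputs are illustrative placeholders, not the real model interfaces:

```rust
// One translation pass as a staged pipeline: detect -> OCR -> inpaint ->
// translate. Each stub stands in for a model family listed above.
struct Page(Vec<u8>);              // placeholder for pixel data
struct Region { text: String }     // placeholder for a detected text block

fn detect(_page: &Page) -> Vec<Region> {
    // Detection and layout models find text regions and bubbles.
    vec![Region { text: String::new() }]
}
fn ocr(region: &mut Region) {
    // OCR models recognize the source text inside each region.
    region.text = "こんにちは".to_string();
}
fn inpaint(_page: &mut Page, _regions: &[Region]) {
    // Inpainting models erase the source lettering.
}
fn translate(region: &Region) -> String {
    // An LLM backend translates the recognized text.
    format!("[translated] {}", region.text)
}

fn translate_page(page: &mut Page) -> Vec<String> {
    let mut regions = detect(page);
    for r in &mut regions { ocr(r); }
    inpaint(page, &regions);
    regions.iter().map(translate).collect()
}

fn main() {
    let mut page = Page(vec![]);
    let out = translate_page(&mut page);
    assert_eq!(out, vec!["[translated] こんにちは".to_string()]);
}
```

Keeping each stage a separate model keeps failures local: a bad OCR read affects one region, not the whole page.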
Large Language Models
Koharu supports both local and remote LLM backends. Local models run through llama.cpp and are downloaded on demand. Hosted and self-hosted APIs are also supported when you want to use a provider instead of a downloaded model. When possible, Koharu also tries to preselect sensible defaults based on your system locale.
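The local/remote split and the locale-based preselection can be sketched like this. The locale-to-model mapping is a made-up example, not Koharu's actual defaults, though the model names come from the lists below:

```rust
// A translation backend is either a local model run through llama.cpp
// or a remote HTTP provider. Variants and defaults here are illustrative.
enum LlmBackend {
    Local { model: String },     // downloaded model, run in-process
    Remote { endpoint: String }, // hosted or self-hosted API
}

// Hypothetical locale-based default: the mapping is an assumption.
fn default_model(locale: &str) -> &'static str {
    match locale.split(|c: char| c == '-' || c == '_').next().unwrap_or("") {
        "ja" | "zh" | "ko" => "qwen3.5-4b", // assumed CJK-leaning default
        _ => "gemma4-e4b-it",               // assumed general default
    }
}

fn main() {
    let local = LlmBackend::Local { model: default_model("ja-JP").to_string() };
    if let LlmBackend::Local { model } = local {
        assert_eq!(model, "qwen3.5-4b");
    }
    // A self-hosted OpenAI-compatible endpoint would instead be configured as:
    let _remote = LlmBackend::Remote { endpoint: "http://localhost:8080/v1".to_string() };
    assert_eq!(default_model("en-US"), "gemma4-e4b-it");
}
```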
General-Purpose Local Models
These are broad instruct models that work well when you want one local model for many translation tasks.
- Gemma 4 instruct: gemma4-e2b-it, gemma4-e4b-it, gemma4-26b-a4b-it, gemma4-31b-it
- Qwen 3.5: qwen3.5-0.8b, qwen3.5-2b, qwen3.5-4b, qwen3.5-9b, qwen3.5-27b, qwen3.5-35b-a3b
NSFW-Capable Local Models
These variants relax the safety tuning applied to the corresponding base instruct models.
- Gemma 4 uncensored: gemma4-e2b-uncensored, [gemma4-e4b-uncensored](https://huggingf
