Koharu
ML-powered manga translator, written in Rust.
Koharu introduces a local-first workflow for manga translation, using ML to automate the process. It combines object detection, OCR, inpainting, and LLMs into a seamless translation experience.
Under the hood, Koharu uses candle and llama.cpp for high-performance inference, with Tauri for the desktop app. All components are written in Rust, ensuring safety and speed.
[!NOTE] Koharu runs its vision models and LLMs locally on your machine to keep your data private and secure.

[!NOTE] Support and discussion are available on the Discord server.
Features
- Automatic detection of text regions, speech bubbles, and cleanup masks
- OCR for manga dialogue, captions, and other page text
- Inpainting to remove source lettering from the page
- Translation with local or remote LLM backends
- Advanced text rendering with vertical CJK and RTL support
- Layered PSD export with editable text
- Local HTTP API and MCP server for automation
For installation and first-run guidance, see Install Koharu and Translate Your First Page.
Usage
Hotkeys
- <kbd>Ctrl</kbd> + Mouse Wheel: Zoom in/out
- <kbd>Ctrl</kbd> + Drag: Pan the canvas
- <kbd>Del</kbd>: Delete selected text block
Export
Koharu can export the current page either as a flattened rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which is useful for downstream cleanup and manual refinement.
For export behavior, PSD contents, and file naming, see Export Pages and Manage Projects.
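Conceptually, the exported layer stack looks something like the following sketch, bottom to top. The layer names and ordering here are assumptions for illustration, not the exact layers Koharu writes:

```rust
// Illustrative PSD layer stack: raster helper layers at the bottom,
// one editable text layer per translated block on top.
#[derive(Debug, PartialEq)]
enum Layer {
    Raster(&'static str), // flattened pixel layer (names are assumed)
    Text(String),         // editable text layer with translated content
}

fn export_layers(translations: &[&str]) -> Vec<Layer> {
    let mut layers = vec![
        Layer::Raster("original"),  // untouched source page (helper layer)
        Layer::Raster("inpainted"), // cleaned background after lettering removal
    ];
    // Translated text stays editable for downstream cleanup.
    layers.extend(translations.iter().map(|t| Layer::Text(t.to_string())));
    layers
}

fn main() {
    let layers = export_layers(&["Hello!", "Look out!"]);
    assert_eq!(layers.len(), 4);
    assert_eq!(layers[0], Layer::Raster("original"));
    assert!(matches!(layers.last(), Some(Layer::Text(_))));
}
```

Keeping the helper raster layers under the text layers is what makes manual refinement cheap: a letterer can retouch the background without re-running the pipeline.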
MCP Server
Koharu includes a built-in MCP server for local agent integrations. By default it listens on a random local port, but you can pin it with --port.
# macOS / Linux
koharu --port 9999
# Windows
koharu.exe --port 9999
Then point your client at http://localhost:9999/mcp.
For local setup and the available tools, see Run GUI, Headless, and MCP Modes, Configure MCP Clients, and MCP Tools Reference.
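For a sense of the wire format, an MCP client speaks JSON-RPC 2.0 over HTTP. A minimal `initialize` request body might be built like this; the `protocolVersion` string and client fields are illustrative, and a real client would use an MCP SDK rather than hand-built strings:

```rust
// Build a minimal MCP `initialize` request body by hand (sketch only).
// "initialize" is the standard MCP handshake method; the protocol version
// and clientInfo values below are example placeholders.
fn initialize_request(id: u64) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"initialize","params":{{"protocolVersion":"2025-03-26","capabilities":{{}},"clientInfo":{{"name":"example-client","version":"0.1"}}}}}}"#
    )
}

fn main() {
    let req = initialize_request(1);
    // POST this body to http://localhost:9999/mcp with
    // Content-Type: application/json to start a session.
    println!("{req}");
    assert!(req.contains(r#""method":"initialize""#));
}
```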
Headless Mode
Koharu can run without launching the desktop window.
# macOS / Linux
koharu --port 4000 --headless
# Windows
koharu.exe --port 4000 --headless
You can then connect to the web client at http://localhost:4000.
For runtime modes, ports, and local endpoints, see Run GUI, Headless, and MCP Modes.
Runtime Configuration
Koharu lets you configure the shared local data path, as well as the HTTP connect timeout, read timeout, and retry count used by downloads and provider requests.
These values are loaded once at startup, so changing them saves the configuration and restarts the app to apply the new settings.
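The retry behavior described above can be sketched as a small helper. This is hypothetical; Koharu's actual download code is internal:

```rust
use std::time::Duration;

// Retry a fallible operation up to `retries` extra times, sleeping between
// attempts. Illustrates the retry-count setting; a real client would also
// apply the connect and read timeouts to each attempt.
fn with_retries<T, E>(
    retries: u32,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= retries => return Err(e), // budget exhausted
            Err(_) => {
                attempt += 1;
                std::thread::sleep(backoff);
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulate a download that fails twice, then succeeds.
    let result: Result<&str, &str> = with_retries(3, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok("downloaded") }
    });
    assert_eq!(result, Ok("downloaded"));
    assert_eq!(calls, 3);
}
```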
Google Fonts
Koharu includes built-in Google Fonts support for translated text rendering, so you can use web fonts without managing font files by hand.
Google Fonts are fetched on demand from a bundled catalog. Koharu caches downloaded files under the app data directory and reuses them for later renders, so you usually only need an internet connection the first time a family is used on that machine.
The catalog includes a small set of comic-friendly recommended families. Once cached, a Google Font behaves like any other local render font.
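The caching behavior amounts to a cache-then-fetch lookup. In this sketch an in-memory map stands in for the on-disk cache, and "Comic Neue" is just an example family name:

```rust
use std::collections::HashMap;

// Download a font family at most once, then reuse the cached bytes.
// Sketch only; Koharu caches real font files under the app data directory.
fn font_bytes<'a>(
    cache: &'a mut HashMap<String, Vec<u8>>,
    family: &str,
    download: impl FnOnce(&str) -> Vec<u8>,
) -> &'a [u8] {
    cache
        .entry(family.to_string())
        .or_insert_with(|| download(family)) // network hit only on a miss
}

fn main() {
    let mut cache = HashMap::new();
    let mut downloads = 0;
    // First use of the family "hits the network" (simulated here)...
    font_bytes(&mut cache, "Comic Neue", |_| { downloads += 1; vec![0u8; 4] });
    // ...later renders reuse the cached copy with no fetch.
    font_bytes(&mut cache, "Comic Neue", |_| { downloads += 1; vec![0u8; 4] });
    assert_eq!(downloads, 1);
}
```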
Text Rendering
Koharu includes a dedicated text renderer tuned for manga lettering, using Unicode-aware OpenType shaping, script-aware line breaking, precise glyph metrics, and real glyph bounds instead of generic browser or OS text primitives.
It supports vertical CJK layout, right-to-left scripts, font fallback, vertical punctuation alignment, constrained-box fitting, and manga-oriented stroke and effect compositing so translated text reads naturally inside speech bubbles, captions, and other irregular page layouts.
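Constrained-box fitting boils down to shrinking the font size until the wrapped text fits the bubble. A simplified sketch, assuming a fixed average glyph advance instead of the real shaped metrics Koharu uses:

```rust
// Find the largest font size (stepping down by 0.5) at which `text`,
// wrapped to `box_w`, fits within `box_h`. The 0.6 advance ratio and the
// 1.2 line height are crude assumptions; a real renderer measures glyphs.
fn fit_font_size(text: &str, box_w: f32, box_h: f32, max_size: f32) -> f32 {
    let mut size = max_size;
    while size > 1.0 {
        let advance = size * 0.6;                       // assumed glyph width
        let chars_per_line = (box_w / advance).max(1.0).floor();
        let lines = (text.chars().count() as f32 / chars_per_line).ceil();
        if lines * size * 1.2 <= box_h {                // does it fit vertically?
            return size;
        }
        size -= 0.5;                                    // shrink and retry
    }
    1.0 // floor: never return an unreadable zero size
}

fn main() {
    let size = fit_font_size("WHAT ARE YOU DOING HERE?!", 120.0, 80.0, 32.0);
    assert!(size >= 1.0 && size <= 32.0);
    // The wrapped text must actually fit the box at the returned size.
    let chars_per_line = (120.0f32 / (size * 0.6)).max(1.0).floor();
    let lines = (25.0f32 / chars_per_line).ceil();
    assert!(lines * size * 1.2 <= 80.0);
}
```

Vertical CJK layout follows the same shape, with columns constrained by the box height instead of lines by its width.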
GPU Acceleration
Koharu supports CUDA, experimental ZLUDA, Metal, and Vulkan. CPU fallback is always available when the accelerated path is unavailable or not worth the setup cost on your system.
CUDA (NVIDIA GPUs on Windows)
On Windows, Koharu ships with CUDA support so it can use NVIDIA GPUs for the full local pipeline.
Koharu bundles CUDA Toolkit 13.0. The required DLLs are extracted to the application data directory on first run.
[!NOTE] Make sure you have current NVIDIA drivers installed. You can update them through NVIDIA App.
Supported NVIDIA GPUs
Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
For GPU compatibility references, see CUDA GPU Compute Capability.
ZLUDA (AMD GPUs on Windows, experimental)
Koharu supports experimental ZLUDA acceleration on Windows for AMD GPUs. ZLUDA is a CUDA compatibility layer that lets some CUDA workloads run on AMD GPUs.
To use it, install the AMD HIP SDK.
Metal (Apple Silicon on macOS)
Koharu supports Metal on Apple Silicon Macs. No extra runtime setup is required beyond a normal app install.
Vulkan (Windows and Linux)
Koharu also supports Vulkan on Windows and Linux. This backend is currently used primarily for OCR and local LLM inference.
Detection and inpainting still depend on CUDA, ZLUDA, or Metal, so Vulkan is useful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it.
CPU Fallback
You can always force Koharu to use CPU for inference:
# macOS / Linux
koharu --cpu
# Windows
koharu.exe --cpu
For backend selection, fallback behavior, and model runtime support, see Acceleration and Runtime.
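Backend selection with CPU fallback can be sketched as a preference-ordered probe. The exact priority order and the availability checks here are assumptions; a real build probes drivers and devices:

```rust
// Pick the first available accelerated backend, falling back to CPU.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend { Cuda, Zluda, Metal, Vulkan, Cpu }

fn select_backend(available: &[Backend], force_cpu: bool) -> Backend {
    if force_cpu {
        return Backend::Cpu; // `--cpu` always wins
    }
    // Assumed preference order for illustration only.
    for pref in [Backend::Cuda, Backend::Zluda, Backend::Metal, Backend::Vulkan] {
        if available.contains(&pref) {
            return pref;
        }
    }
    Backend::Cpu // CPU fallback is always available
}

fn main() {
    assert_eq!(select_backend(&[Backend::Vulkan], false), Backend::Vulkan);
    assert_eq!(select_backend(&[], false), Backend::Cpu);
    assert_eq!(select_backend(&[Backend::Cuda], true), Backend::Cpu);
}
```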
ML Models
Koharu uses a staged stack of vision and language models instead of trying to solve the entire page with a single network.
Computer Vision Models
Koharu uses multiple pretrained models, each tuned for a specific part of the page pipeline.
Detection and Layout
These models find text regions, speech bubbles, and page structure.
- comic-text-bubble-detector for joint text block and speech bubble detection
- comic-text-detector for text segmentation masks
- PP-DocLayoutV3 for document layout analysis
- speech-bubble-segmentation for dedicated speech bubble detection
OCR
These models recognize source text after detection.
- PaddleOCR-VL-1.5 for general text recognition
- Manga OCR for Japanese manga text
- MIT 48px OCR as an additional recognition model
Inpainting
These models remove source lettering before translated text is rendered back onto the page.
- aot-inpainting, an AOT-GAN-based inpainter
- lama-manga, a LaMa variant tuned for manga pages
Font Analysis
This model helps infer source font and color characteristics for rendering.
- YuzuMarker.FontDetection for font and color detection
The required models are downloaded automatically on first use.
Some models are consumed directly from their upstream Hugging Face repositories; where Koharu needs a converted bundle, Rust-friendly safetensors conversions are hosted on Hugging Face as well.
For a closer look at the pipeline, see Models and Providers and the Technical Deep Dive.
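The staged pipeline described above can be sketched as one stub per model family. The types and outputs are illustrative placeholders, not the real model interfaces:

```rust
// One translation pass as a staged pipeline: detect -> OCR -> inpaint ->
// translate. Each stub stands in for a model family listed above.
struct Page(Vec<u8>);              // placeholder for pixel data
struct Region { text: String }     // placeholder for a detected text block

fn detect(_page: &Page) -> Vec<Region> {
    // Detection and layout models find text regions and bubbles.
    vec![Region { text: String::new() }]
}
fn ocr(region: &mut Region) {
    // OCR models recognize the source text inside each region.
    region.text = "こんにちは".to_string();
}
fn inpaint(_page: &mut Page, _regions: &[Region]) {
    // Inpainting models erase the source lettering.
}
fn translate(region: &Region) -> String {
    // An LLM backend translates the recognized text.
    format!("[translated] {}", region.text)
}

fn translate_page(page: &mut Page) -> Vec<String> {
    let mut regions = detect(page);
    for r in &mut regions { ocr(r); }
    inpaint(page, &regions);
    regions.iter().map(translate).collect()
}

fn main() {
    let mut page = Page(vec![]);
    let out = translate_page(&mut page);
    assert_eq!(out, vec!["[translated] こんにちは".to_string()]);
}
```

Keeping each stage a separate model keeps failures local: a bad OCR read affects one region, not the whole page.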
Large Language Models
Koharu supports both local and remote LLM backends. Local models run through llama.cpp and are downloaded on demand. Hosted and self-hosted APIs are also supported when you want to use a provider instead of a downloaded model. When possible, Koharu also tries to preselect sensible defaults based on your system locale.
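The local/remote split and the locale-based preselection can be sketched like this. The locale-to-model mapping is a made-up example, not Koharu's actual defaults, though the model names come from the lists below:

```rust
// A translation backend is either a local model run through llama.cpp
// or a remote HTTP provider. Variants and defaults here are illustrative.
enum LlmBackend {
    Local { model: String },     // downloaded model, run in-process
    Remote { endpoint: String }, // hosted or self-hosted API
}

// Hypothetical locale-based default: the mapping is an assumption.
fn default_model(locale: &str) -> &'static str {
    match locale.split(|c: char| c == '-' || c == '_').next().unwrap_or("") {
        "ja" | "zh" | "ko" => "qwen3.5-4b", // assumed CJK-leaning default
        _ => "gemma4-e4b-it",               // assumed general default
    }
}

fn main() {
    let local = LlmBackend::Local { model: default_model("ja-JP").to_string() };
    if let LlmBackend::Local { model } = local {
        assert_eq!(model, "qwen3.5-4b");
    }
    // A self-hosted OpenAI-compatible endpoint would instead be configured as:
    let _remote = LlmBackend::Remote { endpoint: "http://localhost:8080/v1".to_string() };
    assert_eq!(default_model("en-US"), "gemma4-e4b-it");
}
```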
General-Purpose Local Models
These are broad instruct models that work well when you want one local model for many translation tasks.
- Gemma 4 instruct: gemma4-e2b-it, gemma4-e4b-it, gemma4-26b-a4b-it, gemma4-31b-it
- Qwen 3.5: qwen3.5-0.8b, qwen3.5-2b, qwen3.5-4b, qwen3.5-9b, qwen3.5-27b, qwen3.5-35b-a3b
NSFW-Capable Local Models
These variants relax the safety tuning applied to the corresponding base instruct models.
- Gemma 4 uncensored: gemma4-e2b-uncensored, [gemma4-e4b-uncensored](https://huggingf
