Koe
A zero-GUI macOS voice input tool. Press a hotkey, speak, and the corrected text is pasted into whatever app you're using.
Install / Use
/learn @missuo/KoeREADME
Koe (声)
A background-first macOS voice input tool. Press a hotkey, speak, and the corrected text is pasted into whatever app you're using.
For more information, visit the documentation at koe.li.
The Name
Koe (声, pronounced "ko-eh") is the Japanese word for voice. Written as こえ in hiragana, it's one of the most fundamental words in the language — simple, clear, and direct. That's exactly the philosophy behind this tool: your voice goes in, clean text comes out, with nothing in between. No flashy UI, no unnecessary steps. Just 声 — voice, in its purest form.
Why Koe?
I tried nearly every voice input app on the market. They were either paid, ugly, or inconvenient — bloated UIs, clunky dictionary management, and too many clicks to do simple things.
Koe takes a different approach:
- Minimal runtime UI. Koe stays out of the way with a menu bar item, a small floating status pill with native frosted-glass vibrancy during active sessions, and an optional built-in settings window when you actually need to configure it.
- All configuration lives in plain text files under
~/.koe/. You can edit them with any text editor, vim, a script, or the built-in settings UI. - Dictionary is a plain
.txtfile. No need to open an app and add words one by one through a GUI. Just edit~/.koe/dictionary.txt— one term per line. You can even use Claude Code or other AI tools to bulk-generate domain-specific terms. - Changes take effect immediately. Edit any config file and the new settings are used automatically. ASR, LLM, dictionary, and prompt changes apply on the next hotkey press. Hotkey changes are detected within a few seconds. No restart, no reload button.
- Tiny footprint. Even after installation, Koe stays under 15 MB, and its memory usage is typically around 20 MB. It launches fast, wastes almost no disk space, and stays out of your way.
- Built with native macOS technologies. Objective-C handles hotkeys, audio capture, clipboard access, permissions, and paste automation directly through Apple's own APIs.
- Rust does the heavy lifting. The performance-critical core runs in Rust, which gives Koe low overhead, fast execution, and strong memory safety guarantees.
- No Chromium tax. Many comparable Electron-based apps ship at 200+ MB and carry the overhead of an embedded Chromium runtime. Koe avoids that entire stack, which helps keep memory usage low and the app feeling lightweight.
How It Works
- Press and hold the trigger key (default: Fn, configurable) — Koe starts listening
- Audio streams in real-time to a cloud ASR service (Doubao/豆包 by ByteDance)
- A floating status pill shows real-time interim recognition text as you speak
- The ASR transcript is corrected by an LLM (any OpenAI-compatible API) — fixing capitalization, punctuation, spacing, and terminology
- The corrected text is automatically pasted into the active input field
ASR provider support:
- Cloud: Doubao (豆包) and Qwen (通义) streaming ASR
- Local: MLX (Apple Silicon, Qwen3-ASR models) and sherpa-onnx (CPU, streaming zipformer models)
- LLM: any OpenAI-compatible API for text correction
- Planned: future ASR support may include the OpenAI Transcriptions API
Installation
Koe's standard prebuilt path is still Apple Silicon first, but Intel Macs
can now build from source with the dedicated x86_64 target.
Homebrew
brew tap owo-network/brew
brew install owo-network/brew/koe
Release
You can also download the latest release directly from GitHub:
App Updates
Koe can check a JSON update feed hosted directly in this repository. The app reads the raw GitHub URL below and compares the published version with the running build:
APP_UPDATE_FEED_URL:https://raw.githubusercontent.com/missuo/koe/main/docs/update-feed.json
The feed file lives at docs/update-feed.json and should contain at least:
{
"version": "1.0.10",
"build": 11,
"download_url": "https://github.com/missuo/koe/releases/download/v1.0.10/Koe-macOS-arm64.zip"
}
Optional fields such as minimum_system_version, release_notes_url, published_at,
and notes can also be included. On launch, Koe checks this raw feed automatically,
checks again periodically, and you can also trigger a manual check from the menu bar
with Check for Updates.... When an update is found, Koe opens the release download
URL instead of patching the installed app in place.
Build from Source
Prerequisites
- macOS 14.0+ (13.0+ without MLX support)
- Apple Silicon or Intel Mac
- Rust toolchain (
rustup) - Xcode with command line tools
- xcodegen (
brew install xcodegen)
Build
git clone https://github.com/missuo/koe.git
cd koe
# Generate Xcode project
cd KoeApp && xcodegen && cd ..
# Build Apple Silicon
make build
# Build Intel
make build-x86_64
Run
make run
Or open the built app directly:
open ~/Library/Developer/Xcode/DerivedData/Koe-*/Build/Products/Release/Koe.app
Permissions
Koe requires three macOS permissions to function. You'll be prompted to grant them on first launch. All three are mandatory — without any one of them, Koe cannot complete its core workflow.
| Permission | Why it's needed | What happens without it |
|---|---|---|
| Microphone | Captures audio from your mic and streams it to the ASR service for speech recognition. | Koe cannot hear you at all. Recording will not start. |
| Accessibility | Simulates a Cmd+V keystroke to paste the corrected text into the active input field of any app. | Koe will still copy the text to your clipboard, but cannot auto-paste. You'll need to paste manually. |
| Input Monitoring | Listens for the trigger key (default: Fn, configurable) globally so Koe can detect when you press/release it, regardless of which app is in the foreground. | Koe cannot detect the hotkey. You won't be able to trigger recording. |
To grant permissions: System Settings → Privacy & Security → enable Koe under each of the three categories above.
Configuration
All config files live in ~/.koe/ and are auto-generated on first launch. You
can edit them directly, or use the built-in settings window (Setup Wizard) from
the menu bar. The settings window includes tabs for ASR, LLM, Controls, Dictionary,
and Prompt. When a local ASR provider (MLX or Sherpa-ONNX) is selected, the ASR
tab shows a model picker with download, status, and delete controls.
~/.koe/
├── config.yaml # Main configuration
├── dictionary.txt # User dictionary (hotwords + LLM correction)
├── history.db # Usage statistics (SQLite, auto-created)
├── system_prompt.txt # LLM system prompt (customizable)
├── user_prompt.txt # LLM user prompt template (customizable)
└── models/ # Local ASR models
├── mlx/
│ └── Qwen3-ASR-0.6B-4bit/
│ ├── .koe-manifest.json
│ └── *.safetensors, config.json, ...
└── sherpa-onnx/
└── bilingual-zh-en/
├── .koe-manifest.json
└── *.onnx, tokens.txt, ...
config.yaml
Below is the full configuration with explanations for every field.
ASR (Speech Recognition)
Koe uses a provider-based ASR config layout. Built-in providers: Doubao, Qwen, MLX (local, Apple Silicon), and sherpa-onnx (local, CPU).
asr:
# ASR provider: "doubao", "qwen", "mlx", "sherpa-onnx"
provider: "doubao"
doubao:
# WebSocket endpoint. Default uses ASR 2.0 optimized bidirectional streaming.
# Do not change unless you know what you're doing.
url: "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async"
# Volcengine credentials — get these from the 火山引擎 console.
# Go to: https://console.volcengine.com/speech/app → create an app → copy App ID and Access Token.
app_key: "" # X-Api-App-Key (火山引擎 App ID)
access_key: "" # X-Api-Access-Key (火山引擎 Access Token)
# Resource ID for billing. Default is the standard duration-based billing plan.
resource_id: "volc.seedasr.sauc.duration"
# Connection timeout in milliseconds. Increase if you have slow network.
connect_timeout_ms: 3000
# How long to wait for the final ASR result after you stop speaking (ms).
# If ASR doesn't return a final result within this time, the best available result is used.
final_wait_timeout_ms: 5000
# Disfluency removal (语义顺滑). Removes spoken repetitions and filler words like 嗯, 那个.
# Recommended: true. Set to false if you want raw transcription.
enable_ddc: true
# Inverse text normalization (文本规范化). Converts spoken numbers, dates, etc.
# e.g., "二零二四年" → "2024年", "百分之五十" → "50%"
# Recommended: true.
enable_itn: true
# Automatic punctuation. Inserts commas, periods, question marks, etc.
# Recommended: true.
enable_punc: true
# Two-pass recognition (二遍识别). First pass gives fast streaming results,
# second pass re-recognizes with higher accuracy. Slight latency increase (~200ms)
# but significantly better accuracy, especially for technical terms.
# Recommended: true.
enable_nonstream: true
# MLX local ASR (Apple Silicon only, requires model download)
mlx:
model: "mlx/Qwen3-ASR-0.6B-4bit" # relative to ~/.koe/models/, or absolute path
delay_preset: "realtime" # realtime | agent | subtitle
language: "auto" # auto | zh | en
# Sherpa-ONNX local ASR (CPU, requires model download)
sherpa-onnx:
model: "sherpa-onnx/bilingual-zh-en" # relative to ~/.koe/models/, or absolute path
num_threads: 2 # CPU inference threads
hotwords_score: 1.5 # dictionary term boost
endpoint_silenc
