Koe (声)

A background-first macOS voice input tool. Press a hotkey, speak, and the corrected text is pasted into whatever app you're using.

For more information, visit the documentation at koe.li.

The Name

Koe (声, pronounced "ko-eh") is the Japanese word for voice. Written as こえ in hiragana, it's one of the most fundamental words in the language — simple, clear, and direct. That's exactly the philosophy behind this tool: your voice goes in, clean text comes out, with nothing in between. No flashy UI, no unnecessary steps. Just 声 — voice, in its purest form.

Why Koe?

I tried nearly every voice input app on the market. They were either paid, ugly, or inconvenient — bloated UIs, clunky dictionary management, and too many clicks to do simple things.

Koe takes a different approach:

Minimal runtime UI. Koe stays out of the way with a menu bar item, a small floating status pill with native frosted-glass vibrancy during active sessions, and an optional built-in settings window when you actually need to configure it.
All configuration lives in plain text files under ~/.koe/. You can edit them with any text editor, vim, a script, or the built-in settings UI.
Dictionary is a plain .txt file. No need to open an app and add words one by one through a GUI. Just edit ~/.koe/dictionary.txt — one term per line. You can even use Claude Code or other AI tools to bulk-generate domain-specific terms.
Changes take effect immediately. Edit any config file and the new settings are used automatically. ASR, LLM, dictionary, and prompt changes apply on the next hotkey press. Hotkey changes are detected within a few seconds. No restart, no reload button.
Tiny footprint. Even after installation, Koe stays under 15 MB, and its memory usage is typically around 20 MB. It launches fast, wastes almost no disk space, and stays out of your way.
Built with native macOS technologies. Objective-C handles hotkeys, audio capture, clipboard access, permissions, and paste automation directly through Apple's own APIs.
Rust does the heavy lifting. The performance-critical core runs in Rust, which gives Koe low overhead, fast execution, and strong memory safety guarantees.
No Chromium tax. Many comparable Electron-based apps ship at 200+ MB and carry the overhead of an embedded Chromium runtime. Koe avoids that entire stack, which helps keep memory usage low and the app feeling lightweight.

How It Works

Press and hold the trigger key (default: Fn, configurable) — Koe starts listening
Audio streams in real-time to a cloud ASR service (Doubao/豆包 by ByteDance)
A floating status pill shows real-time interim recognition text as you speak
The ASR transcript is corrected by an LLM (any OpenAI-compatible API) — fixing capitalization, punctuation, spacing, and terminology
The corrected text is automatically pasted into the active input field

ASR provider support:

Cloud: Doubao (豆包) and Qwen (通义) streaming ASR
Local: MLX (Apple Silicon, Qwen3-ASR models) and sherpa-onnx (CPU, streaming zipformer models)
LLM: any OpenAI-compatible API for text correction
Planned: future ASR support may include the OpenAI Transcriptions API

Installation

Koe's standard prebuilt path is still Apple Silicon first, but Intel Macs can now build from source with the dedicated x86_64 target.

Homebrew

brew tap owo-network/brew
brew install owo-network/brew/koe

Release

You can also download the latest release directly from GitHub:

Download the latest release

App Updates

Koe can check a JSON update feed hosted directly in this repository. The app reads the raw GitHub URL below and compares the published version with the running build:

APP_UPDATE_FEED_URL: https://raw.githubusercontent.com/missuo/koe/main/docs/update-feed.json

The feed file lives at docs/update-feed.json and should contain at least:

{
  "version": "1.0.10",
  "build": 11,
  "download_url": "https://github.com/missuo/koe/releases/download/v1.0.10/Koe-macOS-arm64.zip"
}

Optional fields such as minimum_system_version, release_notes_url, published_at, and notes can also be included. On launch, Koe checks this raw feed automatically, checks again periodically, and you can also trigger a manual check from the menu bar with Check for Updates.... When an update is found, Koe opens the release download URL instead of patching the installed app in place.

Build from Source

Prerequisites

macOS 14.0+ (13.0+ without MLX support)
Apple Silicon or Intel Mac
Rust toolchain (rustup)
Xcode with command line tools
xcodegen (brew install xcodegen)

Build

git clone https://github.com/missuo/koe.git
cd koe

# Generate Xcode project
cd KoeApp && xcodegen && cd ..

# Build Apple Silicon
make build

# Build Intel
make build-x86_64

Run

make run

Or open the built app directly:

open ~/Library/Developer/Xcode/DerivedData/Koe-*/Build/Products/Release/Koe.app

Permissions

Koe requires three macOS permissions to function. You'll be prompted to grant them on first launch. All three are mandatory — without any one of them, Koe cannot complete its core workflow.

| Permission | Why it's needed | What happens without it | |---|---|---| | Microphone | Captures audio from your mic and streams it to the ASR service for speech recognition. | Koe cannot hear you at all. Recording will not start. | | Accessibility | Simulates a Cmd+V keystroke to paste the corrected text into the active input field of any app. | Koe will still copy the text to your clipboard, but cannot auto-paste. You'll need to paste manually. | | Input Monitoring | Listens for the trigger key (default: Fn, configurable) globally so Koe can detect when you press/release it, regardless of which app is in the foreground. | Koe cannot detect the hotkey. You won't be able to trigger recording. |

To grant permissions: System Settings → Privacy & Security → enable Koe under each of the three categories above.

Configuration

All config files live in ~/.koe/ and are auto-generated on first launch. You can edit them directly, or use the built-in settings window (Setup Wizard) from the menu bar. The settings window includes tabs for ASR, LLM, Controls, Dictionary, and Prompt. When a local ASR provider (MLX or Sherpa-ONNX) is selected, the ASR tab shows a model picker with download, status, and delete controls.

~/.koe/
├── config.yaml          # Main configuration
├── dictionary.txt       # User dictionary (hotwords + LLM correction)
├── history.db           # Usage statistics (SQLite, auto-created)
├── system_prompt.txt    # LLM system prompt (customizable)
├── user_prompt.txt      # LLM user prompt template (customizable)
└── models/              # Local ASR models
    ├── mlx/
    │   └── Qwen3-ASR-0.6B-4bit/
    │       ├── .koe-manifest.json
    │       └── *.safetensors, config.json, ...
    └── sherpa-onnx/
        └── bilingual-zh-en/
            ├── .koe-manifest.json
            └── *.onnx, tokens.txt, ...

config.yaml

Below is the full configuration with explanations for every field.

ASR (Speech Recognition)

Koe uses a provider-based ASR config layout. Built-in providers: Doubao, Qwen, MLX (local, Apple Silicon), and sherpa-onnx (local, CPU).

asr:
  # ASR provider: "doubao", "qwen", "mlx", "sherpa-onnx"
  provider: "doubao"

  doubao:
    # WebSocket endpoint. Default uses ASR 2.0 optimized bidirectional streaming.
    # Do not change unless you know what you're doing.
    url: "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async"

    # Volcengine credentials — get these from the 火山引擎 console.
    # Go to: https://console.volcengine.com/speech/app → create an app → copy App ID and Access Token.
    app_key: ""          # X-Api-App-Key (火山引擎 App ID)
    access_key: ""       # X-Api-Access-Key (火山引擎 Access Token)

    # Resource ID for billing. Default is the standard duration-based billing plan.
    resource_id: "volc.seedasr.sauc.duration"

    # Connection timeout in milliseconds. Increase if you have slow network.
    connect_timeout_ms: 3000

    # How long to wait for the final ASR result after you stop speaking (ms).
    # If ASR doesn't return a final result within this time, the best available result is used.
    final_wait_timeout_ms: 5000

    # Disfluency removal (语义顺滑). Removes spoken repetitions and filler words like 嗯, 那个.
    # Recommended: true. Set to false if you want raw transcription.
    enable_ddc: true

    # Inverse text normalization (文本规范化). Converts spoken numbers, dates, etc.
    # e.g., "二零二四年" → "2024年", "百分之五十" → "50%"
    # Recommended: true.
    enable_itn: true

    # Automatic punctuation. Inserts commas, periods, question marks, etc.
    # Recommended: true.
    enable_punc: true

    # Two-pass recognition (二遍识别). First pass gives fast streaming results,
    # second pass re-recognizes with higher accuracy. Slight latency increase (~200ms)
    # but significantly better accuracy, especially for technical terms.
    # Recommended: true.
    enable_nonstream: true

  # MLX local ASR (Apple Silicon only, requires model download)
  mlx:
    model: "mlx/Qwen3-ASR-0.6B-4bit"    # relative to ~/.koe/models/, or absolute path
    delay_preset: "realtime"              # realtime | agent | subtitle
    language: "auto"                      # auto | zh | en

  # Sherpa-ONNX local ASR (CPU, requires model download)
  sherpa-onnx:
    model: "sherpa-onnx/bilingual-zh-en"  # relative to ~/.koe/models/, or absolute path
    num_threads: 2                         # CPU inference threads
    hotwords_score: 1.5                    # dictionary term boost
    endpoint_silenc

Koe

Install / Use

README