
Izwi

On-device AI engine for transcription, TTS, and voice workflows.


<p align="center"> <img src="images/app-icon.png" alt="Izwi icon" width="140" /> </p> <h1 align="center">Izwi</h1> <p align="center"><strong>Local-first audio inference engine for TTS, ASR, and voice AI workflows.</strong></p> <p align="center"> <a href="https://izwiai.com">Website</a> • <a href="https://izwiai.com/docs">Documentation</a> • <a href="https://github.com/izwi-ai/izwi/releases">Releases</a> • <a href="https://izwiai.com/docs/getting-started">Getting Started</a> </p> <p align="center"> <img src="images/screenshot.png" alt="Izwi Screenshot" width="800" /> </p>

Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

Core capabilities:

  • Voice Mode — Real-time voice conversations with AI
  • Text-to-Speech — Generate natural speech from text
  • Speech Recognition — Convert audio to text with high accuracy
  • Speaker Diarization — Identify and separate multiple speakers
  • Voice Cloning — Clone any voice from a short audio sample
  • Voice Design — Create custom voices from text descriptions
  • Forced Alignment — Word-level audio-text alignment
  • Chat — Text-based AI conversations

The server exposes OpenAI-compatible API routes under /v1.
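As a minimal sketch of what an OpenAI-compatible route could look like from a client, the snippet below builds a text-to-speech request using only the Python standard library. It assumes Izwi mirrors OpenAI's `/v1/audio/speech` path and its `model` and `input` fields, and that the server listens on the `localhost:8080` address used in the Quick Start. The helper names `build_speech_request` and `synthesize` are hypothetical. Check the Izwi documentation for the routes and fields it actually supports.

```python
import json
import urllib.request

# Address taken from the Quick Start; change it if your server runs elsewhere.
BASE_URL = "http://localhost:8080"


def build_speech_request(text, model, base_url=BASE_URL):
    """Build (but do not send) a POST request for an OpenAI-style speech route."""
    # ASSUMPTION: path and field names follow OpenAI's /v1/audio/speech;
    # Izwi's actual route and payload may differ.
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, out_path):
    """Send the request and save the returned audio bytes (needs a running server)."""
    with urllib.request.urlopen(build_speech_request(text, model)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With `izwi serve` running and a TTS model pulled, calling `synthesize("Hello from Izwi!", "Qwen3-TTS-12Hz-0.6B-Base", "hello.wav")` would exercise the route, provided the assumptions above hold. Whether the API accepts that exact model identifier is also an assumption.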


Quick Install

macOS

Download the latest .dmg from GitHub Releases:

  1. Open the .dmg file
  2. Drag Izwi.app to Applications
  3. Launch Izwi

Linux

wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb

Windows

Download and run the installer from GitHub Releases.

Full installation guides: macOS, Linux, Windows, From Source


Quick Start

1. Start the server

izwi serve

Open http://localhost:8080 in your browser.

2. Download a model

izwi pull Qwen3-TTS-12Hz-0.6B-Base

3. Generate speech

izwi tts "Hello from Izwi!" --output hello.wav

4. Transcribe audio

izwi pull Qwen3-ASR-0.6B
izwi transcribe audio.wav

Long-form ASR is handled automatically: Izwi splits long recordings into overlapping chunks, stitches the chunk transcripts together, and returns the full transcript rather than only the first model window.
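To illustrate the stitching idea, here is a simplified sketch that merges chunk transcripts by dropping words repeated across a chunk boundary. It is not Izwi's actual algorithm: it assumes the overlapping audio is transcribed identically in both chunks, which real ASR output does not guarantee. The function names and the `max_overlap` parameter are hypothetical.

```python
def stitch_pair(prev_words, next_words, max_overlap=20):
    """Append next_words to prev_words, dropping the longest duplicated boundary run."""
    limit = min(len(prev_words), len(next_words), max_overlap)
    # Try the longest candidate overlap first, then shrink.
    for k in range(limit, 0, -1):
        if prev_words[-k:] == next_words[:k]:
            return prev_words + next_words[k:]
    # No shared words at the boundary: plain concatenation.
    return prev_words + next_words


def stitch_transcripts(chunk_texts):
    """Merge an ordered list of chunk transcripts into one transcript string."""
    words = []
    for text in chunk_texts:
        words = stitch_pair(words, text.split())
    return " ".join(words)
```

A production implementation would more likely align on timestamps or tolerate small mismatches, since the same audio can be transcribed slightly differently depending on the surrounding context.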

Optional tuning knobs:

IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
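
One way to picture how these three values could interact is the sketch below, which plans chunk boundaries for a recording of a given length. It assumes the target is the normal chunk length, the maximum is a hard cap that lets a short tail be absorbed into the final chunk, and the overlap is how far each chunk reaches back into the previous one. These semantics, the function name `plan_chunks`, and the use of the values above as defaults are assumptions for illustration, not Izwi's documented behavior.

```python
def plan_chunks(total_secs, target_secs=24.0, max_secs=30.0, overlap_secs=3.0):
    """Return a list of (start, end) windows, in seconds, covering the recording."""
    if not (0 <= overlap_secs < target_secs <= max_secs):
        raise ValueError("need 0 <= overlap < target <= max")
    total_secs = float(total_secs)
    if total_secs <= 0:
        return []
    chunks = []
    start = 0.0
    while True:
        # If the remainder fits within the hard cap, take it as the last chunk.
        if total_secs - start <= max_secs:
            chunks.append((start, total_secs))
            break
        end = start + target_secs
        chunks.append((start, end))
        # The next chunk starts slightly before this one ended.
        start = end - overlap_secs
    return chunks
```

Under these assumptions, a 60-second recording is split into three windows, 0 to 24, 21 to 45, and 42 to 60 seconds, while anything of 30 seconds or less stays in a single window.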

Supported Models

| Category | Models |
|----------|--------|
| TTS | Qwen3-TTS (0.6B, 1.7B), Kokoro-82M |
| ASR | Qwen3-ASR (0.6B, 1.7B), Parakeet TDT |
| Diarization | Sortformer 4-speaker |
| Chat | Qwen3 (0.6B, 1.7B), Gemma 3 (1B, 4B) |
| Alignment | Qwen3-ForcedAligner |

Run izwi list to see all available models.

Full model documentation: Models Guide


Documentation

| Resource | Link |
|----------|------|
| Getting Started | izwiai.com/docs/getting-started |
| Installation | izwiai.com/docs/installation |
| Features | izwiai.com/docs/features |
| CLI Reference | izwiai.com/docs/cli |
| Models | izwiai.com/docs/models |
| Troubleshooting | izwiai.com/docs/troubleshooting |


License

Apache 2.0

Acknowledgments

