Izwi

On-device AI engine for transcription, TTS, and voice workflows.

<img src="images/app-icon.png" alt="Izwi icon" width="140" />

<h1 align="center">Izwi</h1>

Local-first audio inference engine for TTS, ASR, and voice AI workflows.

<a href="https://izwiai.com">Website</a> • <a href="https://izwiai.com/docs">Documentation</a> • <a href="https://github.com/izwi-ai/izwi/releases">Releases</a> • <a href="https://izwiai.com/docs/getting-started">Getting Started</a>

<img src="images/screenshot.png" alt="Izwi Screenshot" width="800" />

Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

Core capabilities:

Voice Mode — Real-time voice conversations with AI
Text-to-Speech — Generate natural speech from text
Speech Recognition — Convert audio to text with high accuracy
Speaker Diarization — Identify and separate multiple speakers
Voice Cloning — Clone any voice from a short audio sample
Voice Design — Create custom voices from text descriptions
Forced Alignment — Word-level audio-text alignment
Chat — Text-based AI conversations

The server exposes OpenAI-compatible API routes under /v1.
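
As a rough illustration of what "OpenAI-compatible" can mean in practice, the Python sketch below builds (but does not send) a text-to-speech request. The `/audio/speech` path and the `model`/`input` fields are borrowed from OpenAI's API and are assumptions here, as is the `http://localhost:8080` base address taken from the Quick Start. Check the Izwi documentation for the routes and fields the server actually accepts.

```python
import json
import urllib.request

# Assumption: the server listens on the address shown in the Quick Start.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, model="Qwen3-TTS-12Hz-0.6B-Base", base_url=BASE_URL):
    """Build a POST request shaped like OpenAI's /v1/audio/speech call.

    The route and payload fields are assumptions, not confirmed Izwi API.
    """
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request("Hello from Izwi!")
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen` while `izwi serve` is running and write the response bytes to a file. OpenAI's own speech endpoint also expects a `voice` field, so Izwi may require one too.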

Quick Install

macOS

Download the latest .dmg from GitHub Releases:

Open the .dmg file
Drag Izwi.app to Applications
Launch Izwi

Linux

wget https://github.com/izwi-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb

Windows

Download and run the installer from GitHub Releases.

Full installation guides: macOS • Linux • Windows • From Source

Quick Start

1. Start the server

izwi serve

Open http://localhost:8080 in your browser.

2. Download a model

izwi pull Qwen3-TTS-12Hz-0.6B-Base

3. Generate speech

izwi tts "Hello from Izwi!" --output hello.wav

4. Transcribe audio

izwi pull Qwen3-ASR-0.6B
izwi transcribe audio.wav

Long-form ASR is handled automatically: Izwi splits long recordings into chunks, stitches the overlapping transcripts together, and returns the full transcript rather than only the first model window.
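
To make the chunking idea concrete, here is a minimal Python sketch of fixed-length overlapping windows. It is not Izwi's implementation: `plan_chunks` and its parameters are hypothetical, and the real chunker may pick boundaries differently (for example at pauses). The 24-second window and 3-second overlap are only example values.

```python
def plan_chunks(duration_secs, chunk_secs=24.0, overlap_secs=3.0):
    """Return (start, end) windows in seconds that cover the whole recording.

    Each window starts `overlap_secs` before the previous one ends, so the
    transcripts of neighbouring windows share some audio and can be stitched.
    """
    if chunk_secs <= overlap_secs:
        raise ValueError("chunk_secs must be greater than overlap_secs")
    duration = float(duration_secs)
    if duration <= 0:
        return []

    step = chunk_secs - overlap_secs  # how far each new window advances
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_secs, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


print(plan_chunks(60.0))
# [(0.0, 24.0), (21.0, 45.0), (42.0, 60.0)]
```

A one-minute recording becomes three windows, each sharing three seconds with its neighbour. A recording shorter than one window is returned as a single chunk.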

Optional tuning knobs:

IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
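
For batch jobs, the CLI steps above can be scripted. The Python sketch below only wraps the commands shown in this Quick Start. The helper names are hypothetical, and it assumes the `IZWI_ASR_CHUNK_*` settings are read from the environment of the process that performs the transcription. If the CLI hands work to a running `izwi serve`, set them when starting the server instead.

```python
import os
import subprocess


def tts_command(text, output_path):
    """Argument list for: izwi tts "<text>" --output <file>"""
    return ["izwi", "tts", text, "--output", output_path]


def transcribe_command(audio_path):
    """Argument list for: izwi transcribe <file>"""
    return ["izwi", "transcribe", audio_path]


def asr_chunk_env(target_secs=24, max_secs=30, overlap_secs=3, base_env=None):
    """Copy of the environment with the three chunking knobs set as strings."""
    env = dict(os.environ if base_env is None else base_env)
    env["IZWI_ASR_CHUNK_TARGET_SECS"] = str(target_secs)
    env["IZWI_ASR_CHUNK_MAX_SECS"] = str(max_secs)
    env["IZWI_ASR_CHUNK_OVERLAP_SECS"] = str(overlap_secs)
    return env


def run_izwi(command, env=None):
    """Run an izwi command and return its stdout; raises if it exits non-zero."""
    result = subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run_izwi(transcribe_command("audio.wav"), env=asr_chunk_env(overlap_secs=5))` would run the transcription with a longer overlap and return whatever the command prints, provided `izwi` is on your `PATH`.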

Supported Models

| Category | Models |
|----------|--------|
| TTS | Qwen3-TTS (0.6B, 1.7B), Kokoro-82M |
| ASR | Qwen3-ASR (0.6B, 1.7B), Parakeet TDT |
| Diarization | Sortformer 4-speaker |
| Chat | Qwen3 (0.6B, 1.7B), Gemma 3 (1B, 4B) |
| Alignment | Qwen3-ForcedAligner |

Run izwi list to see all available models.

Full model documentation: Models Guide

Documentation

| Resource | Link |
|----------|------|
| Getting Started | izwiai.com/docs/getting-started |
| Installation | izwiai.com/docs/installation |
| Features | izwiai.com/docs/features |
| CLI Reference | izwiai.com/docs/cli |
| Models | izwiai.com/docs/models |
| Troubleshooting | izwiai.com/docs/troubleshooting |

License

Apache 2.0

Acknowledgments

Qwen3-TTS by Alibaba
Parakeet by NVIDIA
Gemma by Google
HuggingFace Hub for model hosting