KittenTTS
State-of-the-art TTS model under 25MB 😻
Install / Use
/learn @KittenML/KittenTTSREADME
Kitten TTS
<p align="center"> <img width="607" height="255" alt="Kitten TTS" src="https://github.com/user-attachments/assets/f4646722-ba78-4b25-8a65-81bacee0d4f6" /> </p> <p align="center"> <a href="https://huggingface.co/spaces/KittenML/KittenTTS-Demo"><img src="https://img.shields.io/badge/Demo-Hugging%20Face%20Spaces-orange" alt="Hugging Face Demo"></a> <a href="https://discord.com/invite/VJ86W4SURW"><img src="https://img.shields.io/badge/Discord-Join%20Community-5865F2?logo=discord&logoColor=white" alt="Discord"></a> <a href="https://kittenml.com"><img src="https://img.shields.io/badge/Website-kittenml.com-blue" alt="Website"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-green.svg" alt="License"></a> </p>New: Kitten TTS v0.8 is out -- 15M, 40M, and 80M parameter models now available.
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX. With models ranging from 15M to 80M parameters (25-80 MB on disk), it delivers high-quality voice synthesis on CPU without requiring a GPU.
Status: Developer preview -- APIs may change between releases.
Commercial support is available. For integration assistance, custom voices, or enterprise licensing, contact us.
Table of Contents
- Features
- Available Models
- Demo
- Quick Start
- API Reference
- System Requirements
- Roadmap
- Commercial Support
- Community and Support
- License
Features
- Ultra-lightweight -- Model sizes from 25 MB (int8) to 80 MB, suitable for edge deployment
- CPU-optimized -- ONNX-based inference runs efficiently without a GPU
- 8 built-in voices -- Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo
- Adjustable speech speed -- Control playback rate via the
speedparameter - Text preprocessing -- Built-in pipeline handles numbers, currencies, units, and more
- 24 kHz output -- High-quality audio at a standard sample rate
Available Models
| Model | Parameters | Size | Download | |---|---|---|---| | kitten-tts-mini | 80M | 80 MB | KittenML/kitten-tts-mini-0.8 | | kitten-tts-micro | 40M | 41 MB | KittenML/kitten-tts-micro-0.8 | | kitten-tts-nano | 15M | 56 MB | KittenML/kitten-tts-nano-0.8 | | kitten-tts-nano (int8) | 15M | 25 MB | KittenML/kitten-tts-nano-0.8-int8 |
Note: Some users have reported issues with the
kitten-tts-nano-0.8-int8model. If you encounter problems, please open an issue.
Demo
https://github.com/user-attachments/assets/d80120f2-c751-407e-a166-068dd1dd9e8d
Try it online
Try Kitten TTS directly in your browser on Hugging Face Spaces.
Quick Start
Prerequisites
- Python 3.8 or later
- pip
Installation
pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl
Basic Usage
from kittentts import KittenTTS
model = KittenTTS("KittenML/kitten-tts-mini-0.8")
audio = model.generate("This high-quality TTS model runs without a GPU.", voice="Jasper")
import soundfile as sf
sf.write("output.wav", audio, 24000)
Advanced Usage
# Adjust speech speed (default: 1.0)
audio = model.generate("Hello, world.", voice="Luna", speed=1.2)
# Save directly to a file
model.generate_to_file("Hello, world.", "output.wav", voice="Bruno", speed=0.9)
# List available voices
print(model.available_voices)
# ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']
API Reference
KittenTTS(model_name, cache_dir=None)
Load a model from Hugging Face Hub.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | "KittenML/kitten-tts-nano-0.8" | Hugging Face repository ID |
| cache_dir | str | None | Local directory for caching downloaded model files |
model.generate(text, voice, speed, clean_text)
Synthesize speech from text, returning a NumPy array of audio samples at 24 kHz.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | -- | Input text to synthesize |
| voice | str | "expr-voice-5-m" | Voice name (see available voices) |
| speed | float | 1.0 | Speech speed multiplier |
| clean_text | bool | False | Preprocess text (expand numbers, currencies, etc.) |
model.generate_to_file(text, output_path, voice, speed, sample_rate, clean_text)
Synthesize speech and write directly to an audio file.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | -- | Input text to synthesize |
| output_path | str | -- | Path to save the audio file |
| voice | str | "expr-voice-5-m" | Voice name |
| speed | float | 1.0 | Speech speed multiplier |
| sample_rate | int | 24000 | Audio sample rate in Hz |
| clean_text | bool | True | Preprocess text (expand numbers, currencies, etc.) |
model.available_voices
Returns a list of available voice names: ['Bella', 'Jasper', 'Luna', 'Bruno', 'Rosie', 'Hugo', 'Kiki', 'Leo']
System Requirements
- Operating system: Linux, macOS, or Windows
- Python: 3.8 or later
- Hardware: Runs on CPU; no GPU required
- Disk space: 25-80 MB depending on model variant
A virtual environment (conda, venv, or similar) is recommended to avoid dependency conflicts.
Roadmap
- [ ] Release optimized inference engine
- [ ] Release mobile SDK
- [ ] Release higher quality TTS models
- [ ] Release multilingual TTS
- [ ] Release KittenASR
- [ ] Need anything else? Let us know
Commercial Support
We offer commercial support for teams integrating Kitten TTS into their products. This includes integration assistance, custom voice development, and enterprise licensing.
Contact us or email info@stellonlabs.com to discuss your requirements.
Community and Support
- Discord: Join the community
- Website: kittenml.com
- Custom support: Request form
- Email: info@stellonlabs.com
- Issues: GitHub Issues
License
This project is licensed under the Apache License 2.0.
