Supertonic — Lightning Fast, On-Device TTS

Supertonic is a lightning-fast, on-device text-to-speech system designed for extreme performance with minimal computational overhead. Powered by ONNX Runtime, it runs entirely on your device—no cloud, no API calls, no privacy concerns.

📰 Update News

2026.01.22 - Voice Builder is now live! Turn your voice into a deployable, edge-native TTS with permanent ownership.

2026.01.06 - 🎉 Supertonic 2 released with multilingual support! Now supports English (en), Korean (ko), Spanish (es), Portuguese (pt), and French (fr). Demo | Models
2025.12.10 - Added supertonic PyPI package! Install via pip install supertonic. For details, visit supertonic-py documentation
2025.12.10 - Added 6 new voice styles (M3, M4, M5, F3, F4, F5). See Voices for details
2025.12.08 - Optimized ONNX models via OnnxSlim now available on Hugging Face Models
2025.11.24 - Added Flutter SDK support with macOS compatibility

Demo
Why Supertonic?
Language Support
Getting Started
Performance
Built with Supertonic
Citation
License

Demo

Raspberry Pi

Watch Supertonic running on a Raspberry Pi, demonstrating on-device, real-time text-to-speech synthesis:

https://github.com/user-attachments/assets/ea66f6d6-7bc5-4308-8a88-1ce3e07400d2

E-Reader

Experience Supertonic on an Onyx Boox Go 6 e-reader in airplane mode, achieving an average RTF of 0.3× with zero network dependency:

https://github.com/user-attachments/assets/64980e58-ad91-423a-9623-78c2ffc13680

Chrome Extension

The Supertonic Chrome extension turns any webpage into audio in under one second. Speech is synthesized on-device with zero network dependency, so it is free, private, and effortless:

https://github.com/user-attachments/assets/cc8a45fc-5c3e-4b2c-8439-a14c3d00d91c

🎧 Try it now: Experience Supertonic in your browser with our Interactive Demo, or get started with pre-trained models from Hugging Face Hub

Why Supertonic?

⚡ Blazingly Fast: Generates speech up to 167× faster than real-time on consumer hardware (M4 Pro), faster than every other TTS system in our benchmarks (see Performance)
🪶 Ultra Lightweight: Only 66M parameters, optimized for efficient on-device performance with minimal footprint
📱 On-Device Capable: Complete privacy and zero network latency, since all processing happens locally on your device
🎨 Natural Text Handling: Seamlessly processes numbers, dates, currency, abbreviations, and complex expressions without pre-processing
⚙️ Highly Configurable: Adjust inference steps, batch processing, and other parameters to match your specific needs
🧩 Flexible Deployment: Deploy seamlessly across servers, browsers, and edge devices with multiple runtime backends.

Language Support

We provide ready-to-use TTS inference examples across multiple ecosystems:

| Language/Platform | Path | Description |
|-------------------|------|-------------|
| Python | py/ | ONNX Runtime inference |
| Node.js | nodejs/ | Server-side JavaScript |
| Browser | web/ | WebGPU/WASM inference |
| Java | java/ | Cross-platform JVM |
| C++ | cpp/ | High-performance C++ |
| C# | csharp/ | .NET ecosystem |
| Go | go/ | Go implementation |
| Swift | swift/ | macOS applications |
| iOS | ios/ | Native iOS apps |
| Rust | rust/ | Memory-safe systems |
| Flutter | flutter/ | Cross-platform apps |

For detailed usage instructions, please refer to the README.md in each language directory.

Getting Started

First, clone the repository:

git clone https://github.com/supertone-inc/supertonic.git
cd supertonic

Prerequisites

Before running the examples, download the ONNX models and preset voices, and place them in the assets directory:

Note: The Hugging Face repository uses Git LFS. Please ensure Git LFS is installed and initialized before cloning or pulling large model files.

macOS: brew install git-lfs && git lfs install

Generic: see https://git-lfs.com for installers

git clone https://huggingface.co/Supertone/supertonic-2 assets
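
If Git LFS was not initialized before cloning, large files are typically left behind as small text pointer files instead of real model weights. The sketch below is one way to check for that. It assumes the models use the `.onnx` extension somewhere under `assets`; the helper name `find_lfs_pointers`, the default directory, and the recursive search are illustrative choices, not part of Supertonic.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str = "assets", pattern: str = "*.onnx") -> list[Path]:
    """Return files under `root` that are still LFS pointers, not real data.

    `root` and `pattern` are assumptions about where the models live.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    pointers = []
    for path in sorted(base.rglob(pattern)):
        # Only the first bytes are needed to recognise a pointer file.
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    stale = find_lfs_pointers()
    if stale:
        print("Unfetched LFS pointers (try `git lfs pull` inside assets):")
        for p in stale:
            print(f"  {p}")
    else:
        print("No LFS pointer files found under assets.")
```

If the script lists any files, installing Git LFS and running `git lfs pull` inside the cloned directory should replace the pointers with the actual models.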

Quick Start

Python Example (Details)

cd py
uv sync
uv run example_onnx.py

Node.js Example (Details)

cd nodejs
npm install
npm start

Browser Example (Details)

cd web
npm install
npm run dev

Java Example (Details)

cd java
mvn clean install
mvn exec:java

C++ Example (Details)

cd cpp
mkdir build && cd build
cmake .. && cmake --build . --config Release
./example_onnx

C# Example (Details)

cd csharp
dotnet restore
dotnet run

Go Example (Details)

cd go
go mod download
go run example_onnx.go helper.go

Swift Example (Details)

cd swift
swift build -c release
.build/release/example_onnx

Rust Example (Details)

cd rust
cargo build --release
./target/release/example_onnx

iOS Example (Details)

cd ios/ExampleiOSApp
xcodegen generate
open ExampleiOSApp.xcodeproj

In Xcode: Targets → ExampleiOSApp → Signing: select your Team
Choose your iPhone as run destination → Build & Run

Technical Details

Runtime: ONNX Runtime for cross-platform inference (CPU-optimized; GPU mode is not tested)
Browser Support: onnxruntime-web for client-side inference
Batch Processing: Supports batch inference for improved throughput
Audio Output: Outputs 16-bit WAV files
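
A 16-bit WAV file stores each sample as a signed 16-bit integer. As a rough illustration of that final step (not the repository's own writer), the sketch below converts floating-point samples in [-1.0, 1.0] to 16-bit PCM using Python's standard `wave` module. The function name `write_wav_16bit` and the default 44100 Hz sample rate are assumptions for the example; use whatever rate the model actually produces.

```python
import struct
import wave


def write_wav_16bit(path: str, samples: list[float], sample_rate: int = 44100) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # clip so the int16 range is never exceeded
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

Calling `write_wav_16bit("out.wav", samples, sample_rate)` with a list of floats produces a file that standard audio players can open. Clipping before scaling avoids wrap-around artifacts when a sample slightly exceeds the nominal range.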

Performance

We evaluated Supertonic's performance (with 2 inference steps) using two key metrics across input texts of varying lengths: Short (59 chars), Mid (152 chars), and Long (266 chars).

Metrics:

Characters per Second: Measures throughput by dividing the number of input characters by the time required to generate audio. Higher is better.
Real-time Factor (RTF): Measures the time taken to synthesize audio relative to its duration. Lower is better (e.g., RTF of 0.1 means it takes 0.1 seconds to generate one second of audio).
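
Both metrics follow directly from a wall-clock measurement. A minimal sketch, assuming you already have the synthesis time and the duration of the generated audio; the function names and the numbers in the usage example are illustrative, not measurements from this benchmark:

```python
def chars_per_second(text: str, synthesis_seconds: float) -> float:
    """Throughput: input characters divided by synthesis time (higher is better)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return len(text) / synthesis_seconds


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: synthesis time divided by audio duration (lower is better)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical run: 152 characters synthesized in 0.1 s, yielding 10 s of audio.
    text = "a" * 152
    print(f"chars/s: {chars_per_second(text, 0.1):.0f}")
    print(f"RTF:     {real_time_factor(0.1, 10.0):.3f}")
```

In practice the synthesis time would come from something like `time.perf_counter()` wrapped around the inference call, and the audio duration from the number of output samples divided by the sample rate.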

Characters per Second

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro - CPU) | 912 | 1048 | 1263 |
| Supertonic (M4 pro - WebGPU) | 996 | 1801 | 2509 |
| Supertonic (RTX4090) | 2615 | 6548 | 12164 |
| API ElevenLabs Flash v2.5 | 144 | 209 | 287 |
| API OpenAI TTS-1 | 37 | 55 | 82 |
| API Gemini 2.5 Flash TTS | 12 | 18 | 24 |
| API Supertone Sona speech 1 | 38 | 64 | 92 |
| Open Kokoro | 104 | 107 | 117 |
| Open NeuTTS Air | 37 | 42 | 47 |

Notes:
API = Cloud-based API services (measured from Seoul)
Open = Open-source models
Supertonic (M4 pro - CPU) and (M4 pro - WebGPU): Tested with ONNX
Supertonic (RTX4090): Tested with PyTorch model
Kokoro: Tested on M4 Pro CPU with ONNX
NeuTTS Air: Tested on M4 Pro CPU with Q8-GGUF

Real-time Factor

| System | Short (59 chars) | Mid (152 chars) | Long (266 chars) |
|--------|-----------------|----------------|-----------------|
| Supertonic (M4 pro - CPU) | 0.015 | 0.013 | 0.012 |
| Supertonic (M4 pro - WebGPU) | 0.014 | 0.007 | 0.006 |
| Supertonic (RTX4090) | 0.005 | 0.002 | 0.001 |
| API ElevenLabs Flash v2.5 | 0.133 | 0.077 | 0.057 |
| API OpenAI TTS-1 | 0.471 | 0.302 | 0.201 |
| API Gemini 2.5 Flash TTS | 1.060 | 0.673 | 0.541 |
| API Supertone Sona speech 1 | 0.372 | 0.206 | 0.163 |
| Open Kokoro | 0.144 | 0.124 | 0.126 |
| Open NeuTTS Air | 0.390 | 0.338 | 0.343 |
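
An RTF can be read as a speed-up over real time by taking its reciprocal. The sketch below applies that to a few long-input values from the RTF table. For instance, 0.006 (M4 Pro, WebGPU) works out to roughly 167× real time, which appears to correspond to the headline figure, though that link is an inference from the numbers rather than something stated explicitly. The table values are rounded to three decimals, so the results are approximate.

```python
def speedup_over_real_time(rtf: float) -> float:
    """Convert a real-time factor into 'times faster than real time'."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Long-input (266 chars) RTF values copied from the table.
long_input_rtf = {
    "Supertonic (M4 pro - CPU)": 0.012,
    "Supertonic (M4 pro - WebGPU)": 0.006,
    "Supertonic (RTX4090)": 0.001,
    "Open Kokoro": 0.126,
}

if __name__ == "__main__":
    for system, rtf in long_input_rtf.items():
        print(f"{system}: ~{speedup_over_real_time(rtf):.0f}x real time")
```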

<details> <summary>Additional Performance Data (5-step inference)</summary>

**Characters per Second (5-step
