Voipi
🔊 Give your apps, CLIs, and agents a voice. VoiPi is a universal, zero-dependency, free text-to-speech library for JavaScript.
- Pure JS, zero dependencies, under 100 kB total install size and 10 kB for bundled providers
- No API keys required
- Multiple providers: Browser TTS, macOS, Edge TTS, Google TTS, Piper, eSpeak NG
- Auto fallback: Picks the best available provider per platform
- Auto language detection: Detects script (Arabic, Farsi, CJK, Cyrillic, etc.) and Latin-script languages (French, Spanish, German, Portuguese, etc.) — picks the best voice automatically
- MCP Server: Give AI agents a voice — auto install with Claude Code, Codex, Cursor, Windsurf, OpenCode and Pi.
Demo
<p align="center"> <a href="https://voipi.vercel.app/#samples"><img src="./website/demo.svg" alt="voipi demo" width="600"></a> <br> <a href="https://voipi.vercel.app/#samples"><img src="https://img.shields.io/badge/%F0%9F%94%8A_VoiPi-Listen_to_Samples-yellow?style=flat" alt="Listen to Samples" /></a> </p>
CLI
You can use voipi directly with npx/pnpx/bunx.
# Speak text (auto-selects best available provider)
npx voipi 'The quick brown fox jumps over the lazy dog'
npx voipi speak 'Hello world'
# Choose a specific voice and speed
npx voipi 'Hi' -v en-US-BrianNeural -r 1.5
# Save to file instead of playing
npx voipi speak 'Hi' -o hello.mp3
# Use a specific provider
npx voipi 'Bonjour le monde' -p edge-tts -v fr-FR-DeniseNeural
# List available voices
npx voipi voices
# List voices for a specific provider
npx voipi voices -p edge-tts
# Start MCP server (stdio transport)
npx voipi mcp
MCP Server
VoiPi includes a built-in MCP server that exposes text-to-speech tools over the stdio transport. This lets AI agents and LLM clients speak text, save audio files, and list voices.
Auto-install to all detected agents:
npx voipi@latest mcp --install
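If auto-install does not pick up your client, a manual entry may work. The sketch below assumes the client reads the widely used `mcpServers` JSON layout. The `voipi` key name and the layout itself are assumptions about your client, so check its documentation before pasting the output anywhere.

```javascript
// Hypothetical manual MCP client entry. The "voipi" key and the
// mcpServers layout are assumptions about the client, not VoiPi output.
const mcpConfig = {
  mcpServers: {
    voipi: {
      command: "npx",
      args: ["voipi", "mcp"], // the stdio server command from the CLI section
    },
  },
};

// Print JSON that can be pasted into the client's MCP config file.
console.log(JSON.stringify(mcpConfig, null, 2));
```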
Programmatic Usage
VoiPi automatically picks the best available provider using a fallback chain (macOS → Edge TTS → Google TTS → Piper → eSpeak NG):
import { VoiPi } from "voipi";
const voice = new VoiPi();
// Speak text
await voice.speak("Hello world!");
// With a prioritized voice list (first available wins)
await voice.speak("Hello!", { voice: ["Samantha", "en-US-AriaNeural"], rate: 1.5 });
// Save to file
await voice.save("Hello!", "output.mp3");
// Get audio data with duration
const audio = await voice.toAudio("Hello world!");
console.log(`Duration: ${audio.duration}s`);
// List available voices
const voices = await voice.listVoices();
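The "first available wins" rule for a prioritized voice list can be pictured as a simple lookup. This is an illustrative sketch, not VoiPi's implementation: `pickFirstAvailable` is a hypothetical helper, and the case-insensitive matching is an assumption.

```javascript
// Illustrative helper (not part of VoiPi): return the first preferred
// voice name that also appears in the list of available names.
function pickFirstAvailable(preferred, available) {
  // Case-insensitive comparison is an assumption made for this sketch.
  const known = new Set(available.map((name) => name.toLowerCase()));
  return preferred.find((name) => known.has(name.toLowerCase()));
}

const availableVoices = ["en-US-AriaNeural", "en-US-GuyNeural"];
const chosen = pickFirstAvailable(["Samantha", "en-US-AriaNeural"], availableVoices);

// "Samantha" is not in the list, so the second preference is chosen.
console.log(chosen);
```

When no preferred voice is available, the helper returns `undefined`, which a caller could treat as "use the provider default".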
You can also provide a custom provider chain using names, [name, options] tuples, or factory functions:
import { VoiPi } from "voipi";
// Using provider names
const voice = new VoiPi({
providers: ["edge-tts", "macos"],
});
// Using [name, options] tuples for provider configuration
const voice2 = new VoiPi({
providers: [["edge-tts", { voice: "en-US-GuyNeural" }], "macos"],
});
// Using factory functions for full control
import { MacOS, EdgeTTS } from "voipi";
const voice3 = new VoiPi({
providers: [() => new EdgeTTS({ voice: "en-US-GuyNeural" }), () => new MacOS()],
});
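To show how the three entry forms can coexist in one chain, here is a minimal sketch that normalizes each entry into a factory and then returns the first available provider. Everything in it is hypothetical: the `registry`, the `available` flag, and the helper names are stand-ins for illustration and do not reflect VoiPi's internals.

```javascript
// Hypothetical registry of provider constructors. These stand-ins are
// plain objects, not VoiPi's provider classes.
const registry = {
  "edge-tts": (options = {}) => ({ name: "edge-tts", options, available: true }),
  macos: (options = {}) => ({ name: "macos", options, available: false }),
};

// Turn a name, a [name, options] tuple, or a factory into a factory.
function toFactory(entry) {
  if (typeof entry === "function") return entry;
  const [name, options] = Array.isArray(entry) ? entry : [entry, {}];
  const create = registry[name];
  if (!create) throw new Error(`Unknown provider: ${name}`);
  return () => create(options);
}

// Walk the chain in order and return the first available provider.
function firstAvailable(chain) {
  for (const entry of chain) {
    const provider = toFactory(entry)();
    if (provider.available) return provider;
  }
  throw new Error("No provider available");
}

// "macos" is marked unavailable here, so the tuple entry is used.
const provider = firstAvailable(["macos", ["edge-tts", { voice: "en-US-GuyNeural" }]]);
console.log(provider.name, provider.options.voice);
```

Using factories as the common form keeps construction lazy: a provider further down the chain is only created if the earlier ones are unavailable.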
Language Detection
VoiPi automatically detects the language of input text and selects an appropriate voice. This works across all providers — no manual voice selection needed for non-English text:
await voice.speak("سلام دنیا"); // Farsi → picks a Farsi voice
await voice.speak("مرحبا بالعالم"); // Arabic → picks an Arabic voice
await voice.speak("こんにちは"); // Japanese → picks a Japanese voice
await voice.speak("你好世界"); // Chinese → picks a Chinese voice
await voice.speak("L'éducation française est très appréciée"); // French → picks a French voice
await voice.speak("Straßenbahn und Gemütlichkeit"); // German → picks a German voice
await voice.speak("¿Cómo estás?"); // Spanish → picks a Spanish voice
Detects 30+ languages: unique scripts (Arabic, Farsi, Urdu, CJK, Cyrillic, Devanagari, etc.) and Latin-script languages via diacritics analysis (French, Spanish, German, Portuguese, Turkish, Polish, Czech, Romanian, Vietnamese, and more). You can also use the detection utility directly:
import { detectLanguage } from "voipi";
detectLanguage("سلام دنیا"); // "fa"
detectLanguage("Hello world"); // "en"
detectLanguage("こんにちは世界"); // "ja"
detectLanguage("L'éducation française"); // "fr"
detectLanguage("Straßenbahn"); // "de"
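For intuition, script-based and diacritics-based detection can be approximated with Unicode property escapes. The sketch below is a simplified illustration with a hypothetical `guessLanguage` function. It is not VoiPi's `detectLanguage`: it covers only a handful of languages and, for instance, does not separate Farsi or Urdu from Arabic.

```javascript
// Illustrative only: maps a few Unicode scripts and distinctive Latin
// characters to language codes. VoiPi's real rules may differ.
const SCRIPT_RULES = [
  // Kana is checked before Han because Japanese text mixes both.
  [/[\p{Script=Hiragana}\p{Script=Katakana}]/u, "ja"],
  [/\p{Script=Han}/u, "zh"],
  [/\p{Script=Arabic}/u, "ar"],
  [/\p{Script=Cyrillic}/u, "ru"],
];

// A few characters that are strong hints for one Latin-script language.
const LATIN_HINTS = [
  [/ß/u, "de"],
  [/[ñ¿¡]/u, "es"],
  [/[ãõ]/u, "pt"],
];

function guessLanguage(text) {
  for (const [pattern, code] of [...SCRIPT_RULES, ...LATIN_HINTS]) {
    if (pattern.test(text)) return code;
  }
  return "en"; // fallback when nothing distinctive is found
}

console.log(guessLanguage("こんにちは世界"));
console.log(guessLanguage("Straßenbahn"));
console.log(guessLanguage("Hello world"));
```

Rule order matters in this approach: the first matching pattern wins, so more specific checks should come first.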
Duration Estimation
Estimate playback duration before or after synthesis:
import { VoiPi, estimateSpeechDuration, getAudioDuration } from "voipi";
const voice = new VoiPi();
// Pre-synthesis: estimate from text (~150 WPM heuristic)
const seconds = estimateSpeechDuration("Hello world!", 1.0);
// Post-synthesis: parse actual audio buffer (WAV/AIFF exact, MP3 estimated)
const audio = await voice.toAudio("Hello world!"); // duration auto-populated
console.log(audio.duration); // seconds
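The pre-synthesis estimate follows a words-per-minute idea, which can be sketched in a few lines. This is a hypothetical re-implementation for illustration: `estimateSeconds` is not a VoiPi export, the word-splitting rule is an assumption, and `rate` is assumed to be a speed multiplier (1.5 means 1.5 times faster).

```javascript
// Hypothetical re-implementation of a words-per-minute estimate.
// VoiPi's estimateSpeechDuration may count words differently.
const WORDS_PER_MINUTE = 150;

function estimateSeconds(text, rate = 1.0) {
  // Split on whitespace; empty input yields zero words.
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words * 60) / (WORDS_PER_MINUTE * rate);
}

// Nine words at the default rate: 9 * 60 / 150 seconds.
console.log(estimateSeconds("The quick brown fox jumps over the lazy dog"));
// The same sentence at double speed takes half as long.
console.log(estimateSeconds("The quick brown fox jumps over the lazy dog", 2.0));
```

An estimate like this ignores pauses, punctuation, and language-specific pacing, so the duration parsed from the synthesized audio is the better source when it is available.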
Providers
macOS
Uses the native say command. Only available on macOS.
import { MacOS } from "voipi/macos";
const voice = new MacOS({ voice: "Samantha", rate: 1.2 });
await voice.speak("Hello world!");
// Override defaults per call
await voice.speak("Hello!", { voice: "Daniel", rate: 1.5 });
Edge TTS
Cross-platform online TTS using Microsoft Edge's neural speech service. 322+ voices with configurable rate, pitch, and volume.
import { EdgeTTS } from "voipi/edge-tts";
const voice = new EdgeTTS({ voice: "en-US-AriaNeural" });
await voice.speak("Hello world!");
// List all available voices
const voices = await voice.listVoices();
Google TTS
Cross-platform online TTS using Google Translate's speech endpoint. 55+ languages, zero config.
import { GoogleTTS } from "voipi/google-tts";
const voice = new GoogleTTS({ voice: "en" });
await voice.speak("Hello world!");
// Different language
const fr = new GoogleTTS({ voice: "fr" });
await fr.speak("Bonjour le monde!");
Piper
Local neural TTS powered by Piper. 40+ languages, fully offline after first download. Uses an existing piper install if found in PATH, otherwise auto-installs a standalone binary (Linux x86_64/aarch64) or pip venv (macOS/Windows). Voice models (ONNX) are downloaded on demand from HuggingFace and cached locally.
import { Piper } from "voipi/piper";
const voice = new Piper();
await voice.speak("Hello world!");
// Custom voice, speed, and speaker
const voice2 = new Piper({ voice: "en_US-lessac-medium", lengthScale: 0.8, speaker: 0 });
await voice2.speak("Hello!");
// List all available voices
const voices = await voice.listVoices();
eSpeak NG
Local TTS using the eSpeak NG speech synthesizer. Requires espeak-ng installed on the system (available in KDE, etc). Supports 100+ languages with formant-based synthesis.
Note: eSpeak NG produces robotic-sounding output. For natural-sounding voices, prefer Piper, which uses neural TTS.
import { EspeakNG } from "voipi/espeak-ng";
const voice = await EspeakNG.create();
await voice.speak("Hello world!");
// Custom voice and speed
const voice2 = await EspeakNG.create({ voice: "en-us+f3", rate: 1.2 });
await voice2.speak("Hello!");
// List all available voices
const voices = await voice.listVoices();
Browser TTS
Uses the Web Speech API (speechSynthesis). Works in browsers only — speaks directly without producing audio files.
import { BrowserTTS } from "voipi/browser";
const voice = new BrowserTTS();
await voice.speak("Hello world!");
// Pick a specific voice
await voice.speak("Hello!", { voice: "Google US English", rate: 1.2 });
// List available voices (varies by browser/OS)
const voices = await voice.listVoices();
Note: Browser TTS plays audio directly and does not support save() or raw audio export.
Pi Extension
VoiPi also ships with a pi package that adds TTS tools and commands to pi.
pi install git:github.com/pithings/voipi
See packages/pi/README.md for usage details.
Sponsors
<p align="center"> <a href="https://sponsors.pi0.io/"> <img src="https://sponsors.pi0.io/sponsors.svg?xyz"> </a> </p>
License
Published under the MIT license 💛.
