Speaches
speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech uses piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
See the documentation for installation instructions and usage: speaches.ai
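Because the server follows the OpenAI API, a text-to-speech request is an ordinary JSON POST. The sketch below builds one with only the Python standard library. It assumes a server listening at `http://localhost:8000` and an OpenAI-style `/v1/audio/speech` route that accepts `model`, `input`, `voice`, and `response_format`. Model IDs and voice names are left to the caller because they depend on what your server has installed, so check the documentation at speaches.ai before relying on any of these values.

```python
import json
import urllib.request

# Assumption: a speaches server listening on localhost:8000 (adjust as needed).
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text: str, model: str, voice: str,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build, but do not send, an OpenAI-style POST /audio/speech request."""
    payload = {
        "model": model,              # a TTS model ID available on your server
        "input": text,               # the text to synthesize
        "voice": voice,              # a voice name supported by that model
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `save_speech(build_speech_request("Hello from speaches!", model="<model-id>", voice="<voice>"), "hello.mp3")`, with the placeholders replaced by a model and voice your server actually offers. Any OpenAI SDK pointed at the same base URL should produce an equivalent request.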
Features:
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches.
- Audio generation via the chat completions endpoint (see the OpenAI documentation):
  - Generate a spoken audio summary of a body of text (text in, audio out)
  - Perform sentiment analysis on a recording (audio in, text out)
  - Async speech-to-speech interactions with a model (audio in, audio out)
- Streaming support: transcription is sent via SSE (Server-Sent Events) as the audio is transcribed, so you don't need to wait for the whole file to be transcribed before receiving results.
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-Speech via kokoro (ranked #1 in the TTS Arena) and piper models.
- GPU and CPU support.
- Deployable via Docker Compose / Docker
- Realtime API
- Highly configurable
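On the client side, streamed transcription arrives as Server-Sent Events. The following is a minimal, standard-library-only sketch (not speaches' own client code) that splits a stream of text lines into SSE events using the basic framing rules: `data:` lines accumulate, a blank line dispatches the event, and lines starting with `:` are comments. What each payload contains (plain text or JSON) is an assumption that depends on the response format you request, so this helper only returns the raw data strings.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    Only the "data" field is collected; other fields (event, id, retry)
    are ignored in this sketch.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "data":
            buffer.append(value)
    # An event not terminated by a blank line before EOF is discarded.


if __name__ == "__main__":
    sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: Hello world\n", "\n"]
    print(list(iter_sse_data(sample)))  # ['Hello', 'Hello world']
```

With a live HTTP response from `urllib.request.urlopen`, the lines are bytes, so pass something like `(line.decode("utf-8") for line in response)` to the helper.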
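Dynamic loading and offloading can be pictured as a cache keyed by model ID with an inactivity timeout. The sketch below illustrates that idea and is not speaches' implementation: the `IdleModelCache` class, its `loader` callback, and the choice to evict lazily on each `get` call are all hypothetical. A real server might instead run eviction on a background timer.

```python
import time
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

M = TypeVar("M")  # whatever object the loader returns (hypothetical model type)


class IdleModelCache(Generic[M]):
    """Hypothetical sketch: load models on demand, drop them after `ttl` idle seconds."""

    def __init__(self, loader: Callable[[str], M], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # called with a model ID the first time it is requested
        self._ttl = ttl            # inactivity window, in seconds
        self._clock = clock        # injectable so the behavior can be tested
        self._models: Dict[str, Tuple[M, float]] = {}  # model ID -> (model, last use)

    def get(self, model_id: str) -> M:
        """Return the model, loading it if needed, and mark it as just used."""
        self.evict_idle()
        if model_id in self._models:
            model = self._models[model_id][0]
        else:
            model = self._loader(model_id)
        self._models[model_id] = (model, self._clock())
        return model

    def evict_idle(self) -> None:
        """Unload every model whose last use is at least `ttl` seconds ago."""
        now = self._clock()
        idle = [mid for mid, (_, last) in self._models.items() if now - last >= self._ttl]
        for mid in idle:
            del self._models[mid]

    def loaded(self) -> List[str]:
        """IDs of the models currently held in memory, sorted."""
        return sorted(self._models)
```

Injecting the clock keeps the timeout logic deterministic under test; in a server the default `time.monotonic` avoids surprises from wall-clock adjustments.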
Please create an issue if you find a bug, have a question, or a feature suggestion.
Demos
Realtime API
https://github.com/user-attachments/assets/457a736d-4c29-4b43-984b-05cc4d9995bc
(Excuse the breathing lol. Didn't have enough time to record a better demo)
Streaming Transcription
TODO
Speech Generation
https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b
