Speaches

speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech is powered by piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.

See the documentation for installation instructions and usage: speaches.ai

Features:

  • OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches.
  • Audio generation (chat completions endpoint); see the OpenAI documentation
    • Generate a spoken audio summary of a body of text (text in, audio out)
    • Perform sentiment analysis on a recording (audio in, text out)
    • Async speech to speech interactions with a model (audio in, audio out)
  • Streaming support: transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the whole recording to be processed before receiving results.
  • Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
  • Text-to-Speech via kokoro (ranked #1 in the TTS Arena) and piper models.
  • GPU and CPU support.
  • Deployable via Docker Compose / Docker
  • Realtime API
  • Highly configurable
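
Because the server follows OpenAI's API shape, a speech generation request can be sketched with nothing but the Python standard library. The sketch below is illustrative, not the project's reference client: the base URL `http://localhost:8000/v1`, the `response_format` default, and the helper names `build_speech_request` and `save_speech` are assumptions, and the model and voice names in the usage note are placeholders. Check the documentation at speaches.ai for the values that apply to your deployment.

```python
import json
import urllib.request

# Assumption: a speaches server is reachable at this address; adjust as needed.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text, model, voice, response_format="mp3", base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like OpenAI's /audio/speech call.

    `response_format="mp3"` mirrors OpenAI's default and is an assumption here.
    """
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request, path):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With a server running, usage would look like `save_speech(build_speech_request("Hello!", "my-tts-model", "my-voice"), "hello.mp3")`, where `my-tts-model` and `my-voice` stand in for a model and voice actually available on your server. Keeping request construction separate from sending makes the payload easy to inspect before any audio is generated.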

Please create an issue if you find a bug, have a question, or have a feature suggestion.

Demos

Realtime API

https://github.com/user-attachments/assets/457a736d-4c29-4b43-984b-05cc4d9995bc

(Excuse the breathing lol. Didn't have enough time to record a better demo)

Streaming Transcription

TODO
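
Until a demo is recorded, the client side of streaming can be sketched in code. Transcription results arrive as server-sent events (SSE), so a client reads the response line by line and collects the `data:` fields of each event. The helper `iter_sse_data` below is a hypothetical name, and this is a minimal parser that ignores other SSE fields such as `event:` and `id:`. What each payload contains (plain text or JSON) depends on the response format requested from the server, so the sketch returns payloads as raw strings.

```python
def iter_sse_data(lines):
    """Yield the data payload of each server-sent event from an iterable of text lines.

    Multiple `data:` lines within one event are joined with newlines,
    and a blank line marks the end of an event.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the payload.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
        # Comment lines (starting with ":") and other fields are skipped.
    # Pragmatic choice: flush a trailing event even if the stream ended
    # without a closing blank line.
    if buffer:
        yield "\n".join(buffer)
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response whose lines have been decoded to text, and each yielded payload can be printed as soon as it arrives rather than after the whole recording has been transcribed.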

Speech Generation

https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b
