
SpeechToSpeechLLM

speechToSpeechLLM: a Dockerized, three-part backend combining KoboldCPP, Coqui TTS, and WhisperCPP, plus an example R Shiny application, forming an open-source, locally hosted, end-to-end speech-to-speech framework.

Repository: snakewizardd/SpeechToSpeechLLM

speechToSpeechLLM

A free, open-source implementation of Speech-to-Speech technology


The composite backend consists of:

  • KoboldCPP (NeuralBeagle 7B) on port 5001
  • Coqui TTS on port 5002
  • WhisperCPP on port 8080
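Because the three services start independently, it can help to confirm they are reachable before launching the front end. The sketch below is a minimal illustration, assuming the containers publish the ports listed above on localhost; the `SERVICES` mapping and the `port_open` and `check_backends` helpers are hypothetical names, not part of the repository. It only tests that a TCP connection succeeds, not that the model behind the port has finished loading.

```python
import socket

# Hypothetical mapping of backend names to the host ports listed above
# (assumes the containers publish these ports on localhost).
SERVICES = {
    "koboldcpp": 5001,
    "coqui_tts": 5002,
    "whispercpp": 8080,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_backends(host: str = "127.0.0.1") -> dict:
    """Map each backend name to whether its port is currently accepting connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_backends().items():
        print(f"{name:<12} port {SERVICES[name]:>5}: {'up' if up else 'down'}")
```

Running it after the build script finishes prints one up/down line per backend.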

To build and run the entire backend:

chmod 555 run_entire_build.sh
./run_entire_build.sh

To stop the backend:

chmod 555 prune_entire_build.sh
./prune_entire_build.sh

The user-facing application is currently a proof of concept (POC): a simple R Shiny app that passes data between the backends. It currently targets macOS because it relies on the 'rec' command (part of SoX) to record audio input.

Porting the app to Windows should be straightforward with a tool such as ffmpeg for audio capture. Recording from audio devices on Linux is still to be determined.
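One way the recording step could be switched per platform is sketched below, assuming the app records a fixed-length clip to a WAV file. The macOS branch uses SoX's `rec`; the Windows branch uses ffmpeg's DirectShow input, where the device name is a placeholder you would replace (ffmpeg lists devices with `ffmpeg -list_devices true -f dshow -i dummy`). The 16 kHz mono format is an assumption chosen because whisper.cpp expects 16 kHz audio, and `build_record_cmd` and `record` are hypothetical helpers, not the app's actual code.

```python
import subprocess
import sys

# Assumed recording format: 16 kHz mono WAV, which whisper.cpp expects.
SAMPLE_RATE = 16000
CHANNELS = 1


def build_record_cmd(out_path: str, seconds: int, platform: str = sys.platform,
                     win_device: str = "Microphone") -> list:
    """Return an argv list that records `seconds` of audio into out_path.

    macOS uses SoX's `rec`; Windows uses ffmpeg with DirectShow.
    `win_device` is a placeholder DirectShow device name.
    """
    if platform == "darwin":
        # SoX: record from the default input device, then trim to the first N seconds.
        return ["rec", "-c", str(CHANNELS), "-r", str(SAMPLE_RATE),
                out_path, "trim", "0", str(seconds)]
    if platform.startswith("win"):
        # ffmpeg: capture N seconds from a DirectShow audio device, overwriting output.
        return ["ffmpeg", "-y", "-f", "dshow", "-i", f"audio={win_device}",
                "-t", str(seconds), "-ac", str(CHANNELS), "-ar", str(SAMPLE_RATE),
                out_path]
    # Linux recording is still to be determined in this project.
    raise NotImplementedError(f"audio recording not implemented for {platform!r}")


def record(out_path: str, seconds: int = 5) -> None:
    """Run the platform-specific recorder; raises CalledProcessError on failure."""
    subprocess.run(build_record_cmd(out_path, seconds), check=True)
```

Keeping command construction separate from execution makes it easy to add a Linux branch later without touching the rest of the pipeline.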

All APIs run independently of the R Shiny app, which is NOT packaged with the Docker Compose build. To set up the front-end environment, install R and the dependencies listed in the /rshiny_deps Dockerfile. This is more a philosophical interplay of technologies than a true working POC.
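Since the APIs do not depend on the Shiny app, any HTTP client can drive the speech-to-speech loop. The sketch below chains transcription, generation, and synthesis using only the Python standard library. The endpoint paths and payload shapes are assumptions based on the upstream projects' stock servers (the whisper.cpp example server's `/inference`, KoboldCPP's KoboldAI-style `/api/v1/generate`, and Coqui's `tts-server` `/api/tts`); the Docker images in this repository may expose them differently. The function names and the prompt template are hypothetical.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Assumed endpoints, matching the ports listed earlier.
WHISPER_URL = "http://127.0.0.1:8080/inference"        # whisper.cpp example server
KOBOLD_URL = "http://127.0.0.1:5001/api/v1/generate"   # KoboldCPP KoboldAI-style API
COQUI_URL = "http://127.0.0.1:5002/api/tts"            # Coqui tts-server


def build_whisper_request(wav: bytes) -> urllib.request.Request:
    """Wrap WAV bytes in a multipart/form-data POST for whisper.cpp."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        "json\r\n",
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="input.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n",
    ]
    body = "".join(parts).encode() + wav + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        WHISPER_URL, data=body, method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"})


def parse_whisper(body: bytes) -> str:
    """Extract the transcript from an assumed {"text": ...} JSON response."""
    return json.loads(body)["text"].strip()


def build_kobold_request(prompt: str, max_length: int = 120) -> urllib.request.Request:
    """JSON POST asking KoboldCPP to continue `prompt`."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    return urllib.request.Request(
        KOBOLD_URL, data=payload, method="POST",
        headers={"Content-Type": "application/json"})


def parse_kobold(body: bytes) -> str:
    """KoboldAI-style responses look like {"results": [{"text": ...}]}."""
    return json.loads(body)["results"][0]["text"].strip()


def build_tts_url(text: str) -> str:
    """Coqui's tts-server returns WAV audio for GET /api/tts?text=..."""
    return f"{COQUI_URL}?{urllib.parse.urlencode({'text': text})}"


def speech_to_speech(wav: bytes, timeout: float = 120.0) -> bytes:
    """Transcribe -> generate -> synthesize; returns the spoken reply as WAV bytes."""
    with urllib.request.urlopen(build_whisper_request(wav), timeout=timeout) as r:
        heard = parse_whisper(r.read())
    prompt = f"User: {heard}\nAssistant:"  # hypothetical prompt template
    with urllib.request.urlopen(build_kobold_request(prompt), timeout=timeout) as r:
        reply = parse_kobold(r.read())
    with urllib.request.urlopen(build_tts_url(reply), timeout=timeout) as r:
        return r.read()
```

Splitting each stage into a request builder and a response parser keeps the network calls in one place, so the payload handling can be checked without the containers running.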

[Screenshot: initial greeting, speech input]

[Screenshot: follow-up message, speech input]


NOTE: The only part of this build that seems to need some troubleshooting is the Coqui image. If you run into latency issues during installation, you can run the build_coqui.sh script on its own to isolate that build. We hope to fix this in a future release. Once the image is built with the English model, it should run without problems.
