RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Easy-to-use, low-latency speech-to-text library for realtime applications
❗ Project Status: Community-Driven
This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.
I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!
New
- AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two. (Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all, and the server cannot handle concurrent (parallel) requests yet.)
- Reworked CLI interface ("stt-server" starts the server, "stt" starts the client; see the "server" folder for more info)
About the Project
RealtimeSTT listens to the microphone and transcribes voice into text.
Hint: <strong>Check out Linguflex</strong>, the original project from which RealtimeSTT was spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.
It's ideal for:
- Voice Assistants
- Applications requiring fast and precise speech-to-text conversion
https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5
CLI demo code (reproduces the video above)
Updates
Latest Version: v0.3.104
See release history.
Hint: Since we use the `multiprocessing` module now, make sure to include the `if __name__ == '__main__':` guard in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, see the official Python documentation on `multiprocessing`.
Quick Examples
Print everything being said:
```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
```
Type everything being said:
```python
from RealtimeSTT import AudioToTextRecorder
import pyautogui

def process_text(text):
    pyautogui.typewrite(text + " ")

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
```
This will type everything being said into your selected text box.
Features
- Voice Activity Detection: Automatically detects when you start and stop speaking.
- Realtime Transcription: Transforms speech to text in real-time.
- Wake Word Activation: Can activate upon detecting a designated wake word.
Hint: Check out RealtimeTTS, the output counterpart of this library, for text-to-voice capabilities. Together, they form a powerful realtime audio wrapper around large language models.
Tech Stack
This library uses:
- Voice Activity Detection
- Speech-To-Text
  - Faster_Whisper for instant (GPU-accelerated) transcription.
- Wake Word Detection
  - Porcupine or OpenWakeWord for wake word detection.
These components represent the "industry standard" for cutting-edge applications, providing the most modern and effective foundation for building high-end solutions.
Installation
```bash
pip install RealtimeSTT
```
This will install all the necessary dependencies, including a CPU-only version of PyTorch.
Although it is possible to run RealtimeSTT with a CPU-only installation (use a small model like "tiny" or "base" in this case), you will get a much better experience using CUDA (please scroll down).
Linux Installation
Before installing RealtimeSTT please execute:
```bash
sudo apt-get update
sudo apt-get install python3-dev
sudo apt-get install portaudio19-dev
```
MacOS Installation
Before installing RealtimeSTT please execute:
```bash
brew install portaudio
```
GPU Support with CUDA (recommended)
Updating PyTorch for CUDA Support
To upgrade your PyTorch installation to enable GPU support with CUDA, follow these instructions based on your specific CUDA version. This is useful if you wish to enhance the performance of RealtimeSTT with CUDA capabilities.
For CUDA 11.8:
To update PyTorch and Torchaudio to support CUDA 11.8, use the following commands:
```bash
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
```
For CUDA 12.X:
To update PyTorch and Torchaudio to support CUDA 12.X, execute the following:
```bash
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```
Replace 2.5.1 with the version of PyTorch that matches your system and requirements.
Steps That Might Be Necessary Beforehand
Note: To check if your NVIDIA GPU supports CUDA, visit the official CUDA GPUs list.
If you haven't used CUDA models before, some additional one-time steps may be needed before installation. These steps prepare the system for CUDA support and the GPU-optimized installation. This is recommended for those who require better performance and have a compatible NVIDIA GPU. To use RealtimeSTT with GPU support via CUDA, please also follow these steps:
1. Install NVIDIA CUDA Toolkit:
   - Select between the CUDA 11.8 or CUDA 12.X Toolkit:
     - For 12.X, visit the NVIDIA CUDA Toolkit Archive and select the latest version.
     - For 11.8, visit NVIDIA CUDA Toolkit 11.8.
   - Select your operating system and version.
   - Download and install the software.
2. Install NVIDIA cuDNN:
   - For 12.X, visit cuDNN Downloads, select your operating system and version, then download and install the software.
   - For 11.8, visit the NVIDIA cuDNN Archive, click "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x", then download and install the software.
3. Install ffmpeg:

   Note: Installation of ffmpeg might not actually be needed to operate RealtimeSTT <sup>*thanks to jgilbert2017 for pointing this out</sup>

   You can download an installer for your OS from the ffmpeg website, or use a package manager:
   - On Ubuntu or Debian: `sudo apt update && sudo apt install ffmpeg`
   - On Arch Linux: `sudo pacman -S ffmpeg`
   - On MacOS using Homebrew (https://brew.sh/): `brew install ffmpeg`
   - On Windows using Winget (see the official documentation): `winget install Gyan.FFmpeg`
   - On Windows using Chocolatey (https://chocolatey.org/): `choco install ffmpeg`
   - On Windows using Scoop (https://scoop.sh/): `scoop install ffmpeg`
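Once CUDA, cuDNN, and a CUDA-enabled PyTorch are installed, you can confirm that PyTorch actually sees the GPU. A quick sanity check (guarded with a try/except so it also runs where PyTorch is absent):

```python
# Quick check whether PyTorch was built with CUDA support and can see a GPU.
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = None  # PyTorch is not installed in this environment

print(has_cuda)
```

If this prints `False` despite an NVIDIA GPU being present, the installed PyTorch wheel is likely the CPU-only build; reinstall with the CUDA index URL shown above.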
Quick Start
Basic usage:
Manual Recording
Start and stop of recording are manually triggered.
```python
recorder.start()
recorder.stop()
print(recorder.text())
```
Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    recorder.start()
    input("Press Enter to stop recording...")
    recorder.stop()
    print("Transcription: ", recorder.text())
```
Automatic Recording
Recording based on voice activity detection.
```python
with AudioToTextRecorder() as recorder:
    print(recorder.text())
```
Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        print("Transcription: ", recorder.text())
```
When running `recorder.text` in a loop, it is recommended to use a callback so the transcription can run asynchronously.