RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Easy-to-use, low-latency speech-to-text library for realtime applications
❗ Project Status: Community-Driven
This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.
I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!
New
- AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two. (Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all, and the server cannot handle concurrent (parallel) requests yet.)
- Reworked CLI interface ("stt-server" starts the server, "stt" starts the client; see the "server" folder for more info)
About the Project
RealtimeSTT listens to the microphone and transcribes voice into text.
Hint: <strong>Check out Linguflex</strong>, the original project from which RealtimeSTT was spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.
It's ideal for:
- Voice Assistants
- Applications requiring fast and precise speech-to-text conversion
https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5
CLI demo code (reproduces the video above)
Updates
Latest Version: v0.3.104
See release history.
Hint: Since we use the `multiprocessing` module now, make sure to include the `if __name__ == '__main__':` guard in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, see the official Python documentation on `multiprocessing`.
Quick Examples
Print everything being said:
```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
```
Type everything being said:
```python
from RealtimeSTT import AudioToTextRecorder
import pyautogui

def process_text(text):
    pyautogui.typewrite(text + " ")

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()
    while True:
        recorder.text(process_text)
```
This will type everything being said into your selected text box.
Features
- Voice Activity Detection: Automatically detects when you start and stop speaking.
- Realtime Transcription: Transforms speech to text in real-time.
- Wake Word Activation: Can activate upon detecting a designated wake word.
Hint: Check out RealtimeTTS, the output counterpart of this library, for text-to-voice capabilities. Together, they form a powerful realtime audio wrapper around large language models.
Tech Stack
This library uses:
- Voice Activity Detection
- Speech-To-Text
  - Faster_Whisper for instant (GPU-accelerated) transcription.
- Wake Word Detection
  - Porcupine or OpenWakeWord for wake word detection.
These components represent the "industry standard" for cutting-edge applications, providing the most modern and effective foundation for building high-end solutions.
Installation
```bash
pip install RealtimeSTT
```
This will install all the necessary dependencies, including a CPU-only version of PyTorch.
Although it is possible to run RealtimeSTT with a CPU-only installation (use a small model like "tiny" or "base" in this case), you will get a much better experience using CUDA (please scroll down).
Linux Installation
Before installing RealtimeSTT please execute:
```bash
sudo apt-get update
sudo apt-get install python3-dev
sudo apt-get install portaudio19-dev
```
MacOS Installation
Before installing RealtimeSTT please execute:
```bash
brew install portaudio
```
GPU Support with CUDA (recommended)
Updating PyTorch for CUDA Support
To upgrade your PyTorch installation to enable GPU support with CUDA, follow these instructions based on your specific CUDA version. This is useful if you wish to enhance the performance of RealtimeSTT with CUDA capabilities.
For CUDA 11.8:
To update PyTorch and Torchaudio to support CUDA 11.8, use the following commands:
```bash
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
```
For CUDA 12.X:
To update PyTorch and Torchaudio to support CUDA 12.X, execute the following:
```bash
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```
Replace 2.5.1 with the version of PyTorch that matches your system and requirements.
Steps That Might Be Necessary Beforehand
Note: To check if your NVIDIA GPU supports CUDA, visit the official CUDA GPUs list.
If you haven't used CUDA models before, some additional one-time steps may be needed before installation. These steps prepare the system for CUDA support and the GPU-optimized installation. This is recommended for those who require better performance and have a compatible NVIDIA GPU. To use RealtimeSTT with GPU support via CUDA, please also follow these steps:
1. Install NVIDIA CUDA Toolkit:
   - Select between the CUDA 11.8 or CUDA 12.X Toolkit:
     - For 12.X, visit the NVIDIA CUDA Toolkit Archive and select the latest version.
     - For 11.8, visit NVIDIA CUDA Toolkit 11.8.
   - Select your operating system and version.
   - Download and install the software.
2. Install NVIDIA cuDNN:
   - For 12.X, visit cuDNN Downloads, select your operating system and version, then download and install the software.
   - For 11.8, visit the NVIDIA cuDNN Archive, click "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x", then download and install the software.
3. Install ffmpeg:

   Note: Installation of ffmpeg might not actually be needed to operate RealtimeSTT <sup>*thanks to jgilbert2017 for pointing this out</sup>

   You can download an installer for your OS from the ffmpeg website, or use a package manager:
   - On Ubuntu or Debian: `sudo apt update && sudo apt install ffmpeg`
   - On Arch Linux: `sudo pacman -S ffmpeg`
   - On MacOS using Homebrew (https://brew.sh/): `brew install ffmpeg`
   - On Windows using Winget (see the official documentation): `winget install Gyan.FFmpeg`
   - On Windows using Chocolatey (https://chocolatey.org/): `choco install ffmpeg`
   - On Windows using Scoop (https://scoop.sh/): `scoop install ffmpeg`
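Once CUDA, cuDNN, and a CUDA-enabled PyTorch are installed, you can confirm that PyTorch actually sees the GPU. A quick sanity check (guarded with a try/except so it also runs where PyTorch is absent):

```python
# Quick check whether PyTorch was built with CUDA support and can see a GPU.
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = None  # PyTorch is not installed in this environment

print(has_cuda)
```

If this prints `False` despite an NVIDIA GPU being present, the installed PyTorch wheel is likely the CPU-only build; reinstall with the CUDA index URL shown above.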
Quick Start
Basic usage:
Manual Recording
Start and stop of recording are manually triggered.
```python
recorder.start()
recorder.stop()
print(recorder.text())
```
Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    recorder.start()
    input("Press Enter to stop recording...")
    recorder.stop()
    print("Transcription: ", recorder.text())
```
Automatic Recording
Recording based on voice activity detection.
```python
with AudioToTextRecorder() as recorder:
    print(recorder.text())
```
Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        print("Transcription: ", recorder.text())
```
When running `recorder.text` in a loop, it is recommended to use a callback so the transcription can run asynchronously.