RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

Easy-to-use, low-latency speech-to-text library for realtime applications

Project Status: Community-Driven

This project is no longer being actively maintained by me due to time constraints. I've taken on too many projects and I have to step back. I will no longer be implementing new features or providing user support.

I will continue to review and merge high-quality, well-written Pull Requests from the community from time to time. Your contributions are welcome and appreciated!

New

  • AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two. (Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all. Also, the server cannot handle concurrent (parallel) requests yet.)
  • Reworked CLI ("stt-server" starts the server, "stt" starts the client; see the "server" folder for more info)
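
Because the client is described as sharing the AudioToTextRecorder interface, calling code can be written once against that interface and handed either class. The sketch below assumes both classes support the with statement and a blocking text() call, as AudioToTextRecorder does in the Automatic Recording example further down. transcribe_once is a hypothetical helper, and importing AudioToTextRecorderClient from the package root is an assumption, not something this README states.

```python
def transcribe_once(recorder_cls, **options):
    """Record one utterance with the given recorder class and return its text.

    recorder_cls may be AudioToTextRecorder or AudioToTextRecorderClient;
    only the shared interface (context manager plus text()) is used.
    """
    with recorder_cls(**options) as recorder:
        return recorder.text()


def main():
    try:
        # Assumption: both classes are importable from the package root.
        from RealtimeSTT import AudioToTextRecorder, AudioToTextRecorderClient
    except ImportError:
        print("RealtimeSTT is not installed; run: pip install RealtimeSTT")
        return
    use_client = False  # set to True to go through the server instead
    recorder_cls = AudioToTextRecorderClient if use_client else AudioToTextRecorder
    print("Transcription:", transcribe_once(recorder_cls))


if __name__ == '__main__':
    main()
```

Since the client is still a work in progress, options passed through transcribe_once may not all be accepted by AudioToTextRecorderClient.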

About the Project

RealtimeSTT listens to the microphone and transcribes voice into text.

Hint: Check out Linguflex, the original project from which RealtimeSTT is spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.

It's ideal for:

  • Voice Assistants
  • Applications requiring fast and precise speech-to-text conversion

https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

CLI demo code (reproduces the video above)

Updates

Latest Version: v0.3.104

See release history.

Hint: Since the library now uses the multiprocessing module, be sure to include the if __name__ == '__main__': guard in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, see the official Python documentation on multiprocessing.

Quick Examples

Print everything being said:

from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)

Type everything being said:

from RealtimeSTT import AudioToTextRecorder
import pyautogui

def process_text(text):
    pyautogui.typewrite(text + " ")

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)

This types everything being said into the currently selected text box.

Features

  • Voice Activity Detection: Automatically detects when you start and stop speaking.
  • Realtime Transcription: Transforms speech to text in real-time.
  • Wake Word Activation: Can activate upon detecting a designated wake word.

Hint: Check out RealtimeTTS, the output counterpart of this library, for text-to-voice capabilities. Together, they form a powerful realtime audio wrapper around large language models.

Tech Stack

This library uses:

These components represent the "industry standard" for cutting-edge applications, providing the most modern and effective foundation for building high-end solutions.

Installation

pip install RealtimeSTT

This will install all the necessary dependencies, including a CPU-only version of PyTorch.

Although it is possible to run RealtimeSTT on CPU only (use a small model like "tiny" or "base" in this case), you will get a much better experience with CUDA (see GPU Support with CUDA below).
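
As a rough illustration of the advice above, the model size and device can be chosen depending on whether CUDA is usable. This is a sketch, not part of the library: recorder_options is a hypothetical helper, and the model and device keyword arguments of AudioToTextRecorder are assumed here because this section does not show them.

```python
def recorder_options(cuda_available: bool) -> dict:
    """Pick constructor options: a small model on CPU, a larger one on GPU."""
    if cuda_available:
        return {"model": "medium", "device": "cuda"}
    # CPU-only: stay with a small model such as "tiny" or "base".
    return {"model": "tiny", "device": "cpu"}


if __name__ == '__main__':
    print(recorder_options(cuda_available=False))
```

The resulting dictionary could then be unpacked into the constructor, for example AudioToTextRecorder(**recorder_options(False)).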

Linux Installation

Before installing RealtimeSTT please execute:

sudo apt-get update
sudo apt-get install python3-dev
sudo apt-get install portaudio19-dev

MacOS Installation

Before installing RealtimeSTT please execute:

brew install portaudio

GPU Support with CUDA (recommended)

Updating PyTorch for CUDA Support

To upgrade your PyTorch installation to enable GPU support with CUDA, follow these instructions based on your specific CUDA version. This is useful if you wish to enhance the performance of RealtimeSTT with CUDA capabilities.

For CUDA 11.8:

To update PyTorch and Torchaudio to support CUDA 11.8, use the following command:

pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118

For CUDA 12.X:

To update PyTorch and Torchaudio to support CUDA 12.X, execute the following:

pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

Replace 2.5.1 with the version of PyTorch that matches your system and requirements.
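
After installing, it can be worth confirming that the installed PyTorch build actually sees the GPU. The snippet below is a small check built on torch.cuda.is_available(); describe_torch is a hypothetical helper name, and it falls back to a message if PyTorch is not installed.

```python
def describe_torch() -> str:
    """Return a one-line summary of the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"PyTorch {torch.__version__}, CUDA {torch.version.cuda} available"
    return f"PyTorch {torch.__version__}, CUDA not available"


if __name__ == '__main__':
    print(describe_torch())
```

If this reports that CUDA is not available after a CUDA wheel was installed, the driver, toolkit, or cuDNN setup described in the following steps is the likely place to look.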

Steps That Might Be Necessary Before

Note: To check if your NVIDIA GPU supports CUDA, visit the official CUDA GPUs list.

If you have not used CUDA models before, some additional one-time steps may be needed before installation. They prepare the system for CUDA support and the GPU-optimized installation, and are recommended if you need better performance and have a compatible NVIDIA GPU. To use RealtimeSTT with GPU support via CUDA, please also follow these steps:

  1. Install NVIDIA CUDA Toolkit:

  2. Install NVIDIA cuDNN:

    • Select between the CUDA 11.8 and CUDA 12.X Toolkit
      • for 12.X visit cuDNN Downloads.
        • Select operating system and version.
        • Download and install the software.
      • for 11.8 visit NVIDIA cuDNN Archive.
        • Click on "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
        • Download and install the software.
  3. Install ffmpeg:

    Note: Installation of ffmpeg might not actually be needed to operate RealtimeSTT (thanks to jgilbert2017 for pointing this out).

    You can download an installer for your OS from the ffmpeg Website.

    Or use a package manager:

Quick Start

Basic usage:

Manual Recording

Start and stop of recording are manually triggered.

recorder.start()
recorder.stop()
print(recorder.text())

Standalone Example:

from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    recorder.start()
    input("Press Enter to stop recording...")
    recorder.stop()
    print("Transcription: ", recorder.text())

Automatic Recording

Recording based on voice activity detection.

with AudioToTextRecorder() as recorder:
    print(recorder.text())

Standalone Example:

from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        print("Transcription: ", recorder.text())

When running recorder.text in a loop, it is recommended to pass a callback, as in the Quick Examples above, so that the transcription runs asynchronously.
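
A minimal sketch of that callback pattern, mirroring the Quick Examples above: recorder.text(callback) hands each finished transcription to the callback. The handle_text name and the transcripts list are illustrative only, and the import is wrapped so the script exits cleanly when the library is missing.

```python
transcripts = []


def handle_text(text):
    """Callback: store and print each finished transcription."""
    transcripts.append(text)
    print(text)


def main():
    try:
        from RealtimeSTT import AudioToTextRecorder
    except ImportError:
        print("RealtimeSTT is not installed; run: pip install RealtimeSTT")
        return
    recorder = AudioToTextRecorder()
    while True:
        # Waits for the next utterance; the result is delivered to handle_text.
        recorder.text(handle_text)


if __name__ == '__main__':
    main()
```

The loop runs until interrupted (for example with Ctrl+C).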
