Vibevoice 🎙️
Fast local speech-to-text for any app using faster-whisper
Hi, I'm Marc Päpper and I wanted to vibe code like Karpathy ;D, so I looked around and found the cool work of Vlad. I extended it to run with a local whisper model, so I don't need to pay for OpenAI tokens. I hope you have fun with it!
What it does 🚀

Simply run cli.py and start dictating text anywhere in your system:
- Hold down right control key (Ctrl_r)
- Speak your text
- Release the key
- Watch as your spoken words are transcribed and automatically typed!
Works in any application or window - your text editor, browser, chat apps, anywhere you can type!
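Under the hood this is a simple hold-to-record, release-to-transcribe loop. A minimal sketch of that flow (the `PushToTalk` class and its callback names are illustrative, not the actual cli.py code):

```python
# Sketch of the hold-to-talk flow: press starts recording, audio chunks are
# buffered while the key is held, release hands the buffer to the transcriber
# and types out the result. (Hypothetical names, not this repo's real code.)

class PushToTalk:
    def __init__(self, transcribe, type_out):
        self.transcribe = transcribe  # e.g. a faster-whisper call
        self.type_out = type_out      # e.g. a keyboard-typing callback
        self.recording = False
        self.frames = []

    def on_press(self):
        if not self.recording:        # ignore key auto-repeat events
            self.recording = True
            self.frames = []

    def on_audio(self, chunk):
        if self.recording:
            self.frames.append(chunk)

    def on_release(self):
        self.recording = False
        self.type_out(self.transcribe(self.frames))
```

In the real app, `on_press`/`on_release` would be wired to a keyboard listener (e.g. pynput) and `transcribe` to a faster-whisper model; keeping them as injected callbacks makes the flow itself easy to follow.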
NEW: LLM voice command mode:
- Hold down the scroll_lock key (I think it's normally not used anymore, which is why I chose it)
- Speak what you want the LLM to do
- The LLM receives your transcribed text and a screenshot of your current view
- The LLM's answer is typed out at your cursor (streamed)
Works everywhere on your system and the LLM always has the screen context
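Concretely, command mode boils down to one chat request against Ollama's HTTP API, with the screenshot attached as a base64-encoded image. A sketch of building such a request body (`build_command_payload` is an illustrative helper, not a function from this repo):

```python
import base64

def build_command_payload(model, transcript, screenshot_png=None):
    """Build a request body for Ollama's /api/chat endpoint.

    The screenshot (raw PNG bytes) is attached as a base64 string in the
    message's "images" list, which is how Ollama accepts images for
    multimodal models. "stream": True asks for a chunked response.
    """
    message = {"role": "user", "content": transcript}
    if screenshot_png is not None:
        message["images"] = [base64.b64encode(screenshot_png).decode("ascii")]
    return {"model": model, "messages": [message], "stream": True}
```

The resulting dict would be POSTed to http://localhost:11434/api/chat; with streaming enabled, Ollama returns the answer chunk by chunk, which is what allows the response to be typed out as it arrives.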
Installation 🛠️
```shell
git clone https://github.com/mpaepper/vibevoice.git
cd vibevoice
pip install -r requirements.txt
python src/vibevoice/cli.py
```
Requirements 📋
Python Dependencies
- Python 3.13 or higher
System Requirements
- CUDA-capable GPU (recommended); CPU use can be enabled in server.py
- CUDA 12.x
- cuBLAS
- cuDNN 9.x
- In case you get the error `OSError: PortAudio library not found`, run `sudo apt install libportaudio2`
- Ollama for AI command mode (with multimodal models for screenshot support)
Setting up Ollama
- Install Ollama by following the instructions at ollama.com
- Pull a model that supports both text and images for best results:

  ```shell
  ollama pull gemma3:27b  # Great model which can run on RTX 3090 or similar
  ```

- Make sure Ollama is running in the background:

  ```shell
  ollama serve
  ```
Handling the CUDA requirements
- Make sure that you have CUDA >= 12.4 and cuDNN >= 9.x
- I had some trouble at first with Ubuntu 24.04, so I did the following:
- Attention: DO NOT do this if you are a WSL user (https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
```shell
sudo apt update && sudo apt upgrade
sudo apt autoremove nvidia* --purge
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb && sudo apt update
sudo apt install cuda-toolkit-12-8
```
or alternatively:
```shell
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cudnn9-cuda-12
```
- Then after rebooting, it worked well.
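To sanity-check that the CUDA libraries are actually discoverable after the reboot, a small probe like this can help (`cuda_libs_present` is an illustrative helper, not part of the project):

```python
import ctypes.util

def cuda_libs_present():
    """Report whether the cuBLAS and cuDNN shared libraries can be found
    by the dynamic linker. Both must resolve for GPU inference to work."""
    return {name: ctypes.util.find_library(name) is not None
            for name in ("cublas", "cudnn")}
```

If either entry comes back False after installing the packages above, check that the library directories are on your loader path (e.g. via `ldconfig`).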
Usage 💡
- Start the application:
  ```shell
  python src/vibevoice/cli.py
  ```
- Hold down right control key (Ctrl_r) while speaking
- Release to transcribe
- Your text appears wherever your cursor is!
Configuration
You can customize various aspects of VibeVoice with the following environment variables:
Keyboard Controls
- VOICEKEY: Change the dictation activation key (default: "ctrl_r")

  ```shell
  export VOICEKEY="ctrl"  # Use left control instead
  ```

- VOICEKEY_CMD: Set the key for AI command mode (default: "scroll_lock")

  ```shell
  export VOICEKEY_CMD="ctrl"  # Use left control instead of the Scroll Lock key
  ```
AI and Screenshot Features
- OLLAMA_MODEL: Specify which Ollama model to use (default: "gemma3:27b")

  ```shell
  export OLLAMA_MODEL="gemma3:4b"  # Use a smaller VLM in case you have less GPU RAM
  ```

- INCLUDE_SCREENSHOT: Enable or disable screenshots in AI command mode (default: "true")

  ```shell
  export INCLUDE_SCREENSHOT="false"  # Disable screenshots (they stay local anyway)
  ```

- SCREENSHOT_MAX_WIDTH: Set the maximum width for screenshots (default: "1024")

  ```shell
  export SCREENSHOT_MAX_WIDTH="800"  # Smaller screenshots
  ```
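Taken together, these variables can be read with plain environment lookups plus the defaults listed above. A sketch (`load_config` and its key names are illustrative, not the project's actual code):

```python
import os

def load_config(env=os.environ):
    """Read VibeVoice settings from environment variables, falling back to
    the documented defaults. `env` is injectable to ease testing."""
    return {
        "voice_key": env.get("VOICEKEY", "ctrl_r"),
        "voice_key_cmd": env.get("VOICEKEY_CMD", "scroll_lock"),
        "ollama_model": env.get("OLLAMA_MODEL", "gemma3:27b"),
        "include_screenshot": env.get("INCLUDE_SCREENSHOT", "true").lower() == "true",
        "screenshot_max_width": int(env.get("SCREENSHOT_MAX_WIDTH", "1024")),
    }
```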
Screenshot Dependencies
To use the screenshot functionality:
```shell
sudo apt install gnome-screenshot
```
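For the SCREENSHOT_MAX_WIDTH setting, the usual aspect-preserving downscale math looks like this (`scaled_size` is an illustrative helper; how the project resizes internally may differ):

```python
def scaled_size(width, height, max_width=1024):
    """Compute the target size for a screenshot: shrink it to max_width
    while keeping the aspect ratio, and never upscale a smaller image."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)
```

Smaller screenshots mean fewer image tokens for the multimodal model, which speeds up AI command mode at the cost of some on-screen detail.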
Usage Modes 💡
VibeVoice supports two modes:
1. Dictation Mode
- Hold down the dictation key (default: right Control)
- Speak your text
- Release to transcribe
- Your text appears wherever your cursor is!
2. AI Command Mode
- Hold down the command key (default: Scroll Lock)
- Ask a question or give a command
- Release the key
- The AI will analyze your request (and current screen if enabled) and type a response
Credits 🙏
- Original inspiration: whisper-keyboard by Vlad
- Faster Whisper for the optimized Whisper implementation
- Built by Marc Päpper