`voice2text`

Local voice-to-text with Whisper + LLM cleanup. Push-to-talk (Right ⌘), pastes at cursor.

Voice-to-text tools like Wispr Flow, MacWhisper, and VoiceInk are becoming increasingly popular. It's a testament to our times that in 2025, ~270 lines of Python using local Whisper and a small language model served by Ollama (Qwen2.5 3B) can deliver a comparable experience on consumer hardware. Tooling like this would have been unimaginable three years ago. This project is a proof of concept to demonstrate just that.

Note: before anyone suggests splitting this into modules and submodules: keeping it in one file is an intentional design choice, to show that the whole tool fits in fewer than 300 lines of Python.

Note 2: This is macOS-only by design. We use:

- `mlx-whisper`: Whisper optimized for Apple Silicon
- `osascript`: simulates the Cmd+V paste via System Events
- `pbcopy`/`pbpaste`: the macOS clipboard
- `nowplaying-cli`: macOS media control
- System Preferences URLs for permissions

You're welcome to fork this and make it work on Linux or Windows!
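The "pastes at cursor" step can be sketched with `pbcopy` and `osascript` from Python's `subprocess`. This is a minimal sketch, not the project's actual code: the function name `paste_at_cursor`, restoring the old clipboard via `pbpaste`, and the 0.2 s delay are all assumptions. The `run` parameter is only there so the command sequence can be checked without a Mac.

```python
import subprocess
import time
from typing import Callable

# AppleScript that asks System Events to press Cmd+V in the frontmost app.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def paste_at_cursor(
    text: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    restore_delay: float = 0.2,
) -> None:
    """Paste `text` at the cursor, then put the previous clipboard back (assumed behaviour)."""
    # Save the current clipboard (plain text only; non-text contents are not preserved).
    previous = run(["pbpaste"], capture_output=True, text=True, check=True).stdout
    # Put the transcription on the clipboard.
    run(["pbcopy"], input=text, text=True, check=True)
    # Simulate Cmd+V; the calling app needs the Accessibility permission.
    run(["osascript", "-e", PASTE_SCRIPT], check=True)
    # Give the target app a moment to read the clipboard before restoring it.
    time.sleep(restore_delay)
    run(["pbcopy"], input=previous, text=True, check=True)
```

On a Mac, `paste_at_cursor("Hello from v2t")` with the cursor in any text field should type the text there.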

Prerequisites

Skip this if you use Pixi: it installs Ollama automatically (you still pull the model, as shown in the Pixi section).

```bash
brew install ollama
ollama pull qwen2.5:3b
```
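To confirm the prerequisites worked, you can ask the local Ollama server which models it has. A sketch, assuming Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint; `installed_models` and `has_model` are hypothetical helpers, not part of voice2text.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def installed_models(tags: dict) -> set[str]:
    """Extract model names from an /api/tags response body."""
    return {m["name"] for m in tags.get("models", [])}


def has_model(name: str, base_url: str = OLLAMA_URL) -> bool:
    """Return True if the local Ollama server has `name` pulled.

    Raises urllib.error.URLError if the server is not running.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        tags = json.load(resp)
    return name in installed_models(tags)
```

For example, `has_model("qwen2.5:3b")` should return `True` once the pull above has finished; `ollama list` shows the same information from the shell.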

Install

uvx (quick try)

The fastest way to try it out. Note: startup is slower than with an installed tool, because uvx has to resolve and set up a temporary environment on each run.

```bash
uvx --from voice2text v2t
```

Or from GitHub:

```bash
uvx --from git+https://github.com/lucharo/voice2text v2t
```

uv tool install (recommended for daily use)

Installs `v2t` as a persistent command, so there is no environment setup on each run and startup is fast.

```bash
uv tool install voice2text
v2t
```

pip

```bash
pip install voice2text
v2t
```

Development install

```bash
git clone https://github.com/lucharo/voice2text.git
cd voice2text
uv sync
uv run v2t
```

Pixi

Pixi installs the Ollama dependency automatically, so you only need to pull the model:

```bash
git clone https://github.com/lucharo/voice2text.git
cd voice2text
pixi run ollama pull qwen2.5:3b
pixi run v2t
```

Note: We don't publish to conda-forge/pixi channels yet, but may in the future.

Usage

```bash
v2t                      # strict mode (restructures sentences)
v2t --casual             # light cleanup (punctuation only)
v2t --pause-music        # pause media while recording (macOS only, requires nowplaying-cli via brew)
```

Hold Right Command to record, release to transcribe and paste.
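Push-to-talk is a small state machine: start recording on the first key-down, stop and hand off the audio on key-up, and ignore the auto-repeat key-down events the OS sends while the key is held. This sketch uses hypothetical names (`PushToTalk`, `on_start`, `on_stop`) and is not the project's code; feeding it real key events, for example from pynput's `keyboard.Listener` with `keyboard.Key.cmd_r` as the trigger, is an assumption about how it could be wired up.

```python
from typing import Callable, Hashable


class PushToTalk:
    """Turn raw key events into start/stop calls for a single trigger key."""

    def __init__(self, trigger: Hashable,
                 on_start: Callable[[], None],
                 on_stop: Callable[[], None]) -> None:
        self.trigger = trigger
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def press(self, key: Hashable) -> None:
        # Holding a key produces repeated press events; only the first one counts.
        if key == self.trigger and not self.recording:
            self.recording = True
            self.on_start()

    def release(self, key: Hashable) -> None:
        # Releasing the trigger ends the recording and starts transcription.
        if key == self.trigger and self.recording:
            self.recording = False
            self.on_stop()
```

With pynput this might look like `keyboard.Listener(on_press=ptt.press, on_release=ptt.release)`, where `ptt = PushToTalk(keyboard.Key.cmd_r, start_recording, stop_and_transcribe)` and both callbacks are hypothetical.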

Strict vs Casual Mode

| Raw transcription | Strict | Casual |
|-------------------|--------|--------|
| "Hey um I'll see you tomorrow at 9 actually no make it 10" | "Hey, I'll see you tomorrow at 10." | "Hey, I'll see you tomorrow at 9, actually no, make it 10." |
| "So basically I was thinking we could um you know maybe try the other approach" | "I was thinking we could try the other approach." | "So basically, I was thinking we could maybe try the other approach." |

Strict (default): Removes filler words, restructures for clarity, condenses.

Casual: Only adds punctuation and removes fillers like "um"/"uh"; otherwise keeps your phrasing.
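One way to implement the two modes is to keep the model fixed and switch only the system prompt sent to Ollama's `/api/generate` endpoint. This is a sketch under assumptions: the prompt wording and the helpers `build_request` and `clean_up` are hypothetical, not the project's code; only the endpoint and its `model`, `system`, `prompt`, and `stream` fields come from Ollama's API.

```python
import json
import urllib.request

# Hypothetical prompts: the project's real wording is not shown in this README.
PROMPTS = {
    "strict": ("Rewrite this dictated text: remove filler words, fix grammar, "
               "restructure for clarity and keep only the final intent. "
               "Return only the rewritten text."),
    "casual": ("Add punctuation and capitalization to this dictated text and "
               "remove 'um'/'uh'. Keep the wording otherwise unchanged. "
               "Return only the result."),
}


def build_request(raw: str, mode: str = "strict", model: str = "qwen2.5:3b") -> dict:
    """Build an Ollama /api/generate payload for the chosen cleanup mode."""
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "model": model,
        "system": PROMPTS[mode],
        "prompt": raw,
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def clean_up(raw: str, mode: str = "strict",
             base_url: str = "http://localhost:11434") -> str:
    """Send the raw transcription to a local Ollama server; return the cleaned text."""
    body = json.dumps(build_request(raw, mode)).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["response"].strip()
```

In this sketch, `--casual` would map to `mode="casual"` and the default to `mode="strict"`; since the model is the same, the modes differ only in what the prompt asks for.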

`--pause-music` (macOS only)

Pauses any playing media while you record and resumes it afterwards. Requires:

```bash
brew install nowplaying-cli
```

`nowplaying-cli` is not available via Pixi/conda-forge for now, so install it with Homebrew even if you use Pixi. We may publish it there later!
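`--pause-music` can be sketched as a context manager wrapped around the recording. It assumes `nowplaying-cli get playbackRate` prints a non-zero rate while something is playing, and that `nowplaying-cli pause` and `nowplaying-cli play` control playback; `is_playing` and `paused_media` are hypothetical names, not the project's code.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


def is_playing(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Assumes `nowplaying-cli get playbackRate` prints a non-zero rate while media plays."""
    out = run(["nowplaying-cli", "get", "playbackRate"],
              capture_output=True, text=True).stdout.strip()
    try:
        return float(out) > 0
    except ValueError:  # e.g. "null" or empty output when nothing is playing
        return False


@contextmanager
def paused_media(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Iterator[None]:
    """Pause media for the duration of the block; resume only if it was playing."""
    was_playing = is_playing(run)
    if was_playing:
        run(["nowplaying-cli", "pause"])
    try:
        yield
    finally:
        if was_playing:
            run(["nowplaying-cli", "play"])
```

Resuming only when something was playing avoids starting music you had already paused yourself, and the `finally` clause resumes playback even if recording fails.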
