Supervoxtral
A simple Python CLI/GUI tool to record audio from your microphone, optionally convert it (WAV/MP3/Opus), and send it to Mistral Voxtral transcription/chat APIs.
SuperVoxtral is a lightweight Python CLI/GUI utility for recording audio and processing it via a 2-step pipeline using Mistral's APIs.
The pipeline works in two stages:
- (1) Transcription: audio is converted to text using Voxtral's dedicated transcription endpoint (voxtral-mini-latest), which delivers fast inference, high accuracy across languages, and minimal API costs.
- (2) Transformation: the raw transcript is refined by a text-based LLM (e.g., mistral-small-latest) using a configurable prompt for tasks like error correction, summarization, or reformatting.
In pure transcription mode (--transcribe), only step 1 is performed.
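The two stages above can be sketched as a small function. This is only an illustration of the control flow, not SuperVoxtral's actual code: run_pipeline, fake_transcribe, and fake_transform are hypothetical names, and the two callables stand in for the Voxtral transcription call and the text LLM call so the sketch runs without network access.

```python
from typing import Callable


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    transform: Callable[[str, str], str],
    prompt: str,
    transcribe_only: bool = False,
) -> str:
    """Run step 1 always; run step 2 unless transcribe_only is set."""
    transcript = transcribe(audio_path)  # step 1: audio -> raw text
    if transcribe_only:  # mirrors the behaviour described for --transcribe
        return transcript
    return transform(transcript, prompt)  # step 2: raw text -> refined text


# Hypothetical stand-ins for the two API calls.
def fake_transcribe(audio_path: str) -> str:
    return "um hello world"


def fake_transform(transcript: str, prompt: str) -> str:
    # A real LLM would follow the prompt; here we just strip one hesitation.
    return transcript.replace("um ", "")


if __name__ == "__main__":
    print(run_pipeline("memo.wav", fake_transcribe, fake_transform, "Remove hesitations"))
    print(run_pipeline("memo.wav", fake_transcribe, fake_transform, "", transcribe_only=True))
```

Passing the two steps in as callables keeps the flow testable offline; swapping in real API clients would not change the branching.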
Key features:
- Process existing files: feed any audio or video file (WAV, MP3, M4A, FLAC, Opus, OGG, MP4, MOV, MKV, AVI, WebM) through the pipeline with svx process <file>. No recording is needed, which suits workflows that use a screen recorder like CleanShot X to capture mic + system audio simultaneously and then process the file with SuperVoxtral. Simpler than BlackHole loopback setups.
- Speaker diarization: identifies who said what (enabled by default)
- Auto-chunking: long recordings (> 5 min) are automatically split, transcribed in parallel, and merged without duplicates
- Dual audio capture: records microphone + system audio (e.g., meeting participants on a call) when a loopback device is configured
- Meeting-ready: long recordings auto-save all files for data protection; use any prompt for meeting summaries, action items, etc.
For instance, use a prompt like: "Transcribe this audio precisely and remove all minor speech hesitations: "um", "uh", "er", "euh", "ben", etc."
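To make the auto-chunking feature more concrete, here is one possible way to split a long recording, transcribe the pieces in parallel, and merge them without duplicates. It is a sketch under assumptions: the README does not say how SuperVoxtral chunks or deduplicates, so the overlap between chunks, the timestamped segment format, and the names chunk_bounds, transcribe_chunks, and merge_segments are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# (start_seconds, end_seconds, text), with timestamps relative to the full recording.
Segment = tuple[float, float, str]


def chunk_bounds(total: float, chunk: float = 300.0, overlap: float = 10.0) -> list[tuple[float, float]]:
    """Split [0, total] into windows of at most `chunk` seconds that overlap slightly."""
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + chunk, total)
        bounds.append((start, end))
        if end >= total:
            break
        start = end - overlap  # assumed overlap so no speech is cut at a boundary
    return bounds


def transcribe_chunks(
    bounds: list[tuple[float, float]],
    transcribe_chunk: Callable[[float, float], list[Segment]],
    max_workers: int = 4,
) -> list[list[Segment]]:
    """Transcribe chunks concurrently; pool.map keeps results in chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: transcribe_chunk(*b), bounds))


def merge_segments(chunks: list[list[Segment]]) -> list[Segment]:
    """Concatenate chunk results, skipping segments already covered by the overlap."""
    merged: list[Segment] = []
    last_end = float("-inf")
    for segments in chunks:
        for seg in segments:
            start, end, _text = seg
            if start < last_end:  # duplicate from the overlapping region
                continue
            merged.append(seg)
            last_end = end
    return merged
```

With the assumed defaults, a 700-second recording yields the windows (0, 300), (290, 590), and (580, 700).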
The GUI is minimal, launches fast, and can be bound to a system hotkey. Upon stopping recording, it transcribes via the pipeline and copies the result directly to the system clipboard, enabling efficient voice-driven workflows: e.g., dictating code snippets into an IDE or prompting LLMs via audio without typing. Real-time segmented level meters (MIC, and LOOP when a loopback device is configured) give immediate feedback on audio signal, so you can confirm sound is being captured before committing to a recording.

Platform note: SuperVoxtral has been tested on macOS only at this stage. It should work on Linux and Windows but hasn't been validated — feedback welcome via GitHub Issues.
Requirements
- Python 3.11+
- tkinter (GUI): part of the Python standard library, but not always bundled — see the installation notes below.
- ffmpeg (for MP3/Opus conversions)
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt-get install ffmpeg
  - Windows: https://ffmpeg.org/download.html
Installation
Recommended: uv
uv is the recommended way to install SuperVoxtral. It manages its own Python distribution (which includes tkinter on macOS), avoiding common setup issues. Install it first if needed:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then install SuperVoxtral as a global tool:
uv tool install supervoxtral
# to update: uv tool upgrade supervoxtral
tkinter availability: The GUI uses Python's built-in tkinter library. Using uv (recommended above) handles this automatically via its bundled Python. Platform-specific notes:
- macOS: The system Python (/usr/bin/python3) and some Homebrew Pythons do not include tkinter. If you use Homebrew Python: brew install python-tk@3.x.
- Ubuntu/Debian Linux: tkinter is a separate system package; install it with sudo apt-get install python3-tk.
- Windows: tkinter is included in the official Python installer from python.org; no extra steps needed.
Alternative: pip
If you prefer pip in a virtual environment:
- Create and activate a virtual environment:
  - macOS/Linux:
      python -m venv .venv
      source .venv/bin/activate
  - Windows (PowerShell):
      python -m venv .venv
      .\.venv\Scripts\Activate.ps1
- Install the package:
    pip install supervoxtral
  Make sure the Python used includes tkinter (see the tkinter availability note above).
Development
- Clone the repo and navigate to the project root.
- Install dependencies (creates .venv automatically, editable mode, lockfile-based):
    uv sync --extra dev
  tkinter (needed for --gui) is stdlib but not always bundled. If svx record --gui fails with a tkinter error, see the tkinter availability note above.
- Run linting and type checking:
    uv run ruff check svx/
    uv run basedpyright svx
Quick Start
- Initialize the configuration:
    svx config init
  This creates the default config.toml file with zero-footprint settings.
- Open the configuration directory:
    svx config open
  Edit config.toml and add your Mistral API key under the [providers.mistral] section:
    [providers.mistral]
    api_key = "your_mistral_api_key_here"
- Launch the GUI:
    svx record --gui
  This opens the minimal GUI and starts recording immediately. Real-time level meters (MIC / LOOP) confirm that audio is being captured. Click Transcribe for pure transcription (no prompt) or a button for each configured prompt (e.g., Default, Mail, Translate) for prompted transcription; results are copied to the clipboard automatically.
See the Configuration Reference for the full list of settings.
macOS Shortcuts Integration
To enable fast, hotkey-driven access on macOS, integrate SuperVoxtral with the Shortcuts app. Create a new Shortcut that runs svx record --gui via a "Run Shell Script" action (ensure svx is in your PATH). Assign a global hotkey in Shortcuts settings for instant GUI launch — ideal for quick voice-to-text workflows, with results copied directly to the clipboard.
Quick Setup Steps
- Open the Shortcuts app and create a new shortcut.
- Add the "Run Shell Script" action with input: svx record --gui.
- In shortcut details, set a keyboard shortcut (e.g., Cmd+Shift+V).
Usage (CLI)
The CLI provides config utilities and a unified record entrypoint for both CLI and GUI modes, using a centralized pipeline for consistent behavior (recording, conversion, transcription, saving, clipboard copy, logging).
Zero-footprint defaults: No directories created; outputs to console/clipboard. Use --save-all or set keep_* = true in config.toml to persist files to user data directories (e.g., ~/Library/Application Support/SuperVoxtral/ on macOS). Long recordings (> chunk_duration) automatically enable persistence for data protection.
Most defaults (provider, format, model, language, device, keep flags, copy) come from config.toml. CLI overrides are limited to specific options.
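The persistence rules above (zero-footprint by default, --save-all, keep_* flags, and automatic persistence for long recordings) can be summarized in a few lines. This is a sketch of the decision only: KeepFlags, effective_keep, and the three field names are hypothetical, and the real config keys may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepFlags:
    """Hypothetical stand-in for the keep_* settings in config.toml."""
    audio: bool = False
    transcripts: bool = False
    logs: bool = False


def effective_keep(
    config: KeepFlags,
    save_all: bool,
    duration_s: float,
    chunk_duration_s: float,
) -> KeepFlags:
    """Decide what to persist for one run."""
    # --save-all and long recordings (> chunk_duration) both force full persistence.
    if save_all or duration_s > chunk_duration_s:
        return KeepFlags(audio=True, transcripts=True, logs=True)
    # Otherwise the config decides; all False means nothing is written to disk.
    return config


if __name__ == "__main__":
    print(effective_keep(KeepFlags(), save_all=False, duration_s=60, chunk_duration_s=300))
    print(effective_keep(KeepFlags(), save_all=False, duration_s=900, chunk_duration_s=300))
```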
Record Command
svx record [OPTIONS]
Options:
- --user-prompt TEXT (or --prompt TEXT): Inline user prompt for this run.
- --user-prompt-file PATH (or --prompt-file PATH): Path to a markdown file with the user prompt.
- --transcribe: Enable pure transcription mode (ignores prompts; uses dedicated endpoint).
- --outfile-prefix PREFIX: Custom prefix for output files (default: timestamp).
- --gui: Launch the GUI frontend. Recording starts immediately; real-time level meters (MIC / LOOP) confirm signal. Buttons: Transcribe (pure transcription, no prompt) or one button per configured prompt key (e.g., Default). Respects config.toml and other CLI flags (e.g., --save-all). --transcribe is ignored with a warning in GUI mode.
- --save-all: Override config to keep audio, transcripts, and logs for this run.
- --log-level LEVEL: Set logging level (DEBUG, INFO, WARNING, ERROR; default: INFO).
Examples:
- Record with prompt: svx record --prompt "What's in this audio?"
- Persist outputs: svx record --save-all --prompt "Summarize this"
- Transcribe only: svx record --transcribe
- Launch GUI: svx record --gui
Process Command
Feed an existing audio or video file through the same pipeline — no recording needed.
svx process AUDIO_FILE [OPTIONS]
Supported formats: WAV, MP3, M4A, FLAC, Opus, OGG, MP4, MOV, MKV, AVI, WebM.
The original file is never deleted, regardless of keep_* config flags.
Options (same as record, minus --gui and --outfile-prefix):
- --transcribe: Pure transcription mode (no prompt).
- --save-all: Save converted audio and transcripts to user data directories.
- --user-prompt TEXT / --user-prompt-file PATH: Inline or file-based prompt.
- --log-level LEVEL: Logging level.
Examples:
- Transcribe a file: svx process recording.m4a --transcribe
- Summarize a meeting recording: svx process meeting.mp4 --prompt "Summarize in bullet points"
- Save outputs: svx process interview.wav --save-all
Typical workflow with a screen recorder (e.g., CleanShot X):
Use your screen recorder to capture audio; it records mic + system audio together in a single file. Then run svx process on that file. This is a simpler alternative to the BlackHole loopback setup for meeting transcription.
Prompt Resolution Priority (non-transcribe mode)
By default, the CLI uses the 'default' prompt from config.toml [prompt.default]. Prompts are resolved in this order, and the first one found wins:
- CLI --user-prompt or --user-prompt-file
- config.toml [prompt.default] (text or file)
- User prompt file (user.md in config dir)
- Fallback: "What's in this audio?"
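The priority order above amounts to a first-match lookup. The sketch below illustrates it under assumptions: resolve_prompt and its parameter names are hypothetical, and preferring inline text over a file within the same level is a guess, since the README does not specify it.

```python
from pathlib import Path
from typing import Optional

FALLBACK_PROMPT = "What's in this audio?"


def _read(path: Optional[Path]) -> Optional[str]:
    """Return the stripped file contents, or None if missing or empty."""
    if path is None or not path.is_file():
        return None
    text = path.read_text(encoding="utf-8").strip()
    return text or None


def resolve_prompt(
    cli_text: Optional[str] = None,
    cli_file: Optional[Path] = None,
    config_text: Optional[str] = None,
    config_file: Optional[Path] = None,
    user_md: Optional[Path] = None,
) -> str:
    """Walk the candidates in priority order and return the first non-empty one."""
    candidates = [
        cli_text,            # 1. --user-prompt
        _read(cli_file),     # 1. --user-prompt-file
        config_text,         # 2. [prompt.default] text
        _read(config_file),  # 2. [prompt.default] file
        _read(user_md),      # 3. user.md in the config dir
    ]
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return FALLBACK_PROMPT   # 4. built-in fallback


if __name__ == "__main__":
    print(resolve_prompt(config_text="Summarize this"))
    print(resolve_prompt())
```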
Changelog
- 0.10.0: Migrate to mistralai SDK v2. This updates the import path from mistralai to mistralai.client (namespace package restructuring in v2). No API behavior changes; all method signatures remain identical.
- 0.9.1: Fi
