Voxmlx
Realtime Transcription with Voxtral in MLX
Install / Use
/learn @awni/VoxmlxREADME
voxmlx
Realtime speech-to-text with Voxtral Mini Realtime in MLX.
Install
pip install voxmlx
Usage
voxmlx
Transcribe audio from a file or stream from the microphone in real-time.
Stream from microphone:
voxmlx
Transcribe a file:
voxmlx --audio audio.flac
Options:
| Flag | Description | Default |
|------|-------------|---------|
| --audio | Path to audio file (omit to stream from mic) | None |
| --model | Model path or HuggingFace model ID | mlx-community/Voxtral-Mini-4B-Realtime-6bit |
| --temp | Sampling temperature (0 = greedy) | 0.0 |
voxmlx-convert
Convert Voxtral weights to voxmlx/MLX format with optional quantization.
Basic conversion:
voxmlx-convert --mlx-path voxtral-mlx
4-bit quantized conversion:
voxmlx-convert -q --mlx-path voxtral-mlx-4bit
Convert and upload to HuggingFace:
voxmlx-convert -q --mlx-path voxtral-mlx-4bit --upload-repo username/voxtral-mlx-4bit
Options:
| Flag | Description | Default |
|------|-------------|---------|
| --hf-path | HuggingFace model ID or local path | mistralai/Voxtral-Mini-4B-Realtime-2602 |
| --mlx-path | Output directory | mlx_model |
| -q, --quantize | Quantize the model | Off |
| --group-size | Quantization group size | 64 |
| --bits | Bits per weight | 4 |
| --dtype | Cast weights (float16, bfloat16, float32) | None |
| --upload-repo | HuggingFace repo to upload converted model | None |
Python API
from voxmlx import transcribe
text = transcribe("audio.flac")
print(text)
