voxmlx

Realtime speech-to-text with Voxtral Mini Realtime in MLX.

Install

pip install voxmlx

Usage

`voxmlx`

Transcribe audio from a file or stream from the microphone in real-time.

Stream from microphone:

voxmlx

Transcribe a file:

voxmlx --audio audio.flac

Options:

| Flag | Description | Default | |------|-------------|---------| | --audio | Path to audio file (omit to stream from mic) | None | | --model | Model path or HuggingFace model ID | mlx-community/Voxtral-Mini-4B-Realtime-6bit | | --temp | Sampling temperature (0 = greedy) | 0.0 |

`voxmlx-convert`

Convert Voxtral weights to voxmlx/MLX format with optional quantization.

Basic conversion:

voxmlx-convert --mlx-path voxtral-mlx

4-bit quantized conversion:

voxmlx-convert -q --mlx-path voxtral-mlx-4bit

Convert and upload to HuggingFace:

voxmlx-convert -q --mlx-path voxtral-mlx-4bit --upload-repo username/voxtral-mlx-4bit

Options:

| Flag | Description | Default | |------|-------------|---------| | --hf-path | HuggingFace model ID or local path | mistralai/Voxtral-Mini-4B-Realtime-2602 | | --mlx-path | Output directory | mlx_model | | -q, --quantize | Quantize the model | Off | | --group-size | Quantization group size | 64 | | --bits | Bits per weight | 4 | | --dtype | Cast weights (float16, bfloat16, float32) | None | | --upload-repo | HuggingFace repo to upload converted model | None |

Python API

from voxmlx import transcribe

text = transcribe("audio.flac")
print(text)

Voxmlx

Install / Use

README