Voxtral.cpp
Port of Mistral's Voxtral model in C/C++
A ggml-based C++ implementation of Voxtral Realtime 4B.
Voxtral References
- Official Mistral Voxtral announcement: https://mistral.ai/news/voxtral
- Mistral Audio & Transcription docs: https://docs.mistral.ai/capabilities/audio_transcription
- Mistral Audio Transcriptions API: https://docs.mistral.ai/api/endpoint/audio/transcriptions
Model Weights (GGUF)
Quantized GGUF weights used by this project are hosted on Hugging Face:
- https://huggingface.co/andrijdavid/Voxtral-Mini-4B-Realtime-2602-GGUF
The download_model.sh script downloads from that repository.
Quickstart
1. Download the model
Download the pre-converted GGUF model from Hugging Face:
# Default: Q4_0 quantization
./tools/download_model.sh Q4_0
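As a quick sanity check after the download, the file header can be inspected. The sketch below is not part of this repository. It assumes the GGUF version 2 or later header layout (the 4-byte magic GGUF, a little-endian uint32 version, then two uint64 counts), and read_gguf_header is a hypothetical helper name.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, kv_count) read from the file header.

    Assumes the GGUF v2+ layout, where both counts are uint64.
    Raises ValueError if the file is too short or the magic does not match.
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<" selects little-endian with no padding: 4 + 8 + 8 = 20 bytes.
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:24])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    import sys

    print(read_gguf_header(sys.argv[1]))
```

A truncated or failed download (for example an HTML error page saved under the .gguf name) would typically raise ValueError here. A valid header does not prove the rest of the file is complete.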
2. Build
Build the project using CMake:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
3. Audio Preparation
The model expects mono 16-bit PCM WAV input sampled at 16 kHz. Use ffmpeg to convert audio in other formats:
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
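To confirm that a converted file matches the expected format before running inference, the WAV header can be checked with Python's standard wave module. This is a minimal sketch, not a tool shipped with the project; check_wav is a hypothetical helper name.

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return a list of format problems; an empty list means the file matches.

    sample_width is in bytes, so 2 corresponds to 16-bit samples.
    wave.open raises wave.Error for files it cannot parse as PCM WAV.
    """
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected {rate}")
        if w.getnchannels() != channels:
            problems.append(f"{w.getnchannels()} channels, expected {channels}")
        if w.getsampwidth() != sample_width:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected {8 * sample_width}")
    return problems


if __name__ == "__main__":
    import sys

    issues = check_wav(sys.argv[1])
    print("OK" if not issues else "\n".join(issues))
```

Any reported problem means the file should be re-run through the ffmpeg command above.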
4. Run Inference
./build/voxtral \
  --model models/voxtral/Q4_0.gguf \
  --audio path/to/input.wav \
  --threads 8
Advanced Usage
Manual Quantization
You can quantize an existing GGUF file using the native quantizer:
./build/voxtral-quantize \
  models/voxtral/voxtral.gguf \
  models/voxtral/voxtral-q6_k.gguf \
  Q6_K \
  8
voxtral-quantize
Command format:
./build/voxtral-quantize <input.gguf> <output.gguf> <type> [nthreads]
Examples:
# 1) Quantize to Q4_0 using default thread count
./build/voxtral-quantize models/voxtral/voxtral.gguf models/voxtral/Q4_0.gguf Q4_0
# 2) Quantize to Q6_K using 8 threads
./build/voxtral-quantize models/voxtral/voxtral.gguf models/voxtral/Q6_K.gguf Q6_K 8
Supported type values:
Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q4_K_M
Notes:
- Input must be a Voxtral GGUF (general.architecture = voxtral_realtime).
- Q4_K_M uses a mixed strategy internally (some tensors kept at higher precision).
- nthreads is optional; when omitted, hardware concurrency is used.
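When quantization is scripted, the command format and the supported type list above can be encoded in a small wrapper that rejects bad arguments before the binary is launched. The following is a sketch under the assumption that the binary lives at ./build/voxtral-quantize as in the examples; quantize_cmd is a hypothetical helper, not part of the project.

```python
SUPPORTED_TYPES = (
    "Q4_0", "Q4_1", "Q5_0", "Q5_1", "Q8_0",
    "Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K", "Q4_K_M",
)


def quantize_cmd(src, dst, qtype, nthreads=None,
                 binary="./build/voxtral-quantize"):
    """Build the argument list: <input.gguf> <output.gguf> <type> [nthreads]."""
    if qtype not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type {qtype!r}")
    cmd = [binary, str(src), str(dst), qtype]
    if nthreads is not None:
        if nthreads < 1:
            raise ValueError("nthreads must be a positive integer")
        # Omitting nthreads lets the tool fall back to hardware concurrency.
        cmd.append(str(nthreads))
    return cmd


if __name__ == "__main__":
    import subprocess

    subprocess.run(
        quantize_cmd("models/voxtral/voxtral.gguf",
                     "models/voxtral/Q6_K.gguf", "Q6_K", 8),
        check=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.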
Testing
The test suite runs over samples/*.wav files.
Numeric Parity Check
To verify numeric parity against the reference implementation:
python3 tests/test_voxtral_reference.py
Custom Tolerances
You can override comparison tolerances via environment variables:
- VOXTRAL_TEST_ATOL (default: 1e-2)
- VOXTRAL_TEST_RTOL (default: 1e-2)
- VOXTRAL_TEST_THREADS
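For intuition about how an absolute and a relative tolerance interact, the sketch below reads the two variables with the defaults listed above and applies the common rule |actual - reference| <= atol + rtol * |reference|. That rule is the convention used by numpy.allclose and is an assumption here; the test script may combine the tolerances differently. Both function names are hypothetical.

```python
import os


def load_tolerances(env=os.environ):
    """Read (atol, rtol) from the environment, defaulting to 1e-2 each."""
    atol = float(env.get("VOXTRAL_TEST_ATOL", "1e-2"))
    rtol = float(env.get("VOXTRAL_TEST_RTOL", "1e-2"))
    return atol, rtol


def within_tolerance(actual, reference, atol, rtol):
    """True if every element satisfies |a - r| <= atol + rtol * |r|.

    This mirrors the numpy.allclose convention (assumed, not confirmed
    to be what tests/test_voxtral_reference.py does).
    """
    if len(actual) != len(reference):
        return False
    return all(abs(a - r) <= atol + rtol * abs(r)
               for a, r in zip(actual, reference))


if __name__ == "__main__":
    atol, rtol = load_tolerances()
    print(within_tolerance([1.0, 2.0], [1.015, 2.0], atol, rtol))
```

Under this rule, atol dominates for values near zero and rtol dominates for large values, so tightening only one of them mostly affects the corresponding range of magnitudes.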
