# Vibravox
Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.12-3776AB?style=for-the-badge&logo=python&logoColor=white"></a> <a href="https://pytorch.org"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?style=for-the-badge&logo=lightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: hydra" src="https://img.shields.io/badge/-🐉%20hydra-89b8cd?style=for-the-badge&logo=hydra&logoColor=white"></a> <a href="https://huggingface.co/datasets"><img alt="HuggingFace Datasets" src="https://img.shields.io/badge/datasets-yellow?style=for-the-badge&logo=huggingface&logoColor=white"></a>
## Resources
- 📝: The open-access paper related to this project is published in Speech Communication and is also available on arXiv.
- 🤗: The dataset used in this project is hosted on Hugging Face: https://huggingface.co/datasets/Cnam-LMSSC/vibravox.
- 🌐: For more information about the project, visit our project page.
- 🏆: Explore leaderboards on Papers with Code.
## Setup
Create your environment (example given here with pyenv):

```bash
pyenv update
pyenv install 3.12.0
pyenv virtualenv 3.12.0 vibravox-env
pyenv local vibravox-env
```
Install vibravox:

```bash
pip install vibravox
```
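With the package installed, the dataset itself can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming the `datasets` library and that the subset name matches those used in the commands below; the `audio.throat_microphone` column follows an assumed `audio.<sensor>` pattern and should be checked against the dataset card:

```python
from datasets import load_dataset

# Stream the quiet-environment subset so nothing large is downloaded up front.
# Subset ("speech_clean") and column names are assumptions to verify against
# the dataset card at https://huggingface.co/datasets/Cnam-LMSSC/vibravox.
dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean",
                       split="test", streaming=True)

sample = next(iter(dataset))
audio = sample["audio.throat_microphone"]  # assumed `audio.<sensor>` naming
print(audio["sampling_rate"], len(audio["array"]))
```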
## Available sensors
<p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/6390fc80e6d656eb421bab69/P-_IWM3IMED5RBS3Lhydc.png" width="500"> </p>

- 🟣: `headset_microphone` (not available for Bandwidth Extension, as it is the reference mic)
- 🟡: `throat_microphone`
- 🟢: `forehead_accelerometer`
- 🔵: `rigid_in_ear_microphone`
- 🔴: `soft_in_ear_microphone`
- 🧊: `temple_vibration_pickup`

These identifiers are passed verbatim to `lightning_datamodule.sensor` in the commands below; the sketch after this list shows the assumed mapping to dataset columns.
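A small reference sketch; the `audio.<sensor>` column pattern is an assumption to verify against the dataset card:

```python
# Sensor identifiers accepted by `lightning_datamodule.sensor` in the commands below.
SENSORS = [
    "headset_microphone",
    "throat_microphone",
    "forehead_accelerometer",
    "rigid_in_ear_microphone",
    "soft_in_ear_microphone",
    "temple_vibration_pickup",
]

# Assumed mapping to Hugging Face dataset columns (check the dataset card).
columns = [f"audio.{sensor}" for sensor in SENSORS]
print(columns[1])  # -> "audio.throat_microphone"
```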
## Run some models
(a non-exhaustive list of models, just examples)
- **EBEN or Mimi for Bandwidth Extension**

  - Train and test on `speech_clean`, for recordings in a quiet environment:

    - For EBEN:

      ```bash
      python run.py \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.generator.p=2 \
          +callbacks=[bwe_checkpoint] \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=500
      ```

    - For Mimi:

      ```bash
      python run.py \
          sample_rate=24000 \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_datamodule.batch_size=16 \
          lightning_module=regressive_mimi \
          lightning_module.optimizer.lr=1e-4 \
          +callbacks=[bwe_checkpoint]
      ```

  - Train on `speech_clean` mixed with `speechless_noisy` and test on `speech_noisy`, for recordings in a noisy environment:

    - For EBEN, with weights initialized from vibravox_EBEN_models:

      ```bash
      python run.py \
          lightning_datamodule=noisybwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.description=from_pretrained-throat_microphone \
          ++lightning_module.generator=dummy \
          ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
          ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
          ++lightning_module.discriminator=dummy \
          ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
          ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
          +callbacks=[bwe_checkpoint] \
          ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=200
      ```

  The published generator checkpoints can also be run standalone; see the sketch below.
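A minimal standalone sketch for the EBEN generator. The class path and model id are taken from the command above; the forward signature, the 16 kHz assumption, and the input-length constraint are assumptions to check against the model card:

```python
import torch
from vibravox.torch_modules.dnn.eben_generator import EBENGenerator

# Load the published generator weights (model id taken from the command above).
generator = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
generator.eval()

# Two seconds of placeholder 16 kHz audio, shaped (batch, channels, samples).
# EBEN's PQMF analysis may constrain valid input lengths; 32000 samples is a guess.
corrupted = torch.randn(1, 1, 32000)

with torch.no_grad():
    output = generator(corrupted)

# The forward pass may return the enhanced waveform alone or a tuple that also
# carries the decomposed sub-bands; this sketch handles both defensively.
enhanced = output[0] if isinstance(output, tuple) else output
print(enhanced.shape)
```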
- **wav2vec2 for Speech to Phoneme**

  - Train and test on `speech_clean`, for recordings in a quiet environment (weights initialized from facebook/wav2vec2-base-fr-voxpopuli):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=wav2vec2_for_stp \
        lightning_module.optimizer.lr=1e-5 \
        ++trainer.max_epochs=10
    ```

  - Train and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    ```

  The published phonemizers can also be used for standalone inference; see the sketch below.
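Since the phonemizers are wav2vec2 CTC models, they should be loadable with plain `transformers`. A minimal sketch, assuming 16 kHz mono input and that the model repository bundles a matching processor (both worth verifying on the model card):

```python
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Model id taken from the command above; assumed to ship with a processor.
model_id = "Cnam-LMSSC/phonemizer_throat_microphone"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# One second of placeholder audio; replace with a real 16 kHz mono waveform.
speech = np.zeros(16000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding to a phoneme string.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```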
- **ECAPA2 for Speaker Verification**

  - Test the model on `speech_clean`:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```

  - Test on `speech_clean` mixed with `speechless_noisy` (representative of `speech_noisy`), with the exact same pairs that were used on `speech_clean`, allowing direct comparison of results:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
        lightning_datamodule.subset=speech_noisy_mixed \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```

  The verification decision itself is a simple embedding comparison; see the sketch below.
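Speaker verification reduces to comparing a pair of fixed-size speaker embeddings against a threshold. A minimal sketch of that decision rule; the embeddings here are random placeholders standing in for ECAPA2 outputs, and the dimension and threshold are illustrative, not the repository's values:

```python
import torch
import torch.nn.functional as F

def verify(emb_a: torch.Tensor, emb_b: torch.Tensor, threshold: float = 0.5) -> bool:
    """Accept the trial if the cosine similarity of the two embeddings clears the threshold."""
    score = F.cosine_similarity(emb_a, emb_b, dim=-1).item()
    return score >= threshold

# Random placeholders standing in for ECAPA2 embeddings of two utterances.
enrollment = torch.randn(192)
test = torch.randn(192)
print(verify(enrollment, test))  # threshold would be calibrated on a validation set
```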
## Cite our work
If you use code in this repository or the Vibravox dataset (either the curated or non-curated version) for research, please cite this paper:
```bibtex
@article{hauret2025vibravox,
  title={{Vibravox: A dataset of French speech captured with body-conduction audio sensors}},
  author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
  journal={Speech Communication},
  pages={103238},
  year={2025},
  publisher={Elsevier}
}
```
and this Hugging Face repository, which is linked to a DOI:
```bibtex
@misc{cnamlmssc2024vibravoxdataset,
  author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Bavu, {\'E}ric},
  title={{Vibravox} (Revision 7990b7d)},
  year=2024,
  url={https://huggingface.co/datasets/Cnam-LMSSC/vibravox},
  doi={10.57967/hf/2727},
  publisher={Hugging Face}
}
```