
Vibravox


Install / Use

/learn @jhauret/Vibravox

README

<div align="center"> <p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/65302a613ecbe51d6a6ddcec/zhB1fh-c0pjlj-Tr4Vpmr.png" style="object-fit:contain; width:250px; height:250px; border: solid 1px #CCC"> </p>

<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.12-3776AB?style=for-the-badge&logo=python&logoColor=white"></a> <a href="https://pytorch.org"><img alt="PyTorch" src="https://img.shields.io/badge/-Pytorch%20-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning%20-792ee5?style=for-the-badge&logo=lightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: hydra" src="https://img.shields.io/badge/-🐉%20hydra%20-89b8cd?style=for-the-badge&logo=hydra&logoColor=white"></a> <a href="https://huggingface.co/datasets"><img alt="HuggingFace Datasets" src="https://img.shields.io/badge/datasets%20-yellow?style=for-the-badge&logo=huggingface&logoColor=white"></a>

Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.

</div>

Resources:

  • 📝: The open-access paper related to this project, published in Speech Communication, is available on arXiv and at Speech Communication.
  • 🤗: The dataset used in this project is hosted by Hugging Face. You can access it here; a minimal loading example follows this list.
  • 🌐: For more information about the project, visit our project page.
  • 🏆: Explore leaderboards on Papers With Code.
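
Before training anything, the snippet below shows one way to pull a subset of the dataset straight from Hugging Face with the datasets library. This is a minimal sketch: the repository id Cnam-LMSSC/vibravox comes from the citation at the bottom of this page, and the speech_clean subset and throat_microphone sensor names come from the commands below, but the audio.<sensor> column naming and the test split name are assumptions about the dataset layout.

from datasets import load_dataset

# Minimal sketch: load the quiet-environment subset of Vibravox.
# Subset and sensor names are taken from this README; the column
# naming "audio.<sensor>" and the "test" split are assumptions.
dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test")
sample = dataset[0]
audio = sample["audio.throat_microphone"]
print(audio["sampling_rate"], audio["array"].shape)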

Setup

Create your environment (an example using pyenv is given here)

pyenv update
pyenv install 3.12.0
pyenv virtualenv 3.12.0 vibravox-env
pyenv local vibravox-env

Install vibravox

pip install vibravox
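
To confirm the install succeeded, a quick import check (this assumes only that the package exposes a top-level vibravox module):

python -c "import vibravox"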

Available sensors

<p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/6390fc80e6d656eb421bab69/P-_IWM3IMED5RBS3Lhydc.png" width="500"> </p>
  • 🟣:headset_microphone (not available for Bandwidth Extension, as it is the reference mic)
  • 🟡:throat_microphone
  • 🟢:forehead_accelerometer
  • 🔵:rigid_in_ear_microphone
  • 🔴:soft_in_ear_microphone
  • 🧊:temple_vibration_pickup
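
Each sensor name above is a valid value for lightning_datamodule.sensor in the run.py commands below. For example, to run the EBEN recipe on the rigid in-ear microphone instead of the throat microphone (trainer options omitted for brevity; all flags are taken verbatim from the EBEN example below):

python run.py \
  lightning_datamodule=bwe \
  lightning_datamodule.sensor=rigid_in_ear_microphone \
  lightning_module=eben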

Run some models

(a non-exhaustive list of models, just examples; Python sketches for loading the pretrained checkpoints follow this list)

  • EBEN or Mimi for Bandwidth Extension

    • Train and test on speech_clean, for recordings in a quiet environment:
      • For EBEN:
      python run.py \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.generator.p=2 \
        +callbacks=[bwe_checkpoint] \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=500
      
      • For Mimi:
      python run.py \
        sample_rate=24000 \
        lightning_datamodule=bwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.batch_size=16 \
        lightning_module=regressive_mimi \
        lightning_module.optimizer.lr=1e-4 \
        +callbacks=[bwe_checkpoint]
      
    • Train on speech_clean mixed with speechless_noisy and test on speech_noisy, for recordings in a noisy environment:
      python run.py \
        lightning_datamodule=noisybwe \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=eben \
        lightning_module.description=from_pretrained-throat_microphone \
        ++lightning_module.generator=dummy \
        ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
        ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
        ++lightning_module.discriminator=dummy \
        ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
        ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
        +callbacks=[bwe_checkpoint] \
        ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
        ++trainer.check_val_every_n_epoch=15 \
        ++trainer.max_epochs=200
      
  • wav2vec2 for Speech to Phoneme

    • Train and test on speech_clean, for recordings in a quiet environment:
      python run.py \
      lightning_datamodule=stp \
      lightning_datamodule.sensor=throat_microphone \
      lightning_module=wav2vec2_for_stp \
      lightning_module.optimizer.lr=1e-5 \
      ++trainer.max_epochs=10
    
    • Train and test on speech_noisy, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):
      python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    
  • ECAPA2 for Speaker Verification:

    • Test the model on speech_clean:
    python run.py \
      lightning_datamodule=spkv \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
    
    • Test on speech_clean mixed with speechless_noisy; this mixture is representative of speech_noisy and uses the exact same pairs that were used on speech_clean, allowing a direct comparison of results:
    python run.py \
      lightning_datamodule=spkv \
      lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
      lightning_datamodule.subset=speech_noisy_mixed \
      lightning_module=ecapa2 \
      logging=csv \
      ++trainer.limit_train_batches=0 \
      ++trainer.limit_val_batches=0
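
The from_pretrained targets in the noisy-environment EBEN command above can also be called directly from Python. This is a minimal sketch: the module paths, class names, and checkpoint ids are copied verbatim from that command, while treating the loaded objects as ordinary torch modules is an assumption.

from vibravox.torch_modules.dnn.eben_generator import EBENGenerator
from vibravox.torch_modules.dnn.eben_discriminator import DiscriminatorEBENMultiScales

# Checkpoint ids copied from the noisy-environment EBEN command above.
generator = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
discriminator = DiscriminatorEBENMultiScales.from_pretrained(
    "Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone"
)

# Assumption: both behave as standard torch.nn.Module subclasses.
generator.eval()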
    
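Similarly, the phonemizer checkpoint used above for weight initialization looks like a standard wav2vec2 CTC model (the config is named wav2vec2_for_ctc), so it should load with transformers. A hedged sketch: the published processor files, the 16 kHz input rate, and the silent placeholder waveform are all assumptions.

import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "Cnam-LMSSC/phonemizer_throat_microphone"  # id from the command above
processor = Wav2Vec2Processor.from_pretrained(checkpoint)  # assumes processor files exist
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

waveform = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of silence at an assumed 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))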

Cite our work

If you use the code in this repository or the Vibravox dataset (either the curated or non-curated version) for research, please cite this paper:

@article{hauret2025vibravox,
      title={{Vibravox: A dataset of French speech captured with body-conduction audio sensors}},
      author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
        Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
      journal={Speech Communication},
      pages={103238},
      year={2025},
      publisher={Elsevier}
}

and this Hugging Face repository, which is linked to a DOI:

@misc{cnamlmssc2024vibravoxdataset,
    author       = { Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
                     Poir{\'e}e, Sarah and Bavu, {\'E}ric },
    title        = { {Vibravox} (Revision 7990b7d) },
    year         = 2024,
    url          = { https://huggingface.co/datasets/Cnam-LMSSC/vibravox },
    doi          = { 10.57967/hf/2727 },
    publisher    = { Hugging Face }
}
