# Vibravox
Speech to Phoneme, Bandwidth Extension and Speaker Verification using the Vibravox dataset.
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.12-3776AB?style=for-the-badge&logo=python&logoColor=white"></a> <a href="https://pytorch.org"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?style=for-the-badge&logo=lightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: hydra" src="https://img.shields.io/badge/-🐉%20hydra-89b8cd?style=for-the-badge&logo=hydra&logoColor=white"></a> <a href="https://huggingface.co/datasets"><img alt="HuggingFace Datasets" src="https://img.shields.io/badge/datasets-yellow?style=for-the-badge&logo=huggingface&logoColor=white"></a>
## Resources
- 📝: The open-access paper related to this project is published in Speech Communication and is also available on arXiv.
- 🤗: The dataset used in this project is hosted on Hugging Face: https://huggingface.co/datasets/Cnam-LMSSC/vibravox.
- 🌐: For more information about the project, visit our project page.
- 🏆: Explore leaderboards on Papers with Code.
## Setup
Create your environment (example given here with pyenv):

```bash
pyenv update
pyenv install 3.12.0
pyenv virtualenv 3.12.0 vibravox-env
pyenv local vibravox-env
```
Install vibravox:

```bash
pip install vibravox
```
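With the package installed, the dataset itself can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming the `datasets` library and that the subset name matches those used in the commands below; the `audio.throat_microphone` column follows an assumed `audio.<sensor>` pattern and should be checked against the dataset card:

```python
from datasets import load_dataset

# Stream the quiet-environment subset so nothing large is downloaded up front.
# Subset ("speech_clean") and column names are assumptions to verify against
# the dataset card at https://huggingface.co/datasets/Cnam-LMSSC/vibravox.
dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean",
                       split="test", streaming=True)

sample = next(iter(dataset))
audio = sample["audio.throat_microphone"]  # assumed `audio.<sensor>` naming
print(audio["sampling_rate"], len(audio["array"]))
```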
## Available sensors
<p align="center"> <img src="https://cdn-uploads.huggingface.co/production/uploads/6390fc80e6d656eb421bab69/P-_IWM3IMED5RBS3Lhydc.png" width="500"> </p>

- 🟣: `headset_microphone` (not available for Bandwidth Extension, as it is the reference mic)
- 🟡: `throat_microphone`
- 🟢: `forehead_accelerometer`
- 🔵: `rigid_in_ear_microphone`
- 🔴: `soft_in_ear_microphone`
- 🧊: `temple_vibration_pickup`

These identifiers are passed verbatim to `lightning_datamodule.sensor` in the commands below; the sketch after this list shows the assumed mapping to dataset columns.
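A small reference sketch; the `audio.<sensor>` column pattern is an assumption to verify against the dataset card:

```python
# Sensor identifiers accepted by `lightning_datamodule.sensor` in the commands below.
SENSORS = [
    "headset_microphone",
    "throat_microphone",
    "forehead_accelerometer",
    "rigid_in_ear_microphone",
    "soft_in_ear_microphone",
    "temple_vibration_pickup",
]

# Assumed mapping to Hugging Face dataset columns (check the dataset card).
columns = [f"audio.{sensor}" for sensor in SENSORS]
print(columns[1])  # -> "audio.throat_microphone"
```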
## Run some models
(a non-exhaustive list of models, just examples)
- **EBEN or Mimi for Bandwidth Extension**

  - Train and test on `speech_clean`, for recordings in a quiet environment:

    - For EBEN:

      ```bash
      python run.py \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.generator.p=2 \
          +callbacks=[bwe_checkpoint] \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=500
      ```

    - For Mimi:

      ```bash
      python run.py \
          sample_rate=24000 \
          lightning_datamodule=bwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_datamodule.batch_size=16 \
          lightning_module=regressive_mimi \
          lightning_module.optimizer.lr=1e-4 \
          +callbacks=[bwe_checkpoint]
      ```

  - Train on `speech_clean` mixed with `speechless_noisy` and test on `speech_noisy`, for recordings in a noisy environment:

    - For EBEN, with weights initialized from vibravox_EBEN_models:

      ```bash
      python run.py \
          lightning_datamodule=noisybwe \
          lightning_datamodule.sensor=throat_microphone \
          lightning_module=eben \
          lightning_module.description=from_pretrained-throat_microphone \
          ++lightning_module.generator=dummy \
          ++lightning_module.generator._target_=vibravox.torch_modules.dnn.eben_generator.EBENGenerator.from_pretrained \
          ++lightning_module.generator.pretrained_model_name_or_path=Cnam-LMSSC/EBEN_throat_microphone \
          ++lightning_module.discriminator=dummy \
          ++lightning_module.discriminator._target_=vibravox.torch_modules.dnn.eben_discriminator.DiscriminatorEBENMultiScales.from_pretrained \
          ++lightning_module.discriminator.pretrained_model_name_or_path=Cnam-LMSSC/DiscriminatorEBENMultiScales_throat_microphone \
          +callbacks=[bwe_checkpoint] \
          ++callbacks.checkpoint.monitor=validation/torchmetrics_stoi/synthetic \
          ++trainer.check_val_every_n_epoch=15 \
          ++trainer.max_epochs=200
      ```

  The published generator checkpoints can also be run standalone; see the sketch below.
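A minimal standalone sketch for the EBEN generator. The class path and model id are taken from the command above; the forward signature, the 16 kHz assumption, and the input-length constraint are assumptions to check against the model card:

```python
import torch
from vibravox.torch_modules.dnn.eben_generator import EBENGenerator

# Load the published generator weights (model id taken from the command above).
generator = EBENGenerator.from_pretrained("Cnam-LMSSC/EBEN_throat_microphone")
generator.eval()

# Two seconds of placeholder 16 kHz audio, shaped (batch, channels, samples).
# EBEN's PQMF analysis may constrain valid input lengths; 32000 samples is a guess.
corrupted = torch.randn(1, 1, 32000)

with torch.no_grad():
    output = generator(corrupted)

# The forward pass may return the enhanced waveform alone or a tuple that also
# carries the decomposed sub-bands; this sketch handles both defensively.
enhanced = output[0] if isinstance(output, tuple) else output
print(enhanced.shape)
```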
- **wav2vec2 for Speech to Phoneme**

  - Train and test on `speech_clean`, for recordings in a quiet environment (weights initialized from facebook/wav2vec2-base-fr-voxpopuli):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_module=wav2vec2_for_stp \
        lightning_module.optimizer.lr=1e-5 \
        ++trainer.max_epochs=10
    ```

  - Train and test on `speech_noisy`, for recordings in a noisy environment (weights initialized from vibravox_phonemizers):

    ```bash
    python run.py \
        lightning_datamodule=stp \
        lightning_datamodule.sensor=throat_microphone \
        lightning_datamodule.subset=speech_noisy \
        lightning_datamodule/data_augmentation=aggressive \
        lightning_module=wav2vec2_for_stp \
        lightning_module.wav2vec2_for_ctc.pretrained_model_name_or_path=Cnam-LMSSC/phonemizer_throat_microphone \
        lightning_module.optimizer.lr=1e-6 \
        ++trainer.max_epochs=30
    ```

  The published phonemizers can also be used for standalone inference; see the sketch below.
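Since the phonemizers are wav2vec2 CTC models, they should be loadable with plain `transformers`. A minimal sketch, assuming 16 kHz mono input and that the model repository bundles a matching processor (both worth verifying on the model card):

```python
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Model id taken from the command above; assumed to ship with a processor.
model_id = "Cnam-LMSSC/phonemizer_throat_microphone"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# One second of placeholder audio; replace with a real 16 kHz mono waveform.
speech = np.zeros(16000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding to a phoneme string.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```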
- **ECAPA2 for Speaker Verification**

  - Test the model on `speech_clean`:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```

  - Test on `speech_clean` mixed with `speechless_noisy` (representative of `speech_noisy`), with the exact same pairs that were used on `speech_clean`, allowing direct comparison of results:

    ```bash
    python run.py \
        lightning_datamodule=spkv \
        lightning_datamodule.dataset_name=Cnam-LMSSC/vibravox_mixed_for_spkv \
        lightning_datamodule.subset=speech_noisy_mixed \
        lightning_module=ecapa2 \
        logging=csv \
        ++trainer.limit_train_batches=0 \
        ++trainer.limit_val_batches=0
    ```

  The verification decision itself is a simple embedding comparison; see the sketch below.
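Speaker verification reduces to comparing a pair of fixed-size speaker embeddings against a threshold. A minimal sketch of that decision rule; the embeddings here are random placeholders standing in for ECAPA2 outputs, and the dimension and threshold are illustrative, not the repository's values:

```python
import torch
import torch.nn.functional as F

def verify(emb_a: torch.Tensor, emb_b: torch.Tensor, threshold: float = 0.5) -> bool:
    """Accept the trial if the cosine similarity of the two embeddings clears the threshold."""
    score = F.cosine_similarity(emb_a, emb_b, dim=-1).item()
    return score >= threshold

# Random placeholders standing in for ECAPA2 embeddings of two utterances.
enrollment = torch.randn(192)
test = torch.randn(192)
print(verify(enrollment, test))  # threshold would be calibrated on a validation set
```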
## Cite our work
If you use code in this repository or the Vibravox dataset (either the curated or non-curated version) for research, please cite this paper:
```bibtex
@article{hauret2025vibravox,
  title={{Vibravox: A dataset of French speech captured with body-conduction audio sensors}},
  author={Hauret, Julien and Olivier, Malo and Joubaud, Thomas and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Zimpfer, V{\'e}ronique and Bavu, {\'E}ric},
  journal={Speech Communication},
  pages={103238},
  year={2025},
  publisher={Elsevier}
}
```
and this Hugging Face repository, which is linked to a DOI:
```bibtex
@misc{cnamlmssc2024vibravoxdataset,
  author={Hauret, Julien and Olivier, Malo and Langrenne, Christophe and
          Poir{\'e}e, Sarah and Bavu, {\'E}ric},
  title={{Vibravox} (Revision 7990b7d)},
  year=2024,
  url={https://huggingface.co/datasets/Cnam-LMSSC/vibravox},
  doi={10.57967/hf/2727},
  publisher={Hugging Face}
}
```