SpeechBrain
A PyTorch-based Speech Toolkit
| 📘 Tutorials | 🌐 Website | 📚 Documentation | 🤝 Contributing | 🤗 HuggingFace | ▶️ YouTube | 🐦 X |
Please help our community project by starring it on GitHub!
Exciting News (January 2024): Discover what is new in SpeechBrain 1.0 here!
🗣️💬 What SpeechBrain Offers
- SpeechBrain is an open-source PyTorch toolkit that accelerates Conversational AI development, i.e., the technology behind speech assistants, chatbots, and large language models.
- It is crafted for the fast and easy creation of advanced technologies for Speech and Text Processing.
🌐 Vision
- With the rise of deep learning, once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.
- We think it is now time for a holistic toolkit that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.
- This spans speech recognition, speaker recognition, speech enhancement, speech separation, language modeling, dialogue, and beyond.
- Aligned with our long-term goal of natural human-machine conversation, including for non-verbal individuals, we have recently added support for the EEG modality.
📚 Training Recipes
- We share over 200 competitive training recipes on more than 40 datasets supporting 20 speech and text processing tasks (see below).
- We support both training from scratch and fine-tuning pretrained models such as Whisper, Wav2Vec2, WavLM, Hubert, GPT2, Llama2, and beyond. The models on HuggingFace can be easily plugged in and fine-tuned.
- For any task, you train the model using these commands:

```shell
python train.py hparams/train.yaml
```

- The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.
- We maintain a consistent code structure across different tasks.
- For better replicability, training logs and checkpoints are hosted on Dropbox.
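The consistent structure mentioned above follows a common pattern: a trainer class whose subclasses define a forward computation and an objective (loss). The self-contained toy below is only a sketch of that pattern, not SpeechBrain's real `Brain` API (which lives in `speechbrain.core` and manages modules, optimizers, checkpointing, and data loaders); all class and field names here are illustrative.

```python
class ToyBrain:
    """Toy stand-in for the train-loop pattern: subclasses supply
    compute_forward (model predictions) and compute_objectives (loss)."""

    def compute_forward(self, batch):
        raise NotImplementedError

    def compute_objectives(self, predictions, batch):
        raise NotImplementedError

    def fit(self, data, epochs=1):
        # Generic loop: forward pass, then loss, for every batch.
        losses = []
        for _ in range(epochs):
            for batch in data:
                predictions = self.compute_forward(batch)
                losses.append(self.compute_objectives(predictions, batch))
        return losses


class ScalerBrain(ToyBrain):
    """A trivial 'model' that multiplies inputs by a fixed scale."""

    def __init__(self, scale):
        self.scale = scale

    def compute_forward(self, batch):
        return [x * self.scale for x in batch["x"]]

    def compute_objectives(self, predictions, batch):
        # Mean squared error against the targets.
        return sum((p - t) ** 2 for p, t in zip(predictions, batch["y"])) / len(predictions)


data = [{"x": [1.0, 2.0], "y": [2.0, 4.0]}]
losses = ScalerBrain(scale=2.0).fit(data)
print(losses)  # scale=2.0 fits the targets exactly, so the loss is 0.0
```

Because every recipe follows this shape, moving from one task to another mostly means swapping the forward/objectives definitions and the YAML hyperparameters.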
<a href="https://huggingface.co/speechbrain" target="_blank"> <img src="https://huggingface.co/front/assets/huggingface_logo.svg" alt="drawing" width="40"/> </a> Pretrained Models and Inference
- Access over 100 pretrained models hosted on HuggingFace.
- Each model comes with a user-friendly interface for seamless inference. For example, transcribing speech using a pretrained model requires just three lines of code:
```python
from speechbrain.inference import EncoderDecoderASR

# Download the pretrained model from HuggingFace (cached in savedir)
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-conformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-transformer-transformerlm-librispeech",
)

# Transcribe an audio file to text
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")
```
<a href="https://speechbrain.github.io/" target="_blank"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_Colaboratory_SVG_Logo.svg/1200px-Google_Colaboratory_SVG_Logo.svg.png" alt="drawing" width="50"/> </a> Documentation
- We are deeply dedicated to promoting inclusivity and education.
- We have authored over 30 tutorials that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.
- Every class or function has clear explanations and examples that you can run. Check out the documentation for more details 📚.
🎯 Use Cases
- 🚀 Research Acceleration: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.
- ⚡️ Rapid Prototyping: Ideal for quick prototyping in time-sensitive projects.
- 🎓 Educational Tool: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like Mila, Concordia University, Avignon University, and many others for student training.
🚀 Quick Start
To get started with SpeechBrain, follow these simple steps:
🛠️ Installation
Install via PyPI
- Install SpeechBrain using PyPI:

```shell
pip install speechbrain
```

- Access SpeechBrain in your Python code:

```python
import speechbrain as sb
```
Install from GitHub
This installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.
- Clone the GitHub repository and install the requirements:

```shell
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
```

- Access SpeechBrain in your Python code:

```python
import speechbrain as sb
```

Any modifications made to the speechbrain package will be automatically reflected, thanks to the --editable flag.
✔️ Test Installation
Ensure your installation is correct by running the following commands:
```shell
pytest tests
pytest --doctest-modules speechbrain
```
🏃‍♂️ Running an Experiment
In SpeechBrain, you can train a model for any task using the following steps:
```shell
cd recipes/<dataset>/<task>/
python experiment.py params.yaml
```
The results will be saved in the output_folder specified in the YAML file.
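For illustration, the output folder is just another hyperparameter declared in the recipe's YAML file. The exact keys differ from recipe to recipe; the fragment below is a hypothetical example, not taken from any specific recipe:

```yaml
# Illustrative fragment of a recipe hparams file (key names vary by recipe)
output_folder: results/my_experiment/1986   # training logs and checkpoints land here
lr: 0.001
number_of_epochs: 10
batch_size: 8
```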
📘 Learning SpeechBrain
- Website: Explore general information on the official website.
- Tutorials: Start with basic tutorials covering fundamental functionalities. Find advanced tutorials and topics in the Tutorial notebooks category in the SpeechBrain documentation.
- Documentation: Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the documentation.
🔧 Supported Technologies
- SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.
- It excels not only in individual task implementations but also in combining various technologies into complex pipelines.
🎙️ Speech/Audio Processing
| Tasks | Datasets | Technologies/Models |
| ----- | -------- | ------------------- |
| Speech Recognition | AISHELL-1, CommonVoice, DVoice, LibriSpeech, MEDIA, RescueSpeech, Switchboard, TIMIT, Tedlium2, Voicebank | CTC, Transducers, Transformers, Seq2Seq, Beamsearch techniques (for CTC, seq2seq, transducers), Rescoring, Conformer, Branchformer, Hyperconformer, Kaldi2-FST |
| Speaker Recognition | VoxCeleb | ECAPA-TDNN, ResNET, Xvectors, PLDA, Score Normalization |
| Speech Separation | WSJ0Mix, LibriMix, WHAM!, WHAMR!, Aishell1Mix, BinauralWSJ0Mix | SepFormer, RESepFormer, SkiM, DualPath RNN, ConvTasNET |
| Speech Enhancement | DNS, Voicebank | SepFormer, MetricGAN, MetricGAN-U, [SEGAN](https://arxiv.org/abs/1703.0
