IndicWav2Vec
Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2
IndicWav2Vec is a multilingual speech model pretrained on 40 Indian languages. Among multilingual speech models, it covers the widest range of Indian languages. We fine-tune this model for downstream ASR in 9 languages and obtain state-of-the-art results on 3 public benchmarks, namely MUCS, MSR and OpenSLR.
As part of IndicWav2Vec, we created the largest publicly available corpora for 40 languages from 4 different language families. We also trained state-of-the-art ASR models for 9 Indian languages.

Benchmarks
We evaluate our models on 3 publicly available benchmarks: MUCS, MSR and OpenSLR. Our results are listed below.
| Model | gu | ta | te | gu | hi | mr | or | ta | te | bn | ne | si |
| ------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| IndicW2V | 20.5 | 22.1 | 22.9 | 26.2 | 16.0 | 19.3 | 25.6 | 27.3 | 29.3 | 16.6 | 11.9 | 24.8 |
| IndicW2V + LM | 11.7 | 13.6 | 11.0 | 17.2 | 14.7 | 13.8 | 17.2 | 25.0 | 20.5 | 13.6 | 13.6 | - |
Updates
21 June 2022
Added more documentation
Resources
Download Models
| Language | Acoustic Model | Dictionary | Language Model | Lexicon | Wandb |
| - | - | - | - | - | - |
| Bengali | fairseq / hf | link | KenLM | link | link |
| Gujarati | fairseq / hf | link | KenLM | link | link |
| Hindi | fairseq / hf | link | KenLM | link | link |
| Marathi | fairseq / hf | link | KenLM | link | link |
| Nepali | fairseq / hf | link | KenLM | link | link |
| Odia | fairseq / hf | link | KenLM | link | link |
| Tamil | fairseq / hf | link | KenLM | link | link |
| Telugu | fairseq / hf | link | KenLM | link | link |
| Sinhala | fairseq / hf | link | KenLM | link | link |
| Kannada (KB) | fairseq / hf | link | KenLM | link | link |
| Malayalam (KB) | fairseq / hf | link | KenLM | link | link |
Pretrained Model(*)
|Name |Model Checkpoint |
| - | - |
| IndicWav2Vec Large | fairseq |
| IndicWav2Vec Base | fairseq |
(* Trained on 40 Indian Languages, more details can be found here)
Hosted API Usage
Our models are hosted at the following API endpoints.

| Language | Language Code | API Endpoint |
| - | - | - |
| Bengali | bn | coming soon - will be back shortly |
| Gujarati | gu | coming soon - will be back shortly |
| Hindi | hi | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/hi |
| Marathi | mr | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/mr |
| Nepali | ne | coming soon - will be back shortly |
| Odia | or | coming soon - will be back shortly |
| Tamil | ta | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/ta |
| Telugu | te | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/te |
| Sinhala | si | coming soon - will be back shortly |
Input API data format
{
  "config": {
    "language": {
      "sourceLanguage": "#Language Code"
    },
    "transcriptionFormat": {"value": "transcript"},
    "audioFormat": "wav"
  },
  "audio": [{
    "audioContent": "#BASE64 Encoded String"
  }]
}
OR
{
  "config": {
    "language": {
      "sourceLanguage": "#Language Code"
    },
    "transcriptionFormat": {"value": "transcript"},
    "audioFormat": "wav"
  },
  "audio": [{
    "audioUri": "#HTTP/GS path to file"
  }]
}
Output
{
  "output": [
    {
      "source": "सेकेंड स्टेप इस देसी है स्पेसिफाइड फॉरेस्ट राइट"
    }
  ],
  "status": "SUCCESS"
}
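
The request and response formats above can be exercised from Python roughly as follows. This is a minimal sketch, not an official client: the helper names (`build_payload`, `extract_transcripts`, `recognize`) are hypothetical, and it assumes the endpoint accepts an unauthenticated HTTP POST with a JSON body, which this README does not state. Only the endpoint URLs and the JSON field names are taken from the sections above.

```python
import base64
import json
import urllib.request

# URL pattern from the endpoint table above; only some languages are currently live.
API_URL = "https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/{lang}"


def build_payload(lang, audio_bytes=None, audio_uri=None):
    """Build a request body in the documented format from exactly one audio source."""
    if (audio_bytes is None) == (audio_uri is None):
        raise ValueError("pass exactly one of audio_bytes or audio_uri")
    if audio_bytes is not None:
        # Inline audio is sent as a base64-encoded string.
        audio = {"audioContent": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        audio = {"audioUri": audio_uri}
    return {
        "config": {
            "language": {"sourceLanguage": lang},
            "transcriptionFormat": {"value": "transcript"},
            "audioFormat": "wav",
        },
        "audio": [audio],
    }


def extract_transcripts(response):
    """Return the 'source' strings from a response shaped like the Output example."""
    if response.get("status") != "SUCCESS":
        raise RuntimeError(f"recognition failed: {response.get('status')}")
    return [item["source"] for item in response.get("output", [])]


def recognize(lang, wav_path, timeout=60):
    """Send a local WAV file to the hosted API (assumption: plain JSON POST, no auth)."""
    with open(wav_path, "rb") as f:
        payload = build_payload(lang, audio_bytes=f.read())
    request = urllib.request.Request(
        API_URL.format(lang=lang),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_transcripts(json.loads(resp.read().decode("utf-8")))
```

For example, `recognize("hi", "sample.wav")` would return a list of transcript strings if the Hindi endpoint is reachable and behaves as assumed; `sample.wav` is a placeholder path.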
Accessing on ULCA
Our models can be accessed directly on ULCA: go to the ASR section and filter the models by IndicWav2Vec.

Quick start
Python Inference
- Greedy Decoding
python sfi.py [--audio-file AUDIO_FILE_PATH] [--ft-model FT_MODEL] [--w2l-decoder viterbi]
- KenLM Decoding
python sfi.py [--audio-file AUDIO_FILE_PATH] [--ft-model FT_MODEL_PATH] [--w2l-decoder kenlm] [--lexicon LEXICON_PATH] [--kenlm-model KENLM_MODEL_PATH] [--beam-threshold BEAM_THRESHOLD] [--beam-size-token BEAM_SIZE_TOKEN] [--beam BEAM_SIZE] [--word-score WORD_SCORE] [--lm-weight LM_WE
