
IndicWav2Vec

Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2


IndicWav2Vec is a multilingual speech model pretrained on 40 Indian languages, the widest diversity of Indian languages among multilingual speech models. We fine-tune it for downstream ASR in 9 languages and obtain state-of-the-art results on 3 public benchmarks: MUCS, MSR and OpenSLR.

As part of IndicWav2Vec, we created the largest publicly available corpora for 40 languages from 4 language families, and trained state-of-the-art ASR models for 9 Indian languages.


Benchmarks

We evaluate our models on 3 publicly available benchmarks: MUCS, MSR and OpenSLR. The results below are word error rates (WER, %; lower is better). Each column header gives the benchmark and the language code.

| Model | MSR gu | MSR ta | MSR te | MUCS gu | MUCS hi | MUCS mr | MUCS or | MUCS ta | MUCS te | OpenSLR bn | OpenSLR ne | OpenSLR si |
| ------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| IndicW2V | 20.5 | 22.1 | 22.9 | 26.2 | 16.0 | 19.3 | 25.6 | 27.3 | 29.3 | 16.6 | 11.9 | 24.8 |
| IndicW2V + LM | 11.7 | 13.6 | 11.0 | 17.2 | 14.7 | 13.8 | 17.2 | 25.0 | 20.5 | 13.6 | 13.6 | - |
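To compare the two rows, the relative change from LM decoding can be computed per column as (WER with LM − WER without LM) / WER without LM. The sketch below copies the numbers from the benchmark table; the column labels assume the benchmark grouping MSR (gu, ta, te), MUCS (gu, hi, mr, or, ta, te) and OpenSLR (bn, ne, si), and `relative_change` is a hypothetical helper name.

```python
# Numbers copied from the benchmark table (WER, %).
# Assumed column grouping: MSR (gu, ta, te), MUCS (gu, hi, mr, or, ta, te), OpenSLR (bn, ne, si).
COLUMNS = [
    "MSR-gu", "MSR-ta", "MSR-te",
    "MUCS-gu", "MUCS-hi", "MUCS-mr", "MUCS-or", "MUCS-ta", "MUCS-te",
    "OpenSLR-bn", "OpenSLR-ne", "OpenSLR-si",
]
NO_LM = [20.5, 22.1, 22.9, 26.2, 16.0, 19.3, 25.6, 27.3, 29.3, 16.6, 11.9, 24.8]
WITH_LM = [11.7, 13.6, 11.0, 17.2, 14.7, 13.8, 17.2, 25.0, 20.5, 13.6, 13.6, None]  # None = "-"


def relative_change(before, after):
    """Relative WER change; negative means the LM lowered the error rate."""
    if after is None:
        return None
    return (after - before) / before


for name, before, after in zip(COLUMNS, NO_LM, WITH_LM):
    change = relative_change(before, after)
    print(f"{name}: " + ("n/a" if change is None else f"{change:+.1%}"))
```

On these numbers every column improves with the LM except OpenSLR Nepali (11.9 to 13.6), and Sinhala has no LM result.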

Updates

21 June 2022

Added more documentation


Resources

Download Models

| Language | Acoustic Model | Dictionary | Language Model | Lexicon | Wandb |
| - | - | - | - | - | - |
| Bengali | fairseq / hf | link | KenLM | link | link |
| Gujarati | fairseq / hf | link | KenLM | link | link |
| Hindi | fairseq / hf | link | KenLM | link | link |
| Marathi | fairseq / hf | link | KenLM | link | link |
| Nepali | fairseq / hf | link | KenLM | link | link |
| Odia | fairseq / hf | link | KenLM | link | link |
| Tamil | fairseq / hf | link | KenLM | link | link |
| Telugu | fairseq / hf | link | KenLM | link | link |
| Sinhala | fairseq / hf | link | KenLM | link | link |
| Kannada (KB) | fairseq / hf | link | KenLM | link | link |
| Malayalam (KB) | fairseq / hf | link | KenLM | link | link |
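The Lexicon column provides ready-made files. If you need a lexicon for your own word list, the sketch below writes one in the letter-spelling layout commonly used by fairseq/flashlight lexicon decoders: a word, a tab, the word's characters separated by spaces, and a trailing `|` word-boundary token. That layout, the tab separator and splitting words into Unicode code points are assumptions; check them against a downloaded lexicon and the model's dictionary before use. `lexicon_line`, `lexicon_lines` and `write_lexicon` are hypothetical helper names.

```python
def lexicon_line(word: str, boundary: str = "|") -> str:
    """One lexicon entry: word, tab, space-separated code points, boundary token.

    Assumption: the acoustic model's dictionary uses single Unicode code points
    as tokens, so a word like "नम" is spelled "न म |".
    """
    return f"{word}\t{' '.join(word)} {boundary}"


def lexicon_lines(words):
    """Deduplicate and sort the words (stable output across runs); skip empty strings."""
    return [lexicon_line(w) for w in sorted(set(words)) if w]


def write_lexicon(words, path):
    """Write one entry per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lexicon_lines(words):
            f.write(line + "\n")
```

For example, `write_lexicon(text.split(), "lexicon.lst")` would build a lexicon from the whitespace-separated words of a text corpus (the file name is illustrative).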

Pretrained Model(*)

| Name | Model Checkpoint |
| - | - |
| IndicWav2Vec Large | fairseq |
| IndicWav2Vec Base | fairseq |

(* Trained on 40 Indian languages; more details can be found here.)

Hosted API Usage

Our models are hosted at the following API endpoints.

| Language | Language Code | API Endpoint |
| - | - | - |
| Bengali | bn | coming soon (will be back shortly) |
| Gujarati | gu | coming soon (will be back shortly) |
| Hindi | hi | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/hi |
| Marathi | mr | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/mr |
| Nepali | ne | coming soon (will be back shortly) |
| Odia | or | coming soon (will be back shortly) |
| Tamil | ta | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/ta |
| Telugu | te | https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/te |
| Sinhala | si | coming soon (will be back shortly) |

Input API data format

{
    "config": {
        "language":{
          "sourceLanguage": "#Language Code"
        },
        "transcriptionFormat": {"value":"transcript"},
        "audioFormat": "wav"
    },
    "audio": [{
        "audioContent": "#BASE64 Encoded String"
    }]
}

OR

{
    "config": {
        "language":{
          "sourceLanguage": "#Language Code"
        },
        "transcriptionFormat": {"value":"transcript"},
        "audioFormat": "wav"
    },
    "audio": [{
        "audioUri": "#HTTP/GS path to file"
    }]
}
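A minimal client sketch for the two input formats above, using only the Python standard library. It assumes the endpoint accepts an HTTP POST with a JSON body and that `audioContent` is the base64 encoding of the whole WAV file; the README specifies the body but not the method, headers or encoding details. `build_payload` and `recognize` are hypothetical helper names, and only the hi, mr, ta and te endpoints were listed as live (the dev host may no longer be reachable).

```python
import base64
import json
import urllib.request

# URL pattern of the endpoints listed above (only hi, mr, ta, te were live).
ENDPOINT = "https://ai4b-dev-asr.ulcacontrib.org/asr/v1/recognize/{lang}"


def build_payload(lang, audio_bytes=None, audio_uri=None):
    """Build a request body in one of the two documented input formats."""
    if (audio_bytes is None) == (audio_uri is None):
        raise ValueError("pass exactly one of audio_bytes or audio_uri")
    if audio_bytes is not None:
        # Assumption: the whole WAV file, base64-encoded as an ASCII string.
        audio = {"audioContent": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        audio = {"audioUri": audio_uri}  # HTTP or GS path to the file
    return {
        "config": {
            "language": {"sourceLanguage": lang},
            "transcriptionFormat": {"value": "transcript"},
            "audioFormat": "wav",
        },
        "audio": [audio],
    }


def recognize(lang, wav_path, timeout=60):
    """POST a local WAV file (assumed method: POST with JSON) and return the parsed reply."""
    with open(wav_path, "rb") as f:
        payload = build_payload(lang, audio_bytes=f.read())
    request = urllib.request.Request(
        ENDPOINT.format(lang=lang),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `recognize("hi", "sample.wav")` would send a local Hindi recording (the file name is illustrative).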

Output

{
    "output": [
        {
            "source": "सेकेंड स्टेप इस देसी है स्पेसिफाइड फॉरेस्ट राइट"
        }
    ],
    "status": "SUCCESS"
}
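A response like the one above can be unpacked with a small helper that collects the `source` field of every `output` entry. Only the `SUCCESS` status is documented, so this sketch treats any other status as a failure; `extract_transcripts` is a hypothetical name.

```python
import json


def extract_transcripts(response: dict) -> list:
    """Return the "source" transcript of every entry in the response's "output" list."""
    status = response.get("status")
    # Only "SUCCESS" appears in the documented example; anything else is treated as an error.
    if status != "SUCCESS":
        raise RuntimeError(f"ASR request did not succeed (status={status!r})")
    return [item["source"] for item in response.get("output", [])]


# Usage with the example response shown above.
example = json.loads(
    '{"output": [{"source": "सेकेंड स्टेप इस देसी है स्पेसिफाइड फॉरेस्ट राइट"}], "status": "SUCCESS"}'
)
print(extract_transcripts(example)[0])
```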

Accessing on ULCA

Our models can be accessed directly on ULCA: open the ASR section and filter models by IndicWav2Vec.


Quick start

Python Inference

  • Greedy Decoding
    python sfi.py [--audio-file AUDIO_FILE_PATH] 
              [--ft-model FT_MODEL_PATH] 
              [--w2l-decoder viterbi]
    
  • KenLM Decoding
    python sfi.py [--audio-file AUDIO_FILE_PATH]   
              [--ft-model FT_MODEL_PATH] 
              [--w2l-decoder kenlm] 
              [--lexicon LEXICON_PATH] 
              [--kenlm-model KENLM_MODEL_PATH]
              [--beam-threshold BEAM_THRESHOLD] 
              [--beam-size-token BEAM_SIZE_TOKEN] 
              [--beam BEAM_SIZE] 
              [--word-score WORD_SCORE] 
              [--lm-weight LM_WE
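For a CTC-trained model, the `viterbi` option used for greedy decoding typically reduces to taking the best token in each frame, merging consecutive repeats and dropping the blank token. The sketch below illustrates that technique on a plain score matrix; it is not the code in `sfi.py`. The blank index 0 and the `|` word-boundary token are assumptions about the model's dictionary, and `greedy_ctc_decode` is a hypothetical name.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, vocab, blank=0, boundary="|"):
    """Greedy CTC decoding: best token per frame, collapse repeats, drop blanks.

    frame_scores: one list of per-token scores (e.g. log-probabilities) per frame.
    vocab: token strings, indexed like the score columns.
    """
    # 1. Index of the best-scoring token in each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Merge runs of the same token (groupby yields one key per run).
    collapsed = [token for token, _ in groupby(best)]
    # 3. Drop blanks; a blank between two identical tokens keeps both of them.
    text = "".join(vocab[t] for t in collapsed if t != blank)
    # 4. Turn the word-boundary token into single spaces, skipping empty words.
    return " ".join(word for word in text.split(boundary) if word)
```

With `vocab = ["<blank>", "a", "b", "|"]` and per-frame best tokens `a a <blank> a | b b`, the result is `"aa b"`: the repeated `a a` collapses to one `a`, while the blank keeps the following `a` separate.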
    
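The `--lm-weight` and `--word-score` options control how the KenLM score is mixed into the beam search. As a rough sketch, assuming the usual log-linear formulation of wav2letter/flashlight-style lexicon decoders, a hypothesis is ranked by acoustic log-probability + lm_weight × LM log-probability + word_score × number of words. This only rescores complete hypotheses: it ignores beam pruning (`--beam`, `--beam-threshold`, `--beam-size-token`), lexicon constraints and any silence or unknown-word penalties, and it is not the decoder used by `sfi.py`. `combined_score`, `pick_best` and the numbers in the example are hypothetical.

```python
def combined_score(acoustic_logprob, lm_logprob, num_words, lm_weight, word_score):
    """Log-linear combination of acoustic, language-model and word-count terms."""
    return acoustic_logprob + lm_weight * lm_logprob + word_score * num_words


def pick_best(hypotheses, lm_weight, word_score):
    """Return the hypothesis with the highest combined score.

    Each hypothesis is a dict with "text", "am" (acoustic log-prob) and "lm" (LM log-prob).
    """
    return max(
        hypotheses,
        key=lambda h: combined_score(
            h["am"], h["lm"], len(h["text"].split()), lm_weight, word_score
        ),
    )


# Toy hypotheses: the acoustic model slightly prefers "a b", the LM strongly prefers "ab".
hyps = [
    {"text": "a b", "am": -5.0, "lm": -10.0},
    {"text": "ab", "am": -5.5, "lm": -4.0},
]
print(pick_best(hyps, lm_weight=0.0, word_score=0.0)["text"])
print(pick_best(hyps, lm_weight=1.0, word_score=0.0)["text"])
```

Raising `lm_weight` shifts the choice toward LM-preferred text, while a positive `word_score` rewards hypotheses with more words, which offsets the LM's tendency to favour shorter outputs.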