IndicWav2Vec

IndicWav2Vec is a multilingual speech model pretrained on 40 Indian languages. It covers a wider range of Indian languages than any other multilingual speech model. We fine-tune it for downstream automatic speech recognition (ASR) in 9 languages and obtain state-of-the-art results on 3 public benchmarks: MUCS, MSR and OpenSLR.

As part of IndicWav2Vec, we created the largest publicly available corpora for 40 languages from 4 different language families, and trained state-of-the-art ASR models for 9 Indian languages.


Benchmarks

We evaluate our models on 3 publicly available benchmarks: MUCS, MSR and OpenSLR. Our results are below. Language codes that appear twice (gu, ta, te) are evaluated on different benchmarks.

| Model | gu | ta | te | gu | hi | mr | or | ta | te | bn | ne | si |
| ------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| IndicW2V | 20.5 | 22.1 | 22.9 | 26.2 | 16.0 | 19.3 | 25.6 | 27.3 | 29.3 | 16.6 | 11.9 | 24.8 |
| IndicW2V + LM | 11.7 | 13.6 | 11.0 | 17.2 | 14.7 | 13.8 | 17.2 | 25.0 | 20.5 | 13.6 | 13.6 | - |
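To read the table column by column, the minimal sketch below computes how much adding the language model (LM) changes each score. The values are copied from the table in column order; the names `NO_LM`, `WITH_LM` and `lm_deltas` are hypothetical, and the sketch makes no assumption about which metric the table reports.

```python
# Scores copied from the benchmark table, in column order.
# None marks the missing "IndicW2V + LM" entry for si.
LANGS = ["gu", "ta", "te", "gu", "hi", "mr", "or", "ta", "te", "bn", "ne", "si"]
NO_LM = [20.5, 22.1, 22.9, 26.2, 16.0, 19.3, 25.6, 27.3, 29.3, 16.6, 11.9, 24.8]
WITH_LM = [11.7, 13.6, 11.0, 17.2, 14.7, 13.8, 17.2, 25.0, 20.5, 13.6, 13.6, None]


def lm_deltas(no_lm, with_lm):
    """Return, per column, (score without LM) - (score with LM).

    A positive value means the LM lowered the score; None means there is
    no LM result for that column.
    """
    deltas = []
    for base, lm in zip(no_lm, with_lm):
        # Round to one decimal to match the table's precision.
        deltas.append(None if lm is None else round(base - lm, 1))
    return deltas
```

For example, `lm_deltas(NO_LM, WITH_LM)` gives 8.8 for the first gu column and -1.7 for ne, the only column where the LM row is higher than the plain model.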

Updates

21 June 2022

Added more documentation


Resources

Download Models

(* Trained on 40 Indian languages; more details can be found here.)

Hosted API Usage

Input API data format

{
    "config": {
        "language":{
          "sourceLanguage": "#Language Code"
        },
        "transcriptionFormat": {"value":"transcript"},
        "audioFormat": "wav"
    },
    "audio": [{
        "audioContent": "#BASE64 Encoded String"
    }]
}
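The inline variant above carries the audio itself as a base64 string. A minimal Python sketch of building that body from raw WAV bytes follows; the function name `build_content_request` is hypothetical, the language code passed in (e.g. "hi") is an assumption about what the service accepts, and the endpoint URL is not given here, so the sketch only builds the JSON body.

```python
import base64


def build_content_request(wav_bytes: bytes, source_language: str) -> dict:
    """Build the inline-audio request body in the format shown above.

    `source_language` fills the "#Language Code" placeholder; which codes
    the hosted API accepts is an assumption to check against the service.
    """
    return {
        "config": {
            "language": {"sourceLanguage": source_language},
            "transcriptionFormat": {"value": "transcript"},
            "audioFormat": "wav",
        },
        "audio": [
            # Raw WAV bytes, base64-encoded and decoded to an ASCII string
            # so the body can be serialized with json.dumps.
            {"audioContent": base64.b64encode(wav_bytes).decode("ascii")}
        ],
    }
```

For example, `json.dumps(build_content_request(Path("sample.wav").read_bytes(), "hi"))` yields a request string, where `sample.wav` is a hypothetical local file and `Path` comes from `pathlib`.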

OR

{
    "config": {
        "language":{
          "sourceLanguage": "#Language Code"
        },
        "transcriptionFormat": {"value":"transcript"},
        "audioFormat": "wav"
    },
    "audio": [{
        "audioUri": "#HTTP/GS path to file"
    }]
}
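The second variant points at a remote file instead of embedding it. The sketch below builds that body and checks which variant an audio entry uses. Two points are assumptions read from the placeholders, not documented rules: that the URI scheme is `http`, `https` or `gs` ("#HTTP/GS path to file"), and that `audioContent` and `audioUri` are mutually exclusive (the two formats are presented as alternatives). The names `build_uri_request` and `audio_source_kind` are hypothetical.

```python
from urllib.parse import urlparse

# Schemes suggested by the "#HTTP/GS path to file" placeholder (assumption).
ALLOWED_SCHEMES = {"http", "https", "gs"}


def build_uri_request(audio_uri: str, source_language: str) -> dict:
    """Build the remote-audio variant of the request body."""
    scheme = urlparse(audio_uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    return {
        "config": {
            "language": {"sourceLanguage": source_language},
            "transcriptionFormat": {"value": "transcript"},
            "audioFormat": "wav",
        },
        "audio": [{"audioUri": audio_uri}],
    }


def audio_source_kind(entry: dict) -> str:
    """Return "content" or "uri" for one entry of the "audio" list.

    Treats the two fields as mutually exclusive (assumption): an entry
    with both or neither is rejected.
    """
    has_content = "audioContent" in entry
    has_uri = "audioUri" in entry
    if has_content == has_uri:
        raise ValueError("set exactly one of audioContent or audioUri")
    return "content" if has_content else "uri"
```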

Output

{
    "output": [
        {
            "source": "सेकेंड स्टेप इस देसी है स्पेसिफाइड फॉरेस्ट राइट"
        }
    ],
    "status": "SUCCESS"
}
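A client usually needs only the transcript strings from this response. The minimal sketch below parses a response shaped like the example and returns the `source` fields. Only `"SUCCESS"` appears in the example, so treating every other status as a failure is an assumption; `extract_transcripts` is a hypothetical name.

```python
import json


def extract_transcripts(response_text: str) -> list:
    """Return the "source" strings from a response shaped like the example.

    Any status other than "SUCCESS" is treated as a failure (assumption:
    other status values are not documented here).
    """
    data = json.loads(response_text)
    status = data.get("status")
    if status != "SUCCESS":
        raise RuntimeError(f"ASR request failed with status {status!r}")
    # One transcript per entry in "output".
    return [item["source"] for item in data.get("output", [])]
```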

Accessing on ULCA

Our models can be accessed directly on ULCA: go to the ASR section and filter models by IndicWav2Vec.


Quick start

Python Inference

Greedy Decoding

python sfi.py [--audio-file AUDIO_FILE_PATH]
          [--ft-model FT_MODEL_PATH]
          [--w2l-decoder viterbi]
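Conceptually, greedy decoding of a CTC-trained model picks the most likely token at each audio frame, merges consecutive repeats, and drops the blank token. The sketch below illustrates that idea on a toy emission matrix in plain Python. It assumes the fine-tuned model is trained with CTC, as wav2vec 2.0 fine-tuning typically is; the blank index 0 and the toy vocabulary are assumptions, and this is not the code inside sfi.py.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank token in the vocabulary


def greedy_ctc_decode(emissions: Sequence[Sequence[float]],
                      vocab: Sequence[str]) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Index of the highest-scoring token in each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    tokens: List[int] = []
    prev = None
    for idx in best:
        # Keep only the first token of a run of identical tokens, and never
        # emit the blank; a blank between two equal tokens keeps both.
        if idx != prev and idx != BLANK:
            tokens.append(idx)
        prev = idx
    return "".join(vocab[i] for i in tokens)
```

With `vocab = ["<b>", "a", "b"]` and per-frame best tokens a, a, blank, a, b, b, the result is "aab": the blank separates the two runs of "a", while the repeated "b" collapses to one.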

KenLM Decoding

python sfi.py [--audio-file AUDIO_FILE_PATH]   
          [--ft-model FT_MODEL_PATH] 
          [--w2l-decoder kenlm] 
          [--lexicon LEXICON_PATH] 
          [--kenlm-model KENLM_MODEL_PATH]
          [--beam-threshold BEAM_THRESHOLD] 
          [--beam-size-token BEAM_SIZE_TOKEN] 
          [--beam BEAM_SIZE] 
          [--word-score WORD_SCORE] 
          [--lm-weight LM_WE
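The KenLM options trade acoustic evidence against the language model. In wav2letter/flashlight-style lexicon decoders, a hypothesis score roughly combines the acoustic log-probability, the LM log-probability scaled by the LM weight, and a per-word bonus or penalty (the word score). The beam keeps at most a fixed number of hypotheses, and the beam threshold drops those scoring too far below the best. The sketch below shows that scoring and pruning on a static list of finished hypotheses. It is a simplification under those assumptions: a real decoder applies these terms incrementally frame by frame, and `--beam-size-token` additionally limits the tokens considered per frame. The names `Hypothesis`, `combined_score` and `prune` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # summed log-probabilities from the acoustic model
    lm_logprob: float        # log-probability from the n-gram LM (e.g. KenLM)
    num_words: int


def combined_score(h: Hypothesis, lm_weight: float, word_score: float) -> float:
    # Acoustic evidence + weighted LM score + a per-word bonus/penalty.
    return h.acoustic_logprob + lm_weight * h.lm_logprob + word_score * h.num_words


def prune(hyps: List[Hypothesis], lm_weight: float, word_score: float,
          beam: int, beam_threshold: float) -> List[Hypothesis]:
    """Keep at most `beam` hypotheses scoring within `beam_threshold` of the best."""
    if not hyps:
        return []
    scored = sorted(hyps, key=lambda h: combined_score(h, lm_weight, word_score),
                    reverse=True)
    best = combined_score(scored[0], lm_weight, word_score)
    kept = [h for h in scored
            if combined_score(h, lm_weight, word_score) >= best - beam_threshold]
    return kept[:beam]
```

Raising the LM weight can reorder hypotheses: a candidate with a weaker acoustic score but a more fluent word sequence can overtake the acoustically best one, which is the effect tuning `--lm-weight` and `--word-score` controls.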
