IndicTrans

indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2

<div align="center"> <h1><b><i>IndicTrans</i></b></h1> <a href="http://indicnlp.ai4bharat.org/samanantar">Website</a> | <a href="https://arxiv.org/abs/2104.05596">Paper</a> | <a href="https://youtu.be/QwYPOd1eBtQ?t=383">Video</a> | <a href="https://github.com/AI4Bharat/indicTrans/tree/main/AI4B_Demo">Demo Resources</a> <br><br> </div>

🚩NOTE 🚩IndicTrans2 is now available. It supports 22 Indian languages and has better translation quality compared to IndicTrans1. We recommend using IndicTrans2.

IndicTrans is a Transformer-4x (~434M parameters) multilingual NMT model trained on the Samanantar dataset, which was the largest publicly available collection of parallel corpora for Indic languages at the time of writing (14 April 2021). It is a single-script model: we convert all Indic data to the Devanagari script, which allows better lexical sharing between languages for transfer learning, prevents fragmentation of the subword vocabulary across Indic languages, and allows a smaller subword vocabulary. We currently release two models, Indic to English and English to Indic, and support the following 11 Indic languages:

| <!-- --> | <!-- --> | <!-- --> | <!-- --> |
| ------------- | -------------- | ------------ | ----------- |
| Assamese (as) | Hindi (hi) | Marathi (mr) | Tamil (ta) |
| Bengali (bn) | Kannada (kn) | Odia (or) | Telugu (te) |
| Gujarati (gu) | Malayalam (ml) | Punjabi (pa) | |
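
The single-script idea described above can be illustrated with a minimal sketch. It relies on the fact that the Unicode blocks for the major Indic scripts are 128 code points wide and laid out in parallel, so a character can be moved to Devanagari by shifting its code point. The names `SCRIPT_BASE` and `to_devanagari` are hypothetical and used only for illustration; the repository's actual preprocessing may differ and handle script-specific exceptions that this naive offset mapping ignores.

```python
# Naive offset-based script conversion (illustrative sketch, not the
# repository's preprocessing code).

DEVANAGARI_BASE = 0x0900  # start of the Devanagari Unicode block
BLOCK_SIZE = 0x80         # each Indic script block spans 128 code points

# Start of the Unicode block used by each supported language code.
SCRIPT_BASE = {
    "as": 0x0980,  # Assamese (Bengali script block)
    "bn": 0x0980,  # Bengali
    "gu": 0x0A80,  # Gujarati
    "hi": 0x0900,  # Hindi (already Devanagari)
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
    "mr": 0x0900,  # Marathi (already Devanagari)
    "or": 0x0B00,  # Odia
    "pa": 0x0A00,  # Punjabi (Gurmukhi)
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
}


def to_devanagari(text: str, lang: str) -> str:
    """Shift characters from the script block of `lang` into Devanagari.

    Characters outside the source block (spaces, digits, Latin letters,
    punctuation) are copied unchanged.
    """
    base = SCRIPT_BASE[lang]
    converted = []
    for ch in text:
        offset = ord(ch) - base
        if 0 <= offset < BLOCK_SIZE:
            converted.append(chr(DEVANAGARI_BASE + offset))
        else:
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    # Bengali and Tamil input end up in the same script after conversion.
    print(to_devanagari("বাংলা", "bn"))
    print(to_devanagari("தமிழ்", "ta"))
```

Because every language ends up in one script, a single subword vocabulary can cover all of them. A plain offset shift can land on code points that are unassigned in Devanagari for letters with no counterpart, which is one reason a production pipeline would need more care than this sketch.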

Benchmarks

We evaluate the IndicTrans model on the WAT2021, WAT2020, WMT (2014, 2019, 2020), UFAL, PMI (a subset of the PMIndia dataset that we created for Assamese) and FLORES benchmarks. It outperforms all publicly available open-source models. It also outperforms commercial systems such as Google and Bing Translate on most datasets and performs competitively on FLORES. Here are the results that we obtain:

<!-- <style type="text/css"> .tg {border-collapse:collapse;border-spacing:0;} .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-9wq8{border-color:inherit;text-align:center;vertical-align:middle} </style> --> <table class="tg"> <thead> <tr> <th class="tg-9wq8"></th> <th class="tg-9wq8" colspan="10">WAT2021</th> <th class="tg-9wq8" colspan="7">WAT2020</th> <th class="tg-9wq8" colspan="3">WMT</th> <th class="tg-9wq8">UFAL</th> <th class="tg-9wq8">PMI</th> <th class="tg-9wq8" colspan="11">FLORES-101</th> </tr> </thead> <tbody> <tr> <td class="tg-9wq8"></td> <td class="tg-9wq8">bn</td> <td class="tg-9wq8">gu</td> <td class="tg-9wq8">hi</td> <td class="tg-9wq8">kn</td> <td class="tg-9wq8">ml</td> <td class="tg-9wq8">mr</td> <td class="tg-9wq8">or</td> <td class="tg-9wq8">pa</td> <td class="tg-9wq8">ta</td> <td class="tg-9wq8">te</td> <td class="tg-9wq8">bn</td> <td class="tg-9wq8">gu</td> <td class="tg-9wq8">hi</td> <td class="tg-9wq8">ml</td> <td class="tg-9wq8">mr</td> <td class="tg-9wq8">ta</td> <td class="tg-9wq8">te</td> <td class="tg-9wq8">hi</td> <td class="tg-9wq8">gu</td> <td class="tg-9wq8">ta</td> <td class="tg-9wq8">ta</td> <td class="tg-9wq8">as</td> <td class="tg-9wq8">as</td> <td class="tg-9wq8">bn</td> <td class="tg-9wq8">gu</td> <td class="tg-9wq8">hi</td> <td class="tg-9wq8">kn</td> <td class="tg-9wq8">ml</td> <td class="tg-9wq8">mr</td> <td class="tg-9wq8">or</td> <td class="tg-9wq8">pa</td> <td class="tg-9wq8">ta</td> <td class="tg-9wq8">te</td> </tr> <tr> <td class="tg-9wq8">IN-EN</td> <td class="tg-9wq8">29.6</td> <td class="tg-9wq8">40.3</td> <td class="tg-9wq8">43.9</td> <td class="tg-9wq8">36.4</td> <td class="tg-9wq8">34.6</td> <td 
class="tg-9wq8">33.5</td> <td class="tg-9wq8">34.4</td> <td class="tg-9wq8">43.2</td> <td class="tg-9wq8">33.2</td> <td class="tg-9wq8">36.2</td> <td class="tg-9wq8">20.0</td> <td class="tg-9wq8">24.1</td> <td class="tg-9wq8">23.6</td> <td class="tg-9wq8">20.4</td> <td class="tg-9wq8">20.4</td> <td class="tg-9wq8">18.3</td> <td class="tg-9wq8">18.5</td> <td class="tg-9wq8">29.7</td> <td class="tg-9wq8">25.1</td> <td class="tg-9wq8">24.1</td> <td class="tg-9wq8">30.2</td> <td class="tg-9wq8">29.9</td> <td class="tg-9wq8">23.3</td> <td class="tg-9wq8">32.2</td> <td class="tg-9wq8">34.3</td> <td class="tg-9wq8">37.9</td> <td class="tg-9wq8">28.8</td> <td class="tg-9wq8">31.7</td> <td class="tg-9wq8">30.8</td> <td class="tg-9wq8">30.1</td> <td class="tg-9wq8">35.8</td> <td class="tg-9wq8">28.6</td> <td class="tg-9wq8">33.5</td> </tr> <tr> <td class="tg-9wq8">EN-IN</td> <td class="tg-9wq8">15.3</td> <td class="tg-9wq8">25.6</td> <td class="tg-9wq8">38.6</td> <td class="tg-9wq8">19.1</td> <td class="tg-9wq8">14.7</td> <td class="tg-9wq8">20.1</td> <td class="tg-9wq8">18.9</td> <td class="tg-9wq8">33.1</td> <td class="tg-9wq8">13.5</td> <td class="tg-9wq8">14.1</td> <td class="tg-9wq8">11.4</td> <td class="tg-9wq8">15.3</td> <td class="tg-9wq8">20.0</td> <td class="tg-9wq8">7.2</td> <td class="tg-9wq8">12.7</td> <td class="tg-9wq8">6.2</td> <td class="tg-9wq8">7.6</td> <td class="tg-9wq8">25.5</td> <td class="tg-9wq8">17.2</td> <td class="tg-9wq8">9.9</td> <td class="tg-9wq8">10.9</td> <td class="tg-9wq8">11.6</td> <td class="tg-9wq8">6.9</td> <td class="tg-9wq8">20.3</td> <td class="tg-9wq8">22.6</td> <td class="tg-9wq8">34.5</td> <td class="tg-9wq8">18.9</td> <td class="tg-9wq8">16.3</td> <td class="tg-9wq8">16.1</td> <td class="tg-9wq8">13.9</td> <td class="tg-9wq8">26.9</td> <td class="tg-9wq8">16.3</td> <td class="tg-9wq8">22.0</td> </tr> </tbody> </table>

Updates

<details><summary>Click to expand </summary>

21 June 2022

- Add more documentation on hosted API usage

18 December 2021

- Tutorials updated with latest model links

26 November 2021

- v0.3 models are now available for download

27 June 2021

- Updated links for indic to indic model
- Add more comments to training scripts
- Add link to [Samanantar Video](https://youtu.be/QwYPOd1eBtQ?t=383)
- Add folder structure in readme
- Add python wrapper for model inference

09 June 2021

- Updated links for models
- Added Indic to Indic model

09 May 2021

- Added fix for finetuning on datasets where some language pairs are not present. Previously the script assumed that the finetuning dataset had data for all 11 Indic language pairs
- Added colab notebook for finetuning instructions
</details>

Resources

Try out model online (Huggingface spaces)

Download model

Indic to English: v0.3

English to Indic: v0.3

Indic to Indic: v0.3

Mirror links for the IndicTrans models

STS Benchmark

Download the human annotations for STS benchmark here

Using hosted APIs

Try out our models at IndicTrans Demos
