
Inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for the NLP tasks that an application developer might need.


Natural Language Toolkit for Indic Languages (iNLTK)


iNLTK aims to provide out-of-the-box support for the NLP tasks that an application developer might need for Indic languages. The paper describing the iNLTK library was accepted at the NLP-OSS workshop at EMNLP 2020. Here's the link to the paper.

Documentation

Check out the detailed docs, along with installation instructions, at https://inltk.readthedocs.io

Supported languages

Native languages

| Language | Code (`<code-of-language>`) |
|:--------:|:----:|
| Hindi | hi |
| Punjabi | pa |
| Gujarati | gu |
| Kannada | kn |
| Malayalam | ml |
| Oriya | or |
| Marathi | mr |
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
| Nepali | ne |
| Sanskrit | sa |
| English | en |
| Telugu | te |

Code Mixed languages

| Language | Script | Code (`<code-of-language>`) |
|:--------:|:----:|:----:|
| Hinglish (Hindi+English) | Latin | hi-en |
| Tanglish (Tamil+English) | Latin | ta-en |
| Manglish (Malayalam+English) | Latin | ml-en |
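The codes listed for native and code-mixed languages are the values to substitute wherever the iNLTK docs write `<code-of-language>`. As a minimal sketch, an application could validate a user-supplied language name or code before handing it to the library. The dictionaries and the `resolve_language_code` helper below are hypothetical names used for illustration and are not part of iNLTK.

```python
# Hypothetical helper, not part of iNLTK: maps the language names listed
# above to their codes and rejects anything that is not in those tables.
NATIVE_LANGUAGES = {
    "hindi": "hi", "punjabi": "pa", "gujarati": "gu", "kannada": "kn",
    "malayalam": "ml", "oriya": "or", "marathi": "mr", "bengali": "bn",
    "tamil": "ta", "urdu": "ur", "nepali": "ne", "sanskrit": "sa",
    "english": "en", "telugu": "te",
}
CODE_MIXED_LANGUAGES = {
    "hinglish": "hi-en", "tanglish": "ta-en", "manglish": "ml-en",
}
ALL_LANGUAGES = {**NATIVE_LANGUAGES, **CODE_MIXED_LANGUAGES}
SUPPORTED_CODES = frozenset(ALL_LANGUAGES.values())


def resolve_language_code(name_or_code: str) -> str:
    """Return the language code for a name or code, or raise ValueError."""
    key = name_or_code.strip().lower()
    if key in SUPPORTED_CODES:  # already a valid code such as "hi" or "ta-en"
        return key
    if key in ALL_LANGUAGES:    # a language name such as "Hindi"
        return ALL_LANGUAGES[key]
    raise ValueError(f"Unsupported language: {name_or_code!r}")
```

The returned string can then be passed wherever the documentation expects `<code-of-language>`; the actual function signatures are described at https://inltk.readthedocs.io.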

Repositories containing models used in iNLTK

| Language | Repository | Dataset used for Language modeling | Perplexity of ULMFiT LM<br>(on validation set) | Perplexity of TransformerXL LM<br>(on validation set) | Dataset used for Classification | Classification:<br>Test set Accuracy | Classification:<br>Test set MCC | Classification: Notebook<br>for Reproducibility | ULMFiT Embeddings visualization | TransformerXL Embeddings visualization |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Hindi | NLP for Hindi | Hindi Wikipedia Articles - 172k<br><br><br>Hindi Wikipedia Articles - 55k | 34.06<br><br><br>35.87 | 26.09<br><br><br>34.78 | BBC News Articles<br><br><br>IIT Patna Movie Reviews<br><br><br>IIT Patna Product Reviews | 78.75<br><br><br>57.74<br><br><br>75.71 | 0.71<br><br><br>0.37<br><br><br>0.59 | Notebook<br><br><br>Notebook<br><br><br>Notebook | Hindi Embeddings projection | Hindi Embeddings projection |
| Bengali | NLP for Bengali | Bengali Wikipedia Articles | 41.2 | 39.3 | Bengali News Articles (Soham Articles) | 90.71 | 0.87 | Notebook | Bengali Embeddings projection | Bengali Embeddings projection |
| Gujarati | NLP for Gujarati | Gujarati Wikipedia Articles | 34.12 | 28.12 | iNLTK Headlines Corpus - Gujarati | 91.05 | 0.86 | Notebook | Gujarati Embeddings projection | Gujarati Embeddings projection |
| Malayalam | NLP for Malayalam |
