Natural Language Toolkit for Indic Languages (iNLTK)
iNLTK aims to provide out-of-the-box support for various NLP tasks that an application developer might need for Indic languages. The paper for the iNLTK library was accepted at EMNLP 2020's NLP-OSS workshop. Here's the link to the paper.
Documentation
Check out detailed docs, along with installation instructions, at https://inltk.readthedocs.io
Supported languages
Native languages
| Language | Code <code-of-language> |
|:--------:|:----:|
| Hindi | hi |
| Punjabi | pa |
| Gujarati | gu |
| Kannada | kn |
| Malayalam | ml |
| Oriya | or |
| Marathi | mr |
| Bengali | bn |
| Tamil | ta |
| Urdu | ur |
| Nepali | ne |
| Sanskrit | sa |
| English | en |
| Telugu | te |
Code Mixed languages
| Language | Script | Code <code-of-language> |
|:--------:|:----:|:----:|
| Hinglish (Hindi+English) | Latin | hi-en |
| Tanglish (Tamil+English) | Latin | ta-en |
| Manglish (Malayalam+English) | Latin | ml-en |
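iNLTK's functions take one of the codes above wherever the docs write `<code-of-language>`. As a minimal sketch, an application could validate user-facing language names against the two tables before calling into the library. The dictionaries and the helpers `language_code` and `is_supported_code` are hypothetical illustrations, not part of iNLTK. They only transcribe the codes listed above.

```python
# Hypothetical helper, NOT part of iNLTK: codes transcribed from the
# "Native languages" and "Code Mixed languages" tables above.
NATIVE_LANGUAGES = {
    "Hindi": "hi", "Punjabi": "pa", "Gujarati": "gu", "Kannada": "kn",
    "Malayalam": "ml", "Oriya": "or", "Marathi": "mr", "Bengali": "bn",
    "Tamil": "ta", "Urdu": "ur", "Nepali": "ne", "Sanskrit": "sa",
    "English": "en", "Telugu": "te",
}
CODE_MIXED_LANGUAGES = {
    "Hinglish": "hi-en", "Tanglish": "ta-en", "Manglish": "ml-en",
}

# Case-insensitive name -> code lookup covering both tables.
_NAME_TO_CODE = {
    name.lower(): code
    for name, code in {**NATIVE_LANGUAGES, **CODE_MIXED_LANGUAGES}.items()
}
_ALL_CODES = set(_NAME_TO_CODE.values())


def language_code(name: str) -> str:
    """Return the code for a language name, e.g. 'Tamil' -> 'ta'."""
    try:
        return _NAME_TO_CODE[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def is_supported_code(code: str) -> bool:
    """True if `code` appears in either table above."""
    return code in _ALL_CODES


if __name__ == "__main__":
    print(language_code("Tamil"))      # ta
    print(language_code("hinglish"))   # hi-en
    print(is_supported_code("or"))     # True
```

Failing early with a `ValueError` on an unknown name keeps an unsupported language from ever reaching the model-loading code.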
Repositories containing models used in iNLTK
| Language | Repository | Dataset used for Language modeling | Perplexity of ULMFiT LM<br>(on validation set) | Perplexity of TransformerXL LM<br>(on validation set) | Dataset used for Classification | Classification:<br>Test set Accuracy (%) | Classification:<br>Test set MCC | Classification:<br>Notebook for Reproducibility | ULMFiT Embeddings visualization | TransformerXL Embeddings visualization |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Hindi | NLP for Hindi | Hindi Wikipedia Articles - 172k<br><br><br>Hindi Wikipedia Articles - 55k | 34.06<br><br><br>35.87 | 26.09<br><br><br>34.78 | BBC News Articles<br><br><br>IIT Patna Movie Reviews<br><br><br>IIT Patna Product Reviews | 78.75<br><br><br>57.74<br><br><br>75.71 | 0.71<br><br><br>0.37<br><br><br>0.59 | Notebook<br><br><br>Notebook<br><br><br>Notebook | Hindi Embeddings projection | Hindi Embeddings projection |
| Bengali | NLP for Bengali | Bengali Wikipedia Articles | 41.2 | 39.3 | Bengali News Articles (Soham Articles) | 90.71 | 0.87 | Notebook | Bengali Embeddings projection | Bengali Embeddings projection |
| Gujarati | NLP for Gujarati | Gujarati Wikipedia Articles | 34.12 | 28.12 | iNLTK Headlines Corpus - Gujarati | 91.05 | 0.86 | Notebook | Gujarati Embeddings projection | Gujarati Embeddings projection |
| Malayalam | NLP for Malayalam |
