Sbnltk
Bangla NLP toolkit. Bangla NER, POStag, Stemmer, Word embedding, sentence embedding, summarization, preprocessor, sentiment analysis, etc.
Running in Google Colab is recommended to avoid setup problems. For the transformer models, install simpletransformers first, or use bn_nlp for the static models. The datasets and training details are available on my GitHub. The sentiment analyzer currently has a known problem, which I will fix soon.
SBNLTK
SUST-Bangla Natural Language toolkit. A python module for Bangla NLP tasks.
Demo Version : 2.0.2
NEEDS Python 3.6+! Use a virtual environment to avoid unnecessary issues!
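The version requirement above can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is made up for illustration and is not part of sbnltk.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper for this README, not an sbnltk function.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is new enough for sbnltk (3.6+).")
    else:
        print("Python 3.6+ is required; found", sys.version.split()[0])
```

Running this inside the virtual environment you plan to use confirms that the environment's interpreter, not just the system one, satisfies the requirement.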
INSTALLATION
PYPI INSTALLATION
pip3 install sbnltk
pip3 install simpletransformers
pip3 install fasttext
pip3 install scikit-learn
MANUAL INSTALLATION FROM GITHUB
- Clone this project
- Install all the requirements
- Run setup.py from the terminal
What will you get here?
- Bangla Text Preprocessor
- Bangla word dust, punctuation, and stop word removal
- Bangla word sorting according to Bangla or English alphabet
- Bangla word normalization
- Bangla word stemmer
- Bangla Sentiment analysis (logisticRegression, LinearSVC, Multinomial_naive_bayes, Random_Forest)
- Bangla Sentiment analysis with Bert
- Bangla sentence pos tagger (static, sklearn)
- Bangla sentence pos tagger with BERT(Multilingual-cased,Multilingual uncased)
- Bangla sentence NER(Static,sklearn)
- Bangla sentence NER with BERT(Bert-Cased, Multilingual Cased/Uncased)
- Bangla word word2vec(gensim,glove,fasttext)
- Bangla sentence embedding (Contextual, Transformer/BERT)
- Bangla Document Summarization (Feature based, Contextual, Semantic based)
- Bangla Bi-lingual project(Bangla to english google translator without blocking IP)
- Bangla document information Extraction
SEE THE CODE DOCS FOR USAGE!
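To give a rough idea of what the punctuation and stop word removal items in the list above mean in practice, here is a minimal plain-Python sketch. It does not use the sbnltk API (see the code docs for that); the function names, the punctuation set, and the tiny stop word list are all assumptions chosen for illustration only.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration;
# a real toolkit would ship a much larger one.
BANGLA_STOP_WORDS = {"এবং", "ও", "কিন্তু", "এই", "যে"}

# The Bangla danda (।) plus some common ASCII punctuation marks.
PUNCTUATION_PATTERN = re.compile(r"[।,;:!?\"'()\[\]{}\-]")


def remove_punctuation(text):
    """Replace each punctuation mark with a space."""
    return PUNCTUATION_PATTERN.sub(" ", text)


def remove_stop_words(text, stop_words=BANGLA_STOP_WORDS):
    """Drop whitespace-separated tokens that appear in `stop_words`."""
    return " ".join(word for word in text.split() if word not in stop_words)


def clean(text):
    """Remove punctuation first, then stop words, and normalize spacing."""
    return remove_stop_words(remove_punctuation(text))


if __name__ == "__main__":
    print(clean("আমি ভাত খাই এবং পানি পান করি।"))
```

Punctuation is removed before stop words so that a stop word followed directly by a mark such as the danda is still matched as a whole token.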
TASKS, MODELS, ACCURACY, DATASET AND DOCS
| TASK | MODEL | ACCURACY | DATASET | About | Code DOCS |
|:---:|:---:|:---:|:---:|:---:|:---:|
| Preprocessor | Punctuation, stop word, and dust removal; word normalization; others | - | - | | docs |
| Word tokenizers | Basic tokenizers; customized tokenizers | - | - | | docs |
| Sentence tokenizers | Basic tokenizers; customized tokenizers; sentence cluster | - | - | | docs |
| Stemmer | StemmerOP | 85.5% | - | | docs |
| Sentiment Analysis | logisticRegression | 88.5% | 20,000+ | | docs |
| | LinearSVC | 82.3% | 20,000+ | | docs |
| | Multinomial_naive_bayes | 84.1% | 20,000+ | | docs |
| | Random Forest | 86.9% | 20,000+ | | docs |
| | BERT | 93.2% | 20,000+ | | docs |
| POS tagger | Static method | 55.5% | 1,40,973 words | | docs |
| | SK-LEARN classification | 81.2% | 6,000+ sentences | | docs |
| | BERT-Multilingual-Cased | 69.2% | 6,000+ | | docs |
| | BERT-Multilingual-Uncased | 78.7% | 6,000+ | | docs |
| NER tagger | Static method | 65.3% | 4,08,837 Entity | | docs |
| | SK-LEARN classification | 81.2% | 65,000+ | | docs |
| | BERT-Cased | 79.2% | 65,000+ | | docs |
| | BERT-Multilingual-Cased | 75.5% | 65,000+ | | docs |
| | BERT-Multilingual-Uncased | 90.5% | 65,000+ | | docs |
| Word Embedding | Gensim-word2vec-100D, 1,00,00,000+ tokens | - | 2,00,00,000+ sentences | | docs |
| | Glove-word2vec-100D, 2,30,000+ tokens | - | 5,00,000 sentences | | docs |
| | fasttext-word2vec-200D, 3,00,000+ | - | 5,00,000 sentences | | docs |
| Sentence Embedding | Contextual sentence embedding | - | - | | docs |
| | Transformer embedding_hd | - | 3,00,000+ human data | | docs |
| | Transformer embedding_gd | - | 3,00,000+ google data | | docs |
| Extractive Summarization | Feature-based | 70.0% f1 score | - | | docs |
| | Transformer sentence sentiment based | 67.0% | - | | docs |
| | Word2vec sentences, contextual based | 60.0% | - | | docs |
| Bi-lingual projects | Google translator with large data detector | - | - | | docs |
| Information Extraction | Static word features | - | | | [docs](https://github.com/Fo