
Sbnltk

Bangla NLP toolkit. Bangla NER, POStag, Stemmer, Word embedding, sentence embedding, summarization, preprocessor, sentiment analysis, etc.

Install / Use

/learn @Foysal87/Sbnltk
About this skill

Quality Score

0/100

Supported Platforms

Universal

README


Running the toolkit in Google Colab is recommended to avoid setup problems. To use the transformer models, install simpletransformers first; otherwise, use bn_nlp for the static models. The datasets and training details are available on my GitHub. The sentiment analyzer currently has a known problem, which I will fix soon.

SBNLTK

SUST-Bangla Natural Language toolkit. A python module for Bangla NLP tasks.
Demo version: 2.0.2
Requires Python 3.6 or later. Use a virtual environment to avoid unnecessary issues.
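
Both requirements can be checked from inside the interpreter. The following is a minimal sketch using only the standard library; the function names `python_is_supported` and `in_virtual_environment` are illustrative and are not part of sbnltk.

```python
import sys

MIN_VERSION = (3, 6)  # minimum Python version required by the toolkit


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version is at least Python 3.6."""
    return tuple(version_info[:2]) >= MIN_VERSION


def in_virtual_environment():
    """Return True when running inside a venv-style virtual environment.

    Inside a venv, sys.prefix points at the environment, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python 3.6+:", python_is_supported())
print("Virtual environment:", in_virtual_environment())
```

This check covers venv-based environments only; a conda environment, for example, would not be detected by it.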

INSTALLATION

PYPI INSTALLATION

pip3 install sbnltk
pip3 install simpletransformers
pip3 install fasttext
pip3 install scikit-learn
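
After installation, a quick way to confirm that the four packages are visible to the interpreter is to look them up without importing them. This is a minimal sketch, assuming the import names match the package names (except scikit-learn, which is imported as `sklearn`); `check_installed` is a hypothetical helper, not part of sbnltk.

```python
import importlib.util

# Package name on PyPI -> top-level import name.
# Assumption: import names equal package names, except scikit-learn.
REQUIRED_MODULES = {
    "sbnltk": "sbnltk",
    "simpletransformers": "simpletransformers",
    "fasttext": "fasttext",
    "scikit-learn": "sklearn",
}


def check_installed(modules=REQUIRED_MODULES):
    """Map each package name to True if its module can be located."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


for package, found in check_installed().items():
    print(f"{package}: {'installed' if found else 'missing'}")
```

Because `find_spec` only locates a top-level module, this does not load the heavy transformer dependencies just to report their status.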

MANUAL INSTALLATION FROM GITHUB

  • Clone this project
  • Install all the requirements
  • Run setup.py from the terminal

What will you get here?

  • Bangla Text Preprocessor
  • Bangla word dust, punctuation, and stop word removal
  • Bangla word sorting according to the Bangla or English alphabet
  • Bangla word normalization
  • Bangla word stemmer
  • Bangla sentiment analysis (Logistic Regression, LinearSVC, Multinomial Naive Bayes, Random Forest)
  • Bangla sentiment analysis with BERT
  • Bangla sentence POS tagger (static, sklearn)
  • Bangla sentence POS tagger with BERT (Multilingual Cased, Multilingual Uncased)
  • Bangla sentence NER (static, sklearn)
  • Bangla sentence NER with BERT (BERT-Cased, Multilingual Cased/Uncased)
  • Bangla word2vec word embeddings (gensim, GloVe, fastText)
  • Bangla sentence embedding (contextual, Transformer/BERT)
  • Bangla document summarization (feature-based, contextual, semantic-based)
  • Bangla bilingual project (Bangla-to-English Google translator without IP blocking)
  • Bangla document information Extraction
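
To give a feel for what the preprocessing items in this list involve, here is a minimal plain-Python sketch of punctuation and stop word removal for Bangla text. It does not use sbnltk's own API; the function names and the three-word stop list are assumptions made for illustration, and the toolkit's actual lists and behavior may differ.

```python
import string

# Assumption: a tiny illustrative stop word list, not the one shipped with sbnltk.
STOP_WORDS = {"এবং", "ও", "এই"}

# ASCII punctuation plus the Bangla danda and double danda.
PUNCTUATION = set(string.punctuation) | {"।", "॥"}


def remove_punctuation(text):
    """Replace every punctuation character with a space so words do not fuse."""
    return "".join(" " if ch in PUNCTUATION else ch for ch in text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Strip punctuation, split on whitespace, then remove stop words."""
    return remove_stop_words(remove_punctuation(text).split())


print(preprocess("আমি ভাত এবং মাছ খাই।"))
```

Punctuation is replaced by spaces rather than deleted, so that two words separated only by a comma stay separate tokens.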

SEE THE CODE DOCS FOR USAGE!

TASKS, MODELS, ACCURACY, DATASET AND DOCS

| TASK | MODEL | ACCURACY | DATASET | About | Code DOCS |
|:---:|:---:|:---:|:---:|:---:|:---:|
| Preprocessor | Punctuation, stop word, and dust removal; word normalization; others | - | - | | docs |
| Word tokenizers | Basic tokenizers; customized tokenizers | - | - | | docs |
| Sentence tokenizers | Basic tokenizers; customized tokenizers; sentence cluster | - | - | | docs |
| Stemmer | StemmerOP | 85.5% | - | | docs |
| Sentiment Analysis | Logistic Regression | 88.5% | 20,000+ | | docs |
| | LinearSVC | 82.3% | 20,000+ | | docs |
| | Multinomial Naive Bayes | 84.1% | 20,000+ | | docs |
| | Random Forest | 86.9% | 20,000+ | | docs |
| | BERT | 93.2% | 20,000+ | | docs |
| POS tagger | Static method | 55.5% | 1,40,973 words | | docs |
| | SK-LEARN classification | 81.2% | 6,000+ sentences | | docs |
| | BERT-Multilingual-Cased | 69.2% | 6,000+ | | docs |
| | BERT-Multilingual-Uncased | 78.7% | 6,000+ | | docs |
| NER tagger | Static method | 65.3% | 4,08,837 entities | | docs |
| | SK-LEARN classification | 81.2% | 65,000+ | | docs |
| | BERT-Cased | 79.2% | 65,000+ | | docs |
| | BERT-Multilingual-Cased | 75.5% | 65,000+ | | docs |
| | BERT-Multilingual-Uncased | 90.5% | 65,000+ | | docs |
| Word Embedding | Gensim-word2vec-100D (1,00,00,000+ tokens) | - | 2,00,00,000+ sentences | | docs |
| | Glove-word2vec-100D (2,30,000+ tokens) | - | 5,00,000 sentences | | docs |
| | fastText-word2vec-200D (3,00,000+) | - | 5,00,000 sentences | | docs |
| Sentence Embedding | Contextual sentence embedding | - | - | | docs |
| | Transformer embedding_hd | - | 3,00,000+ human data | | docs |
| | Transformer embedding_gd | - | 3,00,000+ google data | | docs |
| Extractive Summarization | Feature-based | 70.0% F1 score | - | | docs |
| | Transformer sentence sentiment based | 67.0% | - | | docs |
| | Word2vec-sentences contextual based | 60.0% | - | | docs |
| Bi-lingual projects | Google translator with large data detector | - | - | | docs |
| Information Extraction | Static word features | - | | | [docs](https://github.com/Fo

GitHub Stars: 27
Category: Development
Updated: 3mo ago
Forks: 12

Languages

Python

Security Score

92/100

Audited on Dec 7, 2025

No findings