Sbnltk

Bangla NLP toolkit. Bangla NER, POStag, Stemmer, Word embedding, sentence embedding, summarization, preprocessor, sentiment analysis, etc.

Install / Use

/learn @Foysal87/Sbnltk

README

For a trouble-free setup, use Google Colab. For the transformer models, install simpletransformers first, or use bn_nlp for the static models. The datasets and training details are available on my GitHub. The sentiment analyzer currently has a known problem; I will fix it soon.

SBNLTK

SUST-Bangla Natural Language Toolkit. A Python module for Bangla NLP tasks.
Demo version: 2.0.2
Requires Python 3.6+. Use a virtual environment to avoid unnecessary issues.

INSTALLATION

PYPI INSTALLATION

pip3 install sbnltk
pip3 install simpletransformers
pip3 install fasttext
pip3 install scikit-learn

MANUAL INSTALLATION FROM GITHUB

Clone this project
Install all the requirements
Run setup.py from the terminal

What will you get here?

Bangla Text Preprocessor
Bangla word dust, punctuation, and stop word removal
Bangla word sorting according to the Bangla or English alphabet
Bangla word normalization
Bangla word stemmer
Bangla sentiment analysis (logistic regression, LinearSVC, multinomial naive Bayes, random forest)
Bangla sentiment analysis with BERT
Bangla sentence POS tagger (static, sklearn)
Bangla sentence POS tagger with BERT (Multilingual-Cased, Multilingual-Uncased)
Bangla sentence NER (static, sklearn)
Bangla sentence NER with BERT (BERT-Cased, Multilingual-Cased/Uncased)
Bangla word embedding (gensim word2vec, GloVe, fastText)
Bangla sentence embedding (contextual, Transformer/BERT)
Bangla document summarization (feature-based, contextual, semantic-based)
Bangla bilingual project (Bangla-to-English Google translator without IP blocking)
Bangla document information Extraction
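
To make the preprocessing items above concrete, here is a minimal, standard-library-only sketch of punctuation, dust, and stop word removal. It is not the sbnltk API: the function names, the tiny stop word list, and the reading of "dust" as "any character outside the Unicode Bengali block" are all assumptions made for illustration. See the code docs for the toolkit's real interface.

```python
import re

# Hypothetical stop word list, for illustration only (not sbnltk's list).
STOP_WORDS = {"এবং", "ও", "কিন্তু", "এই", "যে"}

# The Bangla danda plus some common ASCII punctuation marks.
PUNCTUATION = "।,;:!?\"'()[]{}-"


def remove_punctuation(text):
    """Replace each punctuation mark with a space."""
    return text.translate(str.maketrans({ch: " " for ch in PUNCTUATION}))


def remove_dust(text):
    """Drop everything that is neither whitespace nor in the Bengali block.

    Assumption: "dust" means non-Bangla characters (U+0980 to U+09FF is
    the Unicode Bengali block).
    """
    return re.sub(r"[^\u0980-\u09FF\s]", " ", text)


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Run punctuation, dust, and stop word removal, then return tokens."""
    cleaned = remove_dust(remove_punctuation(text))
    return remove_stop_words(cleaned.split())


sentence = "আমি ভাত খাই, এবং সে abc123 গান শোনে।"
tokens = preprocess(sentence)
print(tokens)
```

Whitespace splitting is used here only to keep the sketch short; the toolkit lists its own word and sentence tokenizers, which would normally replace `split()`.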

SEE THE CODE DOCS FOR USAGE!

TASKS, MODELS, ACCURACY, DATASET AND DOCS

| TASK | MODEL | ACCURACY | DATASET | About | Code DOCS |
|:-------------------------:|:-----------------------------------------------------------------:|:--------------:|:-----------------------:|:-----:|:---------:|
| Preprocessor | Punctuation, stop word, dust removal, word normalization, others | - | - | | docs |
| Word tokenizers | Basic tokenizers, customized tokenizers | - | - | | docs |
| Sentence tokenizers | Basic tokenizers, customized tokenizers, sentence cluster | - | - | | docs |
| Stemmer | StemmerOP | 85.5% | - | | docs |
| Sentiment Analysis | Logistic regression | 88.5% | 20,000+ | | docs |
| | LinearSVC | 82.3% | 20,000+ | | docs |
| | Multinomial naive Bayes | 84.1% | 20,000+ | | docs |
| | Random Forest | 86.9% | 20,000+ | | docs |
| | BERT | 93.2% | 20,000+ | | docs |
| POS tagger | Static method | 55.5% | 1,40,973 words | | docs |
| | SK-LEARN classification | 81.2% | 6,000+ sentences | | docs |
| | BERT-Multilingual-Cased | 69.2% | 6,000+ | | docs |
| | BERT-Multilingual-Uncased | 78.7% | 6,000+ | | docs |
| NER tagger | Static method | 65.3% | 4,08,837 entities | | docs |
| | SK-LEARN classification | 81.2% | 65,000+ | | docs |
| | BERT-Cased | 79.2% | 65,000+ | | docs |
| | BERT-Multilingual-Cased | 75.5% | 65,000+ | | docs |
| | BERT-Multilingual-Uncased | 90.5% | 65,000+ | | docs |
| Word Embedding | Gensim-word2vec-100D, 1,00,00,000+ tokens | - | 2,00,00,000+ sentences | | docs |
| | Glove-word2vec-100D, 2,30,000+ tokens | - | 5,00,000 sentences | | docs |
| | fastText-word2vec-200D, 3,00,000+ | - | 5,00,000 sentences | | docs |
| Sentence Embedding | Contextual sentence embedding | - | - | | docs |
| | Transformer embedding_hd | - | 3,00,000+ human data | | docs |
| | Transformer embedding_gd | - | 3,00,000+ google data | | docs |
| Extractive Summarization | Feature-based | 70.0% F1 score | - | | docs |
| | Transformer sentence sentiment based | 67.0% | - | | docs |
| | Word2vec sentence contextual based | 60.0% | - | | docs |
| Bi-lingual projects | Google translator with large data detector | - | - | | docs |
| Information Extraction | Static word features | - | | | [docs](https://github.com/Fo
