21 skills found
vngrs-ai / Vnlp: State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
liulalemx / Felig Toolkit: A toolset for Amharic language pre-processing. Includes an Amharic stemmer, transliterator, stopword remover, lexical analyzer, corpus indexer, and term weighter.
fergiemcdowall / Term Vector: A Node.js module that creates a term vector from a mixed text input. Supports stopword removal and customisable separators.
ABHISHEKVALSAN / Malayalam Newspaper Article Dataset: The project scrapes articles from a Malayalam newspaper website to create a corpus. A set of queries is created and the corresponding ground-truth answers are retrieved. The result can serve as a dataset for evaluating future tools such as Malayalam stemmers, stopword removers, and lemmatizers.
juanantoniodelgado / StopWords: PHP StopWords removal library with support for multiple languages.
Abinaya-Krishnan / BM25 Model Python Implementation: BM25 information retrieval model implemented in Python.
sergio11 / Spam Email Classifier Lstm: This project uses a bidirectional LSTM model 📧🤖 to classify emails as spam or legitimate, utilizing NLP techniques like tokenization, padding, and stopword removal. It aims to create an effective email classifier 💻📊 while addressing overfitting with strategies like early stopping 🚫.
bryanchw / Traditional Chinese Stopwords And Punctuations Library: A Python library for removing Traditional Chinese stopwords and punctuation.
rhnfzl / SqueakyCleanText: Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection, stopword removal. Built for statistical ML and language models.
ahirtonlopes / Text Mining: Basic text mining and NLP operations such as tokenization, Portuguese POS tagging, and stopword removal, among others.
afadel151 / Document Indexer: An open-source document indexing and retrieval system written from scratch in Java. It implements core Information Retrieval (IR) techniques including tokenization, stopword removal, stemming, TF-IDF weighting, and BM25 ranking.
prigarg / Naive Bayes Algorithm From Scratch For Text Classification: A Naïve Bayes algorithm implemented from scratch to classify emails as spam or not spam.
machinelearningprodigy / Sentiment Analysis: The Twitter Sentiment Analysis app predicts whether a tweet has a Positive 😊 or Negative 😞 sentiment using Logistic Regression and Naive Bayes models. It preprocesses text with stemming and stopword removal for better accuracy and provides color-coded visual feedback for easy interpretation.
Salma0-8 / Sentiment Insights Analyzing ChatGPT User Review: I employed NLP techniques to evaluate user feedback on ChatGPT, using Python libraries like VADER for sentiment analysis to categorize reviews into positive, neutral, and negative sentiments. I implemented data preprocessing techniques such as tokenization and stopword removal, and visualized the results with Plotly to yield actionable insights.
akshayaram95 / Near Real Time Road Traffic Event Detection Using Twitter And Spark: Gather tweets using the Twitter Search API, pre-process them, and extract important features to build a model using Spark MLlib. Stream tweets using the Twitter Streaming API and push the data into a Kafka topic using a Kafka producer after applying partial filters. Read from the Kafka topic using a Kafka consumer. Perform tokenization, stopword removal, etc. to pre-process the data. Extract machine-readable features using a bag-of-words approach and predict instances with the model. Tweets are indexed to Elasticsearch after classification. A traffic heat map is constructed by reading the coordinate data from Elasticsearch.
astuanax / Stopwords: Stopwords removal
atahanuz / Turkish Text Preprocessing: A web application for Turkish text preprocessing including tokenization, stemming, normalization, and stopword removal.
sarwaralamsb / Text To Keyword Extraction: A Python web app that extracts keywords from text using TF-IDF and NLP, with adjustable keyword count and stopword removal.
Safae26 / Bag Of Words: A complete Bag of Words pipeline built with Python, NLTK, and spaCy. It demonstrates text preprocessing (tokenization, lowercasing, stopword removal, lemmatization) and converts text into numerical vectors using word frequency counts. Perfect for understanding fundamental NLP vectorization techniques.
Manavarya09 / Finance Tracker: Built a machine learning model to classify news articles as real or fake using NLP techniques like tokenization, stopword removal, and TF-IDF. Trained Naïve Bayes and Logistic Regression models, achieving X% accuracy. Analyzed linguistic patterns to differentiate fake and real news.