Results for "nlp-stemming"

29 skills found

har07 / PySastrawi

350

Indonesian stemmer. Python port of PHP Sastrawi project.

universal

nlp-stemming, sastrawi-python

Updated 1mo ago

janlukasschroeder / Nlp Cheat Sheet Python

259

NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition

universal

cheat-sheet, dependency-parsing, introduction, +14

Updated 10d ago

winkjs / Wink Nlp Utils

134

NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.

universal

bag-of-words, natural-language-processing, ngrams, +6

Updated 3d ago

CurrySoftware / Rust Stemmers

133

A Rust implementation of some popular Snowball stemming algorithms

universal

information-retrieval, nlp-stemming, snowball

Updated 21d ago

fortnightlabs / Snowball Js

102

JavaScript implementation of the popular Snowball word-stemming NLP algorithm

universal

Updated 5mo ago

abhishek305 / PyBot A ChatBot For Answering Python Queries Using NLP

PyBot aims to make learning the Python programming language more interactive. The chatbot tries to answer almost any Python-related issue or query the user asks about. We use NLP to improve the chatbot's efficiency, and we plan to add a voice feature for more interactivity. With NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation. NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics using Python,” and “an amazing library to play with natural language.” The main issue with text data is that it is all in text format (strings), whereas machine learning algorithms need some sort of numerical feature vector to perform their task. So before starting any NLP project, we need to pre-process the text. Pre-processing covers the following steps. Case normalization: converting the entire text to uppercase or lowercase, so that the algorithm does not treat the same word in different cases as different words. Tokenization: converting normal text strings into a list of tokens, i.e. the words we actually want; a sentence tokenizer finds the list of sentences and a word tokenizer finds the list of words in a string. Noise removal: removing everything that isn’t a standard number or letter. Stop-word removal: some extremely common words that appear to be of little value in helping select documents matching a user need are excluded from the vocabulary entirely; these are called stop words. Stemming: reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
For example, if we were to stem the words “Stems”, “Stemming”, “Stemmed”, and “Stemtization”, the result would be the single word “stem”. A slight variant of stemming is lemmatization. The major difference is that stemming can often create non-existent words, whereas lemmas are actual words. So the root stem, meaning the word you end up with, is not necessarily something you can look up in a dictionary, but a lemma is. For example, “run” is the base form of words like “running” and “ran”, and the words “better” and “good” share the same lemma, so they are treated as the same.
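
The pre-processing steps described in this entry can be sketched in a few lines of plain Python. This is only an illustrative sketch, not PyBot's actual code: the stop-word list, the suffix list, the lemma table, and all function names are hypothetical, and the suffix-stripping stemmer is a toy stand-in rather than the Porter or Snowball algorithm that a library such as NLTK provides.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "were"}

# Suffixes removed by the toy stemmer. This is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")

# Hypothetical lemma table: lemmatization maps a word to a real dictionary form.
LEMMAS = {"running": "run", "ran": "run", "better": "good"}


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits (drops noise)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little value for matching documents."""
    return [t for t in tokens if t not in STOP_WORDS]


def toy_stem(token):
    """Strip one known suffix, then collapse a doubled final letter."""
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and len(stem) >= 3:
            if stem[-1] == stem[-2]:  # "stemm" -> "stem"
                stem = stem[:-1]
            return stem
    return token


def lemmatize(token):
    """Look the token up in the lemma table; unknown words are returned unchanged."""
    return LEMMAS.get(token, token)


def preprocess(text):
    """Lowercase, tokenize, remove stop words, then stem each remaining token."""
    return [toy_stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("The Stems were Stemming and Stemmed!"))  # ['stem', 'stem', 'stem']
    print(toy_stem("passing"), lemmatize("ran"))  # pas run
```

The last line shows the contrast drawn above: the toy stemmer turns “passing” into the non-word “pas”, while the lemma lookup maps “ran” to the real word “run”.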

mattmurray / Topic Modelling Financial News

Topic modelling on financial news with Natural Language Processing

universal

dbscan, k-means, latent-dirichlet-allocation, +12

Updated 1mo ago

delvinso / Covid19 Unique Tweets

An ongoing dataset of hashtags, n-gram counts, and other miscellaneous NLP data for COVID-19 analysis, stemming from over 100 000 000 tweets collected since mid-January 2020.

universal

Updated 2mo ago

Maximax67 / Words CEFR Dataset

A dataset mapping English words to CEFR levels based on the CEFR-J dataset, word lemmas, stems, parts of speech (POS), and frequency data from the N-Gram Google dataset. Ideal for NLP tasks, language proficiency assessment, and linguistic research.

universal

Updated 11d ago

turian / Pytextpreprocess

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)

universal

Updated 3y ago

Foysal87 / Sbnltk

Bangla NLP toolkit. Bangla NER, POStag, Stemmer, Word embedding, sentence embedding, summarization, preprocessor, sentiment analysis, etc.

universal

bangla-ner, bangla-postag, bangla-stemmer, +3

Updated 3mo ago

kampsy / Gwizo

Simple Go implementation of the Porter Stemmer algorithm with powerful features.

universal

consonants, nlp, nlp-stemming, +3

Updated 1y ago

alifadwitiyap / NDETCStemmer

A library implementing a nondeterministic, context-based stemming method to resolve morphologically ambiguous words (words with more than one meaning) when stemming Indonesian text.

universal

indonesian-language, nlp-library, nlp-stemming

Updated 5mo ago

FantacherJOY / Arabic Text Classification

Arabic text documents classified using SVM, k-NN, and Naive Bayes classifiers.

universal

arabic-nlp, arabic-text-classification, document-classification, +6

Updated 1y ago

YazidIflis / KabyleNLP

NLP tools for the Kabyle language: Lemmatization, Stemming, Tokenization, Text 2 Speech, SpellCheck

universal

Updated 1mo ago

zslwyuan / KMeans Emails Clustering Visualization NLP

KMeans-Emails-Clustering-Visualization-NLP: KMeans is used to cluster the emails. The words in the email contents are tokenized and stemmed. The project transforms the corpus into a vector space using tf-idf. The clustering result is visualized with multidimensional scaling.

zed

cluster, clustering, kmeans-clustering, +4

Updated 1y ago

fizamusthafa / Mooc Recommender Nlp

The MOOC Recommender System utilizes NLP techniques for course recommendations in Massive Open Online Courses (MOOCs). It processes raw data, leveraging Tokenization, Porter Stemming, Cosine Similarity, etc., to extract tags from course descriptions, summaries, syllabuses, instructors, and subjects.

universal

Updated 2mo ago

SWI-Prolog / Packages Nlp

The SWI-Prolog NLP support library (stemming, etc.)

universal

Updated 1mo ago

mrc03 / Spooky Author Identification

A notebook for the well-known Kaggle competition Spooky Author Identification. The task is to identify authors from their texts. I first cleaned and pre-processed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also created some meta features (hand-crafted features) based on each author's writing pattern. I then used the traditional bag-of-words approach with the TF-IDF vectorizer and the count vectorizer, and applied ML algorithms such as LogisticRegression and Naive Bayes, which are well suited to text data. So far, TF-IDF on the count vectorizer has given the best results; my submission scored a multi-class log loss of 0.46 on the Kaggle private leaderboard, which is quite decent.

universal

Updated 3y ago

imuqtadir / Sentiment Analysis

This project analyzes sentiment about a given movie, classifying user reviews from social networking sites such as Facebook and Twitter into two categories, Positive and Negative. The idea is to help users make better judgements about the product by reading only the positive or only the negative reviews. Sentiment analysis involves extracting and measuring the sentiment or “attitude” of a review using natural language processing steps such as stemming, stop-word removal, and the formation of a similarity matrix using Stanford NLP libraries.

universal

Updated 3y ago