23 skills found
reiver / Go Porterstemmer: A native Go clean-room implementation of the Porter stemming algorithm.
romanbsd / Fast Stemmer: Fast Porter stemmer based on a C version of the algorithm.
jedijulia / Porter Stemmer: Python implementation of Porter's stemming algorithm.
wooorm / Stmr.c: Porter Stemmer algorithm in C.
kampsy / Gwizo: Simple Go implementation of the Porter Stemmer algorithm with powerful features.
winkjs / Wink Porter2 Stemmer: JavaScript implementation of Dr Martin F. Porter's Porter2 stemming algorithm.
Tutanchamon / Pl Stemmer: A very simple Python stemmer for the Polish language, based on Porter's algorithm.
johnjansen / Text: A collection of phonetic algorithms for Crystal, including Porter-Stemmer, Soundex, Metaphone, Double Metaphone & White Similarity.
itfrombit / PorterStemmer: A simple ObjC wrapper around the Porter Stemmer algorithm.
aztek / Porterstemmer: An implementation of the Porter stemming algorithm in Scala.
samgiles / Porter Stemmer: Implementation of the Porter stemming algorithm in Rust.
KrashV / Stemming Ru: Russian stemming procedure based on Porter's algorithm.
maxpatiiuk / Porter Stemming: TypeScript implementation of the Porter Stemmer algorithm.
elixir-search / Stem Ex: Elixir implementation of the Porter stemming algorithm.
msbmsb / Porter Stem.vim: Implementation of the Porter stemming algorithm in Vim script.
SergeiGalkovskii / Porter S Algorithm For Stemming For Russian Language Csharp: Porter's algorithm for stemming the Russian language, in C#.
antonbaumann / German Go Stemmer: An efficient implementation of the German Porter stemming algorithm in Go.
caarmen / Porter Stemmer: Simple Porter stemmer algorithm.
pymander / Ocaml Stemmer: OCaml implementation of the Porter stemming algorithm.
RohithM191 / TSNE On Amazon Fine Food Reviews Dataset: Amazon Fine Food Reviews analysis and modelling using various machine learning models. Performed exploratory data analysis, data cleaning, data visualization and text featurization (BoW, TF-IDF, Word2Vec). Built several ML models such as KNN, Naive Bayes, Logistic Regression, SVM, Random Forest, GBDT and LSTM (RNNs).

Objective: given a text review, determine whether its sentiment is positive or negative.

Data source: https://www.kaggle.com/snap/amazon-fine-food-reviews

About the dataset: the Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.
- Number of reviews: 568,454
- Number of users: 256,059
- Number of products: 74,258
- Timespan: Oct 1999 - Oct 2012
- Number of attributes/columns: 10

Attribute information:
- Id
- ProductId: unique identifier for the product
- UserId: unique identifier for the user
- ProfileName
- HelpfulnessNumerator: number of users who found the review helpful
- HelpfulnessDenominator: number of users who indicated whether they found the review helpful or not
- Score: rating between 1 and 5
- Time: timestamp of the review
- Summary: brief summary of the review
- Text: text of the review

1. Amazon Food Reviews EDA, NLP, text preprocessing and visualization using t-SNE
- Defined the problem statement.
- Performed exploratory data analysis (EDA) on the dataset; plotted word clouds, distplots, histograms, etc.
- Performed data cleaning and preprocessing by removing unnecessary and duplicate rows; for the review text, removed HTML tags, punctuation and stopwords, and stemmed the words using the Porter Stemmer (see the preprocessing sketch below).
- Documented the concepts clearly.
- Plotted t-SNE plots for the different featurizations of the data: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
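A minimal sketch of the preprocessing step described above, assuming NLTK with its stopwords corpus downloaded; the preprocess helper, the regex and the sample review are illustrative choices, not the project's actual code:

```python
import re
import string

from nltk.corpus import stopwords   # requires a one-time nltk.download('stopwords')
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def preprocess(review: str) -> str:
    """Strip HTML tags and punctuation, drop stopwords, Porter-stem the rest."""
    text = re.sub(r'<[^>]+>', ' ', review)                             # remove HTML tags
    text = text.translate(str.maketrans('', '', string.punctuation))   # remove punctuation
    tokens = (word.lower() for word in text.split())
    return ' '.join(stemmer.stem(w) for w in tokens if w not in stop_words)

print(preprocess("<p>This tasted AMAZING -- definitely ordering again!</p>"))
# expected: tast amaz definit order
```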
2. KNN
- Applied K-Nearest Neighbours to the different featurizations of the data: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
- Used both the brute-force and kd-tree implementations of KNN.
- Evaluated the test data on performance metrics such as accuracy; plotted the confusion matrix using seaborn.
- Conclusions: KNN is a very slow algorithm and takes a very long time to train. The best accuracy, 89.38%, was achieved with the average-Word2Vec featurization. The kd-tree and brute-force variants give comparable results. Overall, KNN was not a good fit for this dataset.

3. Naive Bayes
- Applied Naive Bayes using BernoulliNB and MultinomialNB to the BoW (unigram) and TF-IDF featurizations.
- Evaluated the test data on metrics such as accuracy, F1-score, precision and recall; plotted the confusion matrix using seaborn.
- Printed the top 25 most important features for both negative and positive reviews.
- Conclusions: Naive Bayes is a much faster algorithm than KNN. Bernoulli Naive Bayes performed considerably better than Multinomial Naive Bayes. The best F1-score, 0.9342, was achieved with the BoW featurization.

4. Logistic Regression
- Applied Logistic Regression to the different featurizations: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
- Used both grid search and randomized search cross-validation.
- Evaluated the test data on metrics such as accuracy, F1-score, precision and recall; plotted the confusion matrix using seaborn.
- Showed how sparsity increases as lambda increases (C decreases) when an L1 regularizer is used, for each featurization.
- Ran a perturbation test to check whether the features are multicollinear.
- Conclusions: sparsity increases as we decrease C (increase lambda) under L1 regularization. TF-IDF featurization performs best, with an F1-score of 0.967 and an accuracy of 91.39%. The features are multicollinear under the different featurizations. Logistic Regression is a fast algorithm. (A scikit-learn sketch of this TF-IDF + Logistic Regression step appears at the end of this entry.)

5. SVM
- Applied SVM with the RBF (radial basis function) kernel to the different featurizations: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
- Used both grid search and randomized search cross-validation.
- Evaluated the test data on metrics such as accuracy, F1-score, precision and recall; plotted the confusion matrix using seaborn.
- Evaluated SGDClassifier on the best-performing featurization.
- Conclusions: BoW featurization with a linear kernel and grid search gave the best results, with an F1-score of 0.9201. SGDClassifier takes far less time to train.

6. Decision Trees
- Applied decision trees to the different featurizations: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
- Used grid search over 30 random points to find the best max_depth.
- Evaluated the test data on metrics such as accuracy, F1-score, precision and recall; plotted the confusion matrix using seaborn.
- Plotted the feature importances obtained from the decision tree classifier.
- Conclusions: BoW featurization (max_depth=8) gave the best results, with an accuracy of 85.8% and an F1-score of 0.858. Decision trees on the full BoW and TF-IDF feature sets would have taken forever given their huge dimensionality, hence max_depth was capped at 8.

7. Ensembles (RF & GBDT)
- Applied Random Forest and GBDT to the different featurizations: BoW (unigrams), TF-IDF, average Word2Vec and TF-IDF-weighted Word2Vec.
- Used grid search over 30 random points to find the best max_depth, learning rate and n_estimators.
- Evaluated the test data on metrics such as accuracy, F1-score, precision and recall; plotted the confusion matrix using seaborn.
- Plotted a word cloud of the feature importances obtained from the RF and GBDT classifiers.
- Conclusions: TF-IDF featurization with Random Forest (base learners = 10) and random search gave the best results, with an F1-score of 0.857. TF-IDF featurization with GBDT (base learners = 275, depth = 10) gave the best results, with an F1-score of 0.8708.
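Since the write-up singles out TF-IDF features with Logistic Regression as the strongest model (step 4), here is a minimal, self-contained scikit-learn sketch of that pipeline: unigram TF-IDF, L1-regularized Logistic Regression and a grid search over C scored by F1. The toy reviews, labels and parameter grid are placeholder assumptions, not the project's actual data or settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Toy stand-ins for the cleaned review texts and binary sentiment labels
# (commonly derived for this dataset as Score > 3 -> positive, Score < 3 -> negative).
texts = ["great product love it", "terrible taste never again",
         "really good snack", "awful stale and bland"] * 25
labels = [1, 0, 1, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),              # unigram TF-IDF features
    ("clf", LogisticRegression(penalty="l1", solver="liblinear")),
])

# Grid search over the inverse regularization strength C; with L1,
# smaller C (larger lambda) drives more coefficients to exactly zero.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]},
                      scoring="f1", cv=5)
search.fit(X_train, y_train)

print("best C:", search.best_params_["clf__C"])
print("test F1:", search.score(X_test, y_test))
```

The same sparsity-versus-C behaviour the project reports can be inspected here via the fitted model's coefficients (search.best_estimator_.named_steps["clf"].coef_).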