29 skills found
har07 / PySastrawi: Indonesian stemmer. Python port of the PHP Sastrawi project.
janlukasschroeder / Nlp Cheat Sheet Python: NLP cheat sheet covering Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, and named entity recognition.
winkjs / Wink Nlp Utils: NLP functions for amplifying negations, managing elisions, creating n-grams, stems, phonetic codes to tokens, and more.
CurrySoftware / Rust Stemmers: A Rust implementation of some popular Snowball stemming algorithms.
fortnightlabs / Snowball Js: JavaScript implementation of the popular Snowball word-stemming NLP algorithm.
abhishek305 / PyBot A ChatBot For Answering Python Queries Using NLP: PyBot aims to make learning the Python programming language more interactive. The chatbot tries to answer almost any Python-related question or issue a user raises. We use NLP to improve the chatbot's effectiveness and plan to add a voice feature for greater interactivity. With NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation. NLTK has been called “a wonderful tool for teaching and working in computational linguistics using Python” and “an amazing library to play with natural language.”
The main issue with text data is that it is all strings, whereas machine learning algorithms need some kind of numerical feature vector to perform their task. So before starting any NLP project, we pre-process the text to make it suitable to work with. Case normalization converts the entire text to uppercase or lowercase, so that the algorithm does not treat the same word in different cases as different words. Tokenization is the process of converting normal text strings into a list of tokens, i.e. the words we actually want; a sentence tokenizer finds the list of sentences, and a word tokenizer finds the list of words in a string. Noise removal strips everything that is not a standard number or letter. Stop-word removal excludes from the vocabulary extremely common words that appear to be of little value in selecting documents matching a user's need; these words are called stop words. Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form.
For example, if we stem the words “Stems”, “Stemming”, “Stemmed”, and “Stemtization”, the result is the single word “stem”. A slight variant of stemming is lemmatization. The major difference is that stemming can often create non-existent words, whereas lemmas are actual words. So a root stem, meaning the word you end up with, is not necessarily something you can look up in a dictionary, but a lemma is. Examples of lemmatization: “run” is the base form of words like “running” and “ran”, and “better” and “good” share the same lemma, so they are considered the same.
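The pre-processing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not PyBot's actual code: the `STOP_WORDS` set, the `toy_stem` function, and the `preprocess` function are all hypothetical names introduced here, and the stemmer is a deliberately naive suffix stripper rather than the Porter or Snowball algorithm that a library such as NLTK would provide.

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}


def toy_stem(word):
    """Naive suffix stripper for illustration only (not the Porter algorithm)."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "stemm" -> "stem", "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    # Strip a plural "s", but leave words like "class" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Lowercase, remove noise, tokenize, drop stop words, then stem."""
    text = text.lower()                       # case normalization
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # noise removal
    tokens = text.split()                     # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


print(preprocess("Stems, Stemming and Stemmed!"))
```

Because the stemmer only strips suffixes, an irregular form such as “ran” is left unchanged; mapping it to “run” needs a dictionary-backed lemmatizer, which is the distinction the description draws between stems and lemmas.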
mattmurray / Topic Modelling Financial News: Topic modelling on financial news with Natural Language Processing.
delvinso / Covid19 Unique Tweets: An ongoing dataset of hashtags, n-gram counts, and other miscellaneous NLP items for COVID-19 analysis, stemming from over 100,000,000 tweets collected since mid-January 2020.
Maximax67 / Words CEFR Dataset: A dataset mapping English words to CEFR levels based on the CEFR-J dataset, word lemmas, stems, parts of speech (POS), and frequency data from the N-Gram Google dataset. Ideal for NLP tasks, language proficiency assessment, and linguistic research.
turian / Pytextpreprocess: Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
Foysal87 / Sbnltk: Bangla NLP toolkit. Bangla NER, POS tagging, stemmer, word embedding, sentence embedding, summarization, preprocessor, sentiment analysis, etc.
kampsy / Gwizo: Simple Go implementation of the Porter Stemmer algorithm with powerful features.
alifadwitiyap / NDETCStemmer: A library that implements a context-based nondeterministic stemming method to resolve morphologically ambiguous words (words with more than one meaning) when stemming Indonesian words.
FantacherJOY / Arabic Text Classification: Arabic text documents classified using SVM, k-NN, and Naive Bayes classifiers.
YazidIflis / KabyleNLP: NLP tools for the Kabyle language: lemmatization, stemming, tokenization, text-to-speech, spell checking.
zslwyuan / KMeans Emails Clustering Visualization NLP: K-means is used to cluster the emails. The words in the email contents are tokenized and stemmed. The project transforms the corpus into a vector space using tf-idf, and the clustering result is visualized using multidimensional scaling.
fizamusthafa / Mooc Recommender Nlp: The MOOC Recommender System uses NLP techniques to recommend courses in Massive Open Online Courses (MOOCs). It processes raw data, applying tokenization, Porter stemming, cosine similarity, etc., to extract tags from course descriptions, summaries, syllabuses, instructors, and subjects.
SWI-Prolog / Packages Nlp: The SWI-Prolog NLP support library (stemming, etc.)
mrc03 / Spooky Author Identification: A notebook for the well-known Kaggle competition Spooky Author Identification. The task is to identify authors from their texts. I first cleaned and pre-processed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also tried creating meta features (hand-crafted features) based on each author's writing patterns. I then used the traditional bag-of-words approach with the TF-IDF vectorizer and the count vectorizer, and applied ML algorithms such as logistic regression and Naive Bayes, which are well suited to text data. So far, TF-IDF on count vectorizer has given the best results; my submission scored a multi-class log loss of 0.46 on the Kaggle private leaderboard, which is quite decent.
imuqtadir / Sentiment Analysis: This project analyzes sentiment about a particular movie using user reviews available on social networking sites such as Facebook and Twitter, classifying them as positive or negative. The idea was to help users make better judgments about the product by reading only the positive or only the negative reviews related to it. Sentiment analysis involved extracting and measuring the sentiment or “attitude” of a review using natural language processing steps such as stemming, stop-word removal, and formation of a similarity matrix using Stanford NLP libraries.