adbar / Trafilatura: Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
jbesomi / Texthero: Text preprocessing, representation and visualization from zero to hero.
kavgan / Nlp In Practice: Starter code to solve real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
jfilter / Clean Text: 🧹 Python package for text cleaning
EtienneAb3d / WhisperHallu: Experimental code for preprocessing sound files to optimize Whisper transcriptions and avoid hallucinated text
devmount / GermanWordEmbeddings: Toolkit to obtain and preprocess German text corpora, train models and evaluate them with generated test sets. Built with Gensim and TensorFlow.
Deffro / Text Preprocessing Techniques: 16 text preprocessing techniques in Python for Twitter sentiment analysis.
hamelsmu / Ktext: Utilities for preprocessing text for deep learning with Keras
lyeoni / Prenlp: Preprocessing library for natural language processing
Unstructured-IO / Pipeline Sec Filings: Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
HojiChar / HojiChar: A robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing.
NoorBayan / Tanqeeh: Tanqeeh is a Python library designed to preprocess and clean Arabic text efficiently. It provides a comprehensive set of functions to normalize text, remove unwanted characters, fix spacing issues, and enhance text quality for NLP applications.
hscspring / Multi Label Text Classification For Chinese: PyTorch implementation of multi-label text classification, including several kinds of models, some pretrained. Focused especially on Chinese preprocessing.
singletongue / Wikipedia Utils: Utility scripts for preprocessing Wikipedia texts for NLP
Jcharis / Neattext: NeatText, a simple NLP package for cleaning textual data and text preprocessing
pemagrg1 / NLP Flask Website: A simple Flask website for common NLP tasks, including text preprocessing, keyword extraction and text summarization. Created: 30 Jan 2019
LanguageMachines / Ucto: Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can easily be extended to other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
matthewjdenny / PreText: An R package to assess the effects of text preprocessing decisions.
fge / Grappa: Write parsers for arbitrary text inputs, entirely in Java, with no preprocessing phase
ezgisubasi / Turkish Tweets Sentiment Analysis: This sentiment analysis project determines whether tweets posted in Turkish on Twitter are positive or negative.