dariusk / Corpora: A collection of small corpuses of interesting data for the creation of bots and similar stuff.
nltk / Nltk Data: NLTK Data
juand-r / Entity Recognition Datasets: A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
coqui-ai / Open Speech Corpora: 💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
shangjingbo1226 / AutoPhrase: AutoPhrase: Automated Phrase Mining from Massive Text Corpora
strapi / Nextjs Corporate Starter: Strapi Demo application for Corporate Websites using Next.js
piskvorky / Gensim Data: Data repository for pretrained NLP models and NLP corpora.
taishi-i / Awesome Japanese Nlp Resources: A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
karthikncode / Nlp Datasets: A list of datasets/corpora for NLP tasks, in reverse chronological order.
cbaziotis / Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
AI4Bharat / Indicnlp Catalog: A collaborative catalog of NLP resources for Indic languages
nonamestreet / Weixin Public Corpus: WeChat official account corpus
ratsgo / Embedding: Korean embeddings (Sentence Embeddings Using Korean Corpora)
dccuchile / Spanish Word Embeddings: Spanish word embeddings computed with different methods and from different corpora
strapi / Strapi Starter Next Corporate: Next.js starter for creating a corporate site with Strapi.
sameerkumar18 / Corporate Bs Generator Api: Corporate Bullshit (BuzzWord) Generator API
Kail-Fu / Social Worlds: Social Worlds: Visualizing Social Connections in Captioned Image Corpora
natasha / Corus: Links to Russian corpora + Python functions for loading and parsing
ncbi-nlp / BLUE Benchmark: BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora.
averkij / A Studio: Lingtrain Alignment Studio is an ML-based app for aligning texts in different languages. It can produce parallel corpora and parallel books.