Results for "corpora"

715 skills found · Page 1 of 24

dariusk / Corpora

5.1k

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

universal

bots · corpus · language · +1

Updated 1d ago

nltk / Nltk Data

1.8k

NLTK Data

universal

corpora · linguistics · natural-language-processing · +2

Updated 20h ago

juand-r / Entity Recognition Datasets

1.6k

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

universal

annotations · corpora · datasets · +7

Updated 1d ago

coqui-ai / Open Speech Corpora

1.4k

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

universal

speech-emotion-recognition · speech-processing · speech-recognition · +9

Updated 1d ago

shangjingbo1226 / AutoPhrase

1.2k

AutoPhrase: Automated Phrase Mining from Massive Text Corpora

universal

automatic · compound-words · lexicon · +4

Updated 5h ago

strapi / Nextjs Corporate Starter

1.1k

Strapi Demo application for Corporate Websites using Next.js

universal

Updated 4h ago

piskvorky / Gensim Data

1.0k

Data repository for pretrained NLP models and NLP corpora.

universal

corpora · dataset · gensim · +5

Updated 9d ago

taishi-i / Awesome Japanese Nlp Resources

936

A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese

universal

awesome · awesome-list · cc0 · +6

Updated 6h ago

karthikncode / Nlp Datasets

920

A list of datasets/corpora for NLP tasks, in reverse chronological order.

universal

Updated 21d ago

cbaziotis / Ekphrasis

675

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).

universal

nlp · nlp-library · semeval · +8

Updated 24d ago