18 skills found
adbar / Trafilatura: Python & command-line tool to gather text and metadata on the Web. Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
google / Corpuscrawler: Crawler for linguistic corpora
clovaai / Webvicob: Official implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
banglakit / Corpus Builder: Toolkit for compiling a corpus from various sources
skillachie / News Corpus Builder: Automatic news corpus builder
praaline / Praaline: An open-source system to manage, annotate, visualise and analyse spoken language corpora
carlfm01 / Librivox Tools: Collector and speech cutter for LibriVox audiobooks
psankar / Korkai: A corpus builder for Tamil that analyzes WordPress, Blogger, and Wikipedia dumps
DLR-SC / Corpus Annotation Graph Builder: Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph.
uma-pi1 / OPIEC Pipeline: No description available
Unbabel / Word Level Qe Corpus Builder: Builds a WMT18-like corpus for word-level QE with annotations in the source and target words.
dohliam / Ebook Corpus: A parser and extractor for electronic books
Aditya-ds-1806 / Dictpress Tts: TTS plugin for dictpress
AndyTheFactory / Article Extraction Dataset: Article title, authors, date and body extraction dataset.
thecsw / Katya Dev: Katya, or The Liberated Corpus, is a text corpus that allows you to request and scrape any web resource!
aleris / ReadME RoTex Corpus Builder: Builds a corpus of Romanian text, suitable for NLP research, from different online sources.
ltgoslo / Wcb: Wikipedia Corpus Builder
nmkelly / HTRC Extractor: HTRC Archive Extractor and Corpus Builder