12 skills found
MinishLab / Semhash: Fast Multimodal Semantic Deduplication & Filtering
adbar / Courlan: Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
AlekseyKorshuk / Chat Data Pipeline: Chat data cleaning, filtering and deduplication pipeline.
msihly / Medior: The tag-based local media management system. Portable offline database. Hierarchical tagging. Deduplication. Ordered collections. Comprehensive optimized filtering. Custom media player with transcoding. AI diffusion and facial recognition. Animated thumbnails.
unum-bio / FasterFASTA: Faster FASTA and FASTQ file processing command-line tool: parsing, sorting, deduplication, transliteration, filtering, and stats
AI4Bharat / Setu: Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Built on Apache Spark, Setu encompasses four key stages: document preparation, document cleaning and analysis, flagging and filtering, and deduplication.
magicdude4eva / Calendar Sync: Sync ICS feeds like holidays, waste pickup, and F1 calendars into your CalDAV calendar (e.g., mailbox.org). Supports emoji mapping, recurring events, location filters, deduplication, and Docker automation.
shaltielshmid / MinHashSharp: A Robust Library in C# for Similarity Estimation
imad457 / Wodlists Filter: Filter large wordlists by minimum length, with optional deduplication
iceblinker / Aiostreams: AIOStreams, the ultimate Stremio addon hub. Consolidate streams from Torrentio, Debrid services & Usenet into one clean list. Features advanced filtering (HDR/DV, languages), smart deduplication, catalog management & 10+ built-in scrapers. Customize your experience with a powerful proxy & rule engine. Docker-ready & self-hostable.
Hewlett-Packard-ESS / Logstash Filter Dedupe: Deduplication filter for Redis
0xpuck / Amazon Scraper: A Scrapy spider to scrape Amazon UK product details based on search terms, categories, and filters. Features a deduplication filter for unique listings and optional sponsored-link exclusion.