Results for "data-crawling"

483 skills found · Page 1 of 17

getmaxun / Maxun

15.3k

🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in minutes 🔥

universal

agents, api, automation, +15

Updated 5h ago

waditu / Tushare

14.7k

TuShare is a utility for crawling historical data of China stocks

universal

finance, fintech, pandas, +5

Updated 6h ago

oxylabs / AI Crawler Py

2.8k

Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural language prompt.

universal

ai, ai-agents, ai-crawler, +5

Updated 11h ago

oxylabs / Oxylabs AI Studio Py

2.6k

Structured data gathering from any website using AI-powered scraper, crawler, and browser automation. Scraping and crawling with natural language prompts. Equip your LLM agents with fresh data. AI Studio python SDK for intelligent web data gathering.

universal

ai-crawler, ai-scraper, ai-scraping, +9

Updated 11h ago

fugary / Calibre Douban

1.3k

A new Douban metadata source plugin for Calibre. Douban no longer provides book APIs to the public, so this plugin obtains its data through web crawling.

universal

Updated 2h ago

facebookresearch / Cc Net

1.0k

Tools to download and clean up Common Crawl data

universal

Updated 9h ago

arkadiyt / Bounty Targets

709

This project crawls bug bounty platform scopes (such as HackerOne, Bugcrowd, and Intigriti) hourly and dumps them into the bounty-targets-data repo

universal

bounty, bug, bugcrowd, +6

Updated 7d ago

PhialsBasement / LibreCrawl

514

Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!

universal

desktop-app, flask, free, +6

Updated 7h ago

blackfireio / Player

495

Blackfire Player is a powerful web crawling, web testing, and web scraping application. It provides a DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses.

universal

Updated 10h ago

shaohua0116 / ICLR2020 OpenReviewData

462

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

universal

conference, crawler, data-analysis, +4

Updated 4mo ago

commoncrawl / Cc Pyspark

453

Process Common Crawl data with Python and Spark

universal

common-crawl, commoncrawl, pyspark, +5

Updated 4d ago

Florents-Tselai / WarcDB

405

WarcDB: Web crawl data as SQLite databases.

universal

cli, crawling, database, +4

Updated 22d ago

shaohua0116 / ICLR2019 OpenReviewData

387

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

universal

crawler, crawling-python, openreview, +1

Updated 10mo ago

cloudfour / Lighthouse Parade

372

A Node.js command-line tool that crawls a domain and gathers Lighthouse performance data for every page.

universal

Updated 3d ago

opensemanticsearch / Open Semantic Etl

277

Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with an ingestor to a Solr or Elasticsearch index and a linked data graph database

universal

annotation, documents, elasticsearch, +17

Updated 6d ago

techenthusiast167 / WebRecon

266

WebRecon is an advanced Open Source Intelligence (OSINT) web reconnaissance tool designed for cybersecurity professionals, penetration testers, and security researchers. It automates the process of gathering intelligence from target websites through comprehensive crawling, data extraction, and analysis.

universal

Updated 1d ago