Scraper
Distributed web scraper built on Kafka, Spark, and HtmlUnit
Distributed web scraper using HtmlUtils
The goal of this project is to scrape the web in a simple yet powerful manner. You can install the project on multiple machines; each instance reads messages from a Kafka topic, enriches them with HTML content, and pushes them to another topic. The project has been tested on 50,000,000 messages in a few hours, producing a stream of 10 TB of data an hour.
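The read, enrich, and publish loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: in-memory queues stand in for the two Kafka topics, the JSON message shape with a "url" field is an assumption, and enrich, run_worker, and fake_fetch are hypothetical names introduced only for this example.

```python
import json
import queue
from typing import Callable


def enrich(message: str, fetch: Callable[[str], str]) -> str:
    """Attach fetched HTML to a JSON message that carries a 'url' field (assumed format)."""
    record = json.loads(message)
    try:
        record["html"] = fetch(record["url"])
    except Exception as exc:
        # Record the failure instead of stopping the worker on one bad page.
        record["html"] = None
        record["error"] = str(exc)
    return json.dumps(record)


def run_worker(source: queue.Queue, sink: queue.Queue,
               fetch: Callable[[str], str]) -> int:
    """Drain `source` (input topic stand-in), enrich each message, put it on `sink`."""
    processed = 0
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return processed
        sink.put(enrich(message, fetch))
        processed += 1


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for the real page fetch; performs no network access."""
    if "bad" in url:
        raise ValueError("unreachable: " + url)
    return "<html><body>" + url + "</body></html>"
```

In a real deployment the queues would be replaced by a Kafka consumer and producer, and every machine running the worker would join the same consumer group so that the input topic's partitions are shared among them.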
