
Scraper

Distributed web scraper built with Kafka, Spark, and HtmlUnit

Install / Use

/learn @big-datai/Scraper
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Distributed web scraper using HtmlUtils

The goal of this project is to scrape the web in a simple yet powerful manner. You can install the project on multiple machines; each instance reads messages from a Kafka topic, enriches them with HTML content, and pushes them to another topic. This project was tested on 50,000,000 messages over a few hours, which produced a stream of 10 TB of data per hour.
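The read, enrich, and push cycle described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: in-memory queues stand in for the Kafka input and output topics, the fetcher is a plain function standing in for an HtmlUnit-based page fetch, and the names `EnrichLoopSketch`, `Page`, and `drain` are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

public class EnrichLoopSketch {

    /** Hypothetical output message: the original URL plus the HTML fetched for it. */
    public static final class Page {
        public final String url;
        public final String html;

        public Page(String url, String html) {
            this.url = url;
            this.html = html;
        }
    }

    /**
     * Reads every URL currently waiting in the input queue (stand-in for the
     * input topic), enriches it with HTML from the fetcher, and pushes the
     * result to the output queue (stand-in for the output topic).
     * Returns the number of messages processed.
     */
    public static int drain(BlockingQueue<String> in,
                            BlockingQueue<Page> out,
                            Function<String, String> fetcher) {
        int processed = 0;
        String url;
        while ((url = in.poll()) != null) {
            String html;
            try {
                html = fetcher.apply(url);
            } catch (RuntimeException e) {
                // Assumption: a failed fetch is forwarded with empty content
                // rather than dropped, so downstream consumers can see it.
                html = "";
            }
            out.add(new Page(url, html));
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<Page> out = new LinkedBlockingQueue<>();
        in.add("http://example.com/a");
        in.add("http://example.com/b");

        // Fake fetcher so the sketch runs without network access.
        int n = drain(in, out, u -> "<html><body>" + u + "</body></html>");
        System.out.println("processed " + n + " messages");
    }
}
```

Because each instance only pulls from a shared input and pushes to a shared output, running the same loop on more machines is what would spread the load; in a real deployment that sharing would presumably come from Kafka consumer groups rather than an in-process queue.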

GitHub Stars: 5
Category: Development
Updated: 1y ago
Forks: 1

Languages

Java

Security Score

55/100

Audited on Jun 10, 2024

No findings