Scraper

A distributed web scraper
Usage
Worker module
The worker module reads its settings from three sources, in order of priority: CLI arguments, then environment variables, and lastly default values. See the following help message:
DatScraper Worker 0.1.0
d502e19@aau
USAGE:
worker [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-f, --filter-enable <BOOLEAN> Specify whether filtering is enabled [env: SCRAPER_FILTER_ENABLE=] [default: false]
-w, --filter-path <PATH> Specify path to list for filtering [env: SCRAPER_FILTER_PATH=] [default: src/filter/whitelist.txt]
-t, --filter-type <STRING> Specify whether the list in the given filter-path is a 'white' or 'black'-list [env: SCRAPER_FILTER_TYPE=] [default: white]
-g, --influx-addr <STRING> Specify InfluxDB address [env: SCRAPER_METRICS_INFLUXDB_ADDR=] [default: localhost]
-v, --influx-authenticate <BOOLEAN> Specify whether to use username/password authentication when connecting to InfluxDB [env: SCRAPER_METRICS_INFLUXDB_AUTHENTICATE=] [default: true]
-k, --influx-database <STRING> Specify InfluxDB database [env: SCRAPER_METRICS_INFLUXDB_DATABASE=] [default: scraper_db]
-j, --influx-password <STRING> Specify InfluxDB password [env: SCRAPER_METRICS_INFLUXDB_PASSWORD=] [default: password]
-u, --influx-port <INT> Specify InfluxDB port [env: SCRAPER_METRICS_INFLUXDB_PORT=] [default: 8086]
-i, --influx-user <STRING> Specify InfluxDB username [env: SCRAPER_METRICS_INFLUXDB_USER=] [default: worker]
-o, --log-level <LEVEL> Specify the log level {error, warn, info, debug, trace, off} [env: LOG_LEVEL=] [default: info]
-l, --log-path <PATH> Specify the log-file path [env: SCRAPER_WORKER_LOG_PATH=] [default: worker.log]
-d, --enable-metrics <BOOLEAN> Specify whether to enable metric logging [env: SCRAPER_METRICS_ENABLE=] [default: false]
-x, --name <STRING> Specify the prefix to the naming of the worker [env: SCRAPER_NAME=] [default: worker]
-c, --rmq-collection <COLLECTION> Specify the RabbitMQ collection queue to connect to [env: SCRAPER_RABBITMQ_COLLECTION_QUEUE=] [default: collection]
-e, --rmq-exchange <EXCHANGE> Specify the RabbitMQ exchange to connect to [env: SCRAPER_RABBITMQ_EXCHANGE=] [default: work]
-p, --rmq-port <PORT> Specify the RabbitMQ port to connect to [env: SCRAPER_RABBITMQ_PORT=] [default: 5672]
-n, --rmq-prefetch-count <COUNT> Specify the number of tasks to prefetch [env: SCRAPER_RABBITMQ_PREFETCH_COUNT=] [default: 5]
-q, --rmq-queue <QUEUE> Specify the RabbitMQ queue to connect to [env: SCRAPER_RABBITMQ_QUEUE=] [default: frontier]
-b, --redis-addr <ADDR> Specify the Redis address [env: SCRAPER_REDIS_ADDRESS=] [default: localhost]
-r, --redis-port <PORT> Specify the redis-port to connect to [env: SCRAPER_REDIS_PORT=] [default: 6379]
-s, --redis-set <SET> Specify the redis set to connect to [env: SCRAPER_REDIS_SET=] [default: collection]
-a, --rmq-addr <ADDR> Specify the RabbitMQ address [env: SCRAPER_RMQ_ADDRESS=] [default: localhost]
-m, --sentinel <NAME> An optional name of a master group for a sentinel Redis connection. [env: SCRAPER_SENTINEL=] [default: none]
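The `--filter-type` option decides whether the entries in the `--filter-path` file are the only hosts allowed (whitelist) or the hosts excluded (blacklist). The help message does not specify the file format or what an entry is matched against, so the Python sketch below assumes one hostname per line (with `#` comments) and exact host matching. `load_filter_list` and `is_allowed` are hypothetical names, not the worker's actual code.

```python
from urllib.parse import urlparse


def load_filter_list(lines):
    """Parse filter-list lines into a set of lowercase hostnames.

    Assumption (hypothetical format): one hostname per line; blank lines
    and lines starting with '#' are skipped.
    """
    entries = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            entries.add(entry)
    return entries


def is_allowed(url, entries, enabled=False, filter_type="white"):
    """Decide whether a URL passes the filter (illustrative sketch).

    `enabled` mirrors --filter-enable (default false) and `filter_type`
    mirrors --filter-type ('white' or 'black', default 'white').
    """
    if not enabled:
        # Filtering disabled: every URL passes.
        return True
    # urlparse lowercases the hostname; it is None for URLs without a host.
    host = urlparse(url).hostname or ""
    listed = host in entries
    if filter_type == "white":
        return listed  # whitelist: only listed hosts pass
    if filter_type == "black":
        return not listed  # blacklist: listed hosts are rejected
    raise ValueError(f"filter type must be 'white' or 'black', got {filter_type!r}")
```

In use, the lines would come from the file named by `--filter-path`, for example by passing an open file object to `load_filter_list`, since iterating over a file yields its lines.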
Redis Proxy module
The proxy module reads its settings from three sources, in order of priority: CLI arguments, then environment variables, and lastly default values. See the following help message:
DatScraper Proxy 0.1.0
d502e19@aau
USAGE:
redis-proxy [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-o, --log-level <LEVEL> Specify the log level {error, warn, info, debug, trace, off} [env: LOG_LEVEL=] [default: info]
-l, --log-path <PATH> Specify the log-file path [env: SCRAPER_PROXY_LOG_PATH=] [default: proxy.log]
-d, --rmq-redis-queue <QUEUE> Specify the RabbitMQ-REDIS queue to connect to [env: SCRAPER_RABBITMQ_REDIS_QUEUE=] [default: collection]
-t, --rmq-consumer-tag <TAG> Specify the RabbitMQ consumer tag to use [env: SCRAPER_RABBITMQ_CONSUMER_TAG=] [default: proxy]
-k, --rmq-routing-key <KEY> Specify the RabbitMQ routing-key to connect to [env: SCRAPER_RABBITMQ_ROUTING_KEY=] [default: ]
-e, --addr <ADDR> Specify the redis address [env: SCRAPER_REDIS_ADDRESS=] [default: 192.168.99.100]
-r, --redis-port <PORT> Specify the redis-port to connect to [env: SCRAPER_REDIS_PORT=] [default: 6379]
-s, --redis-set <SET> Specify the redis set to connect to [env: SCRAPER_REDIS_SET=] [default: collection]
-a, --rmq-addr <ADDR> Specify the RabbitMQ address [env: SCRAPER_RMQ_ADDRESS=] [default: 192.168.99.100]
-p, --rmq-port <PORT> Specify the RabbitMQ port to connect to [env: SCRAPER_RABBITMQ_PORT=] [default: 5672]
-m, --sentinel <NAME> An optional name of a master group for a sentinel Redis connection. [env: SCRAPER_SENTINEL=] [default: none]
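Both modules resolve every option in the same order: CLI argument, then environment variable, then default value. The Python sketch below illustrates that lookup for two of the proxy's Redis options using the standard-library `argparse`. It is a stand-in, not the modules' own argument parser; the helpers `resolve` and `parse_redis_settings` are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import argparse
import os


def resolve(cli_value, env_name, default, convert=str, environ=None):
    """Pick a setting by priority: CLI argument, env variable, default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    raw = environ.get(env_name)
    # Assumption: an empty variable (e.g. SCRAPER_REDIS_PORT=) counts as unset.
    if raw:
        return convert(raw)
    return default


def parse_redis_settings(argv, environ=None):
    """Resolve an illustrative subset of the proxy's Redis options."""
    parser = argparse.ArgumentParser(prog="redis-proxy")
    # default=None lets resolve() tell "not given on the CLI" apart from a value.
    parser.add_argument("-r", "--redis-port", type=int, default=None)
    parser.add_argument("-s", "--redis-set", default=None)
    args = parser.parse_args(argv)
    return {
        "redis_port": resolve(args.redis_port, "SCRAPER_REDIS_PORT", 6379, int, environ),
        "redis_set": resolve(args.redis_set, "SCRAPER_REDIS_SET", "collection", str, environ),
    }
```

For example, with `SCRAPER_REDIS_PORT=7000` in the environment, passing `-r 7001` still yields port 7001, because the CLI argument takes priority over the environment variable.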
