PyCrawler
A python web crawler
Install / Use
/learn @theanti9/PyCrawlerREADME
Setup
- Open settings.py and adjust database settings
- DATABASE_ENGINE can either be "mysql" or "sqlite"
- For sqlite only DATABASE_HOST is used, and it should begin with a '/'
- All other DATABASE_* settings are required for mysql
- DEBUG mode causes the crawler to output some stats that are generated as it goes, and other debug messages
- LOGGING is a dictConfig dictionary to log output to the console and a rotating file, and works out-of-the-box, but can be modified
Current State
- mysql engine untested
- Issue in some situations where the database is locked and queries cannot execute. Presumably an issue only with sqlite's file-based approach
Logging
- DEBUG+ level messages are logged to the console, and INFO+ level messages are logged to a file.
- By default, the file for logging uses a TimedRotatingFileHandler that rolls over at midnight
- Setting DEBUG in the settings toggles wether or not DEBUG level messages are output at all
- Setting USE_COLORS in the settings toggles whether or not messages output to the console use colors depending on the level.
Misc
- Designed to be able to run on multiple machines and work together to collect info in central DB
- Queues links into the database to be crawled. This means that any machine running the crawler with the central db can grab from the same queue. Reduces crawling redundancy.
- Thread pool apprach to analyzing keywords in text.
Related Skills
node-connect
337.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
337.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.2kCommit, push, and open a PR
