Scrapelib
⛏ a library for scraping unreliable pages
Install / Use
/learn @jamesturk/ScrapelibREADME
scrapelib is a library for making requests to less-than-reliable websites.
This repository has moved to Codeberg, GitHub will remain as a read-only mirror.
Source: https://codeberg.org/jpt/scrapelib
Documentation: https://jamesturk.github.io/scrapelib/
Issues: https://codeberg.org/jpt/scrapelib/issues
Features
scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures and as a result was therefore designed with features desirable when dealing with sites that have intermittent errors or require rate-limiting.
Advantages of using scrapelib over using requests as-is:
- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- highly-configurable request throtting
- configurable retries for non-permanent site failures
- All of the power of the suberb requests library.
Installation
scrapelib is on PyPI, and can be installed via any standard package management tool.
Example Usage
import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)
# Grab Google front page
s.get('http://google.com')
# Will be throttled to 10 HTTP requests per minute
while True:
s.get('http://example.com')
Related Skills
node-connect
350.8kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
claude-opus-4-5-migration
110.4kMigrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5
frontend-design
110.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
model-usage
350.8kUse CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.
