CRW
⚡ Lightweight Firecrawl alternative in Rust — 92% coverage, 5.5x faster, ~7 MB RAM. Web scraper & crawler with MCP server for Claude, LLM extraction, JS rendering.
Don't want to self-host? fastcrw.com is the managed cloud — global proxy network, auto-scaling, dashboard, and API keys. Same Firecrawl-compatible API. Get 500 free credits →
CRW is the open-source web scraper built for AI agents. Built-in MCP server (stdio + HTTP), single binary, ~6 MB idle RAM. Give Claude Code, Cursor, or any MCP client web scraping superpowers in 30 seconds. Firecrawl-compatible API — 5.5x faster, 75x less memory, 92% coverage on 1K real-world URLs.
Built-in MCP server. Single binary. No Redis. No Node.js.
cargo install crw-mcp
claude mcp add crw -- crw-mcp
What's New
0.0.14 (2026-03-25)
Features
- mcp: auto-download LightPanda binary for zero-config JS rendering (41f443b)
- mcp: auto-spawn headless Chrome for JS rendering in embedded mode (9a6b0ae)
Bug Fixes
- ci: move crw-mcp to Tier 4 in release workflow and add workflow_dispatch (d7584a8)
0.0.13 (2026-03-24)
Features
- mcp: add embedded mode — self-contained MCP server, no crw-server needed (75e5450)
Bug Fixes
- ci: switch release-please to simple type for Rust workspace support (51cd420)
v0.0.12
- Readability drill-down — when `<main>` or `<article>` wraps >90% of the body, the extractor now searches inside for narrower content elements (`.main-page-content`, `.article-content`, `.entry-content`, etc.) instead of discarding. Fixes MDN pages returning 35 chars and StackOverflow returning only the question
- Base64 image stripping — `data:` URI images are removed in both HTML cleaning (lol_html) and markdown post-processing (regex safety net). Eliminates massive base64 blobs from Reddit and similar sites
- Select/dropdown removal — `<select>` elements removed in `onlyMainContent` mode; dropdown/city-selector/location-selector noise patterns added. Fixes Hürriyet city dropdown leaking into content
- Extended scored selectors — added `.main-page-content`, `.js-post-body`, `.s-prose`, `#question`, `.page-content`, `#page-content`, `[role="article"]` for better MDN, StackOverflow, and generic site coverage
- Smarter fallback chain — when primary extraction produces too-short markdown, both fallbacks (cleaned HTML and basic clean) are tried and the longer result is picked, instead of short-circuiting on non-empty but insufficient content
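The fallback-chain behavior can be sketched roughly as follows (an illustrative Python sketch, not the actual Rust implementation; the length threshold is a made-up placeholder):

```python
MIN_MARKDOWN_LEN = 200  # illustrative threshold, not the real value

def pick_extraction(primary: str, fallback_cleaned: str, fallback_basic: str) -> str:
    """Keep the primary result if it is long enough; otherwise try BOTH
    fallbacks and take the longer one, instead of short-circuiting on the
    first non-empty but insufficient result."""
    if len(primary) >= MIN_MARKDOWN_LEN:
        return primary
    best = max([fallback_cleaned, fallback_basic], key=len)
    return best if len(best) > len(primary) else primary
```

The key point is that both fallbacks are compared by length rather than accepting the first non-empty one.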
Why CRW?
CRW gives you Firecrawl's API with a fraction of the resource usage. No runtime dependencies, no Redis, no Node.js — just a single binary you can deploy anywhere.
| Metric | CRW (self-hosted) | fastcrw.com (cloud) | Firecrawl | Crawl4AI | Spider |
|---|---|---|---|---|---|
| Coverage (1K URLs) | 92.0% | 92.0% | 77.2% | — | 99.9% |
| Avg latency | 833 ms | 833 ms | 4,600 ms | — | — |
| P50 latency | 446 ms | 446 ms | — | — | 45 ms (static) |
| Noise rejection | 88.4% | 88.4% | noise 6.8% | noise 11.3% | noise 4.2% |
| Idle RAM | 6.6 MB | 0 (managed) | ~500 MB+ | — | cloud-only |
| Cold start | 85 ms | 0 (always-on) | 30–60 s | — | — |
| HTTP scrape | ~30 ms | ~30 ms | ~200 ms+ | ~480 ms | ~45 ms |
| Proxy network | BYO | Global (built-in) | Built-in | — | Cloud-only |
| Cost / 1K scrapes | $0 (self-hosted) | From $13/mo | $0.83–5.33 | $0 | $0.65 |
| Dependencies | Single binary | None (API) | Node + Redis + PG + RabbitMQ | Python + Playwright | Rust / cloud |
| License | AGPL-3.0 | Managed | AGPL-3.0 | Apache-2.0 | MIT |
<details>
<summary><b>Full benchmark details</b></summary>

CRW vs Firecrawl — tested on Firecrawl scrape-content-dataset-v1 (1,000 real-world URLs, JS rendering enabled):
- CRW covers 92% of URLs vs Firecrawl's 77.2% — 15 percentage points higher
- CRW is 5.5x faster on average (833ms vs 4,600ms)
- CRW uses ~75x less RAM at idle (6.6 MB vs ~500 MB+)
- Firecrawl requires 5 containers (Node.js, Redis, PostgreSQL, RabbitMQ, Playwright) — CRW is a single binary
Crawl4AI vs Firecrawl vs Spider — Independent benchmark by Spider.cloud:
| Metric | Spider | Firecrawl | Crawl4AI |
|---|---|---|---|
| Static throughput | 182 p/s | 27 p/s | 19 p/s |
| Success (static) | 100% | 99.5% | 99% |
| Success (SPA) | 100% | 96.6% | 93.7% |
| Success (anti-bot) | 99.6% | 88.4% | 72% |
| Latency (static) | 45 ms | 310 ms | 480 ms |
| Latency (SPA) | 820 ms | 1,400 ms | 1,650 ms |
Firecrawl independent review — Scrapeway benchmark: 64.3% success rate, $5.11/1K scrapes, 0% on LinkedIn/Twitter.
Resource comparison:
| Metric | CRW | Firecrawl |
|---|---|---|
| Min RAM | ~7 MB | 4 GB |
| Recommended RAM | ~64 MB (under load) | 8–16 GB |
| Docker images | Single ~8 MB binary | ~2–3 GB total |
| Cold start | 85 ms | 30–60 seconds |
| Containers needed | 1 (+ optional sidecar) | 5 |
</details>

Features
- 🔧 MCP server — built-in stdio + HTTP transport for Claude Code, Cursor, Windsurf, and any MCP client
- 🔌 Firecrawl-compatible API — same endpoint family and familiar request/response ergonomics
- 📄 Multiple output formats — markdown, HTML, cleaned HTML, raw HTML, plain text, links, and structured JSON
- 🤖 LLM structured extraction — send a JSON schema, get validated structured data back (Anthropic tool_use + OpenAI function calling)
- 🌐 JS rendering — auto-detect SPAs with shell heuristics, render via LightPanda, Playwright, or Chrome (CDP)
- 🕷️ BFS crawler — async crawl with rate limiting, robots.txt, sitemap support, concurrent jobs
- 🔒 Security — SSRF protection (private IPs, cloud metadata, IPv6), constant-time auth, dangerous URI filtering
- 🐳 Docker ready — multi-stage build with LightPanda sidecar
- 🎯 CSS selector & XPath — extract specific DOM elements before Markdown conversion
- ✂️ Chunking & filtering — split content into topic/sentence/regex chunks; rank by BM25 or cosine similarity
- 🕵️ Stealth mode — browser-like UA rotation and header injection to reduce bot detection
- 🌐 Per-request proxy — override the global proxy per scrape request
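As a sketch of how the LLM structured-extraction feature might be invoked: you send a JSON schema alongside the URL and get validated data back. The field names below (`formats`, `jsonOptions`) mirror Firecrawl's extract API and are assumptions here, not confirmed CRW parameter names — check the API reference for the exact shape.

```python
import json

# Hypothetical request body for LLM structured extraction.
# "formats" / "jsonOptions" follow Firecrawl conventions and may
# differ in CRW; treat this as a sketch.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}

payload = {
    "url": "https://example.com/product",
    "formats": ["json"],
    "jsonOptions": {"schema": schema},
}

body = json.dumps(payload)  # POST this to the scrape endpoint
```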
Cloud vs Self-Hosted
| Feature | Self-hosted | Cloud (fastcrw.com) |
|---|---|---|
| Setup | cargo install crw-server | Sign up → get API key |
| Infrastructure | You manage | Fully managed |
| Proxy | Bring your own | Global proxy network |
| Scaling | Manual | Auto-scaling |
| API | Firecrawl-compatible | Same Firecrawl-compatible API |
Both use the same Firecrawl-compatible API — your code works with either. Switch between self-hosted and cloud by changing the base URL.
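Because the API is identical on both deployments, the switch really is just a base-URL change. A minimal sketch (the self-hosted port is an assumption; check your `crw-server` config):

```python
SELF_HOSTED = "http://localhost:3002"  # assumed default port, not confirmed
CLOUD = "https://fastcrw.com"

def scrape_endpoint(base_url: str) -> str:
    """Build the Firecrawl-compatible scrape URL; the path is the same
    whether you point at a self-hosted server or the cloud."""
    return f"{base_url.rstrip('/')}/api/v1/scrape"
```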
Quick Start
MCP (AI agents — recommended):
cargo install crw-mcp
claude mcp add crw -- crw-mcp
That's it. Claude Code now has `crw_scrape`, `crw_crawl`, and `crw_map` tools. For Cursor, Windsurf, Cline, and other MCP clients, see MCP Server.
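For clients configured through a JSON file rather than a CLI helper, the server entry would look roughly like this (a sketch using the common `mcpServers` convention; the file location and exact schema vary by client, so consult each client's docs):

```json
{
  "mcpServers": {
    "crw": {
      "command": "crw-mcp",
      "args": []
    }
  }
}
```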
CLI (no server needed):
cargo install crw-cli
crw https://example.com
Self-hosted server:
cargo install crw-server
crw-server
Enable JS rendering (optional):
crw-server setup
This downloads LightPanda and creates a config.local.toml for JS rendering. See JS Rendering for details.
Cloud (no setup):
curl -X POST https://fastcrw.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Get your API key at [fastcrw.com](https://fastcrw.com).
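A Firecrawl-compatible success response typically wraps content in a `data` object. Here is a minimal consumer sketch, assuming the `success`/`data.markdown` shape from Firecrawl's scrape schema (verify against CRW's actual responses):

```python
import json

# Example response body shaped like Firecrawl's scrape schema; this
# exact shape is an assumption for CRW, not confirmed from its docs.
raw = '{"success": true, "data": {"markdown": "# Example Domain", "metadata": {"title": "Example Domain"}}}'

resp = json.loads(raw)
markdown = resp["data"]["markdown"] if resp.get("success") else None
title = resp.get("data", {}).get("metadata", {}).get("title")
```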
