Webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

Generate Convert Improve

Install / Use

/learn @0xMassi/Webclaw

About this skill

Quality Score

0/100

README

<a href="https://webclaw.io"> <img src=".github/banner.png" alt="webclaw" width="700" /> </a> <h3 align="center"> The fastest web scraper for AI agents. 67% fewer tokens. Sub-millisecond extraction. Zero browser overhead. </h3> <a href="https://github.com/0xMassi/webclaw/stargazers"><img src="https://img.shields.io/github/stars/0xMassi/webclaw?style=for-the-badge&logo=github&logoColor=white&label=Stars&color=181717" alt="Stars" /></a> <a href="https://github.com/0xMassi/webclaw/releases"><img src="https://img.shields.io/github/v/release/0xMassi/webclaw?style=for-the-badge&logo=rust&logoColor=white&label=Version&color=B7410E" alt="Version" /></a> <a href="https://github.com/0xMassi/webclaw/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-10B981?style=for-the-badge" alt="License" /></a> <a href="https://www.npmjs.com/package/create-webclaw"><img src="https://img.shields.io/npm/dt/create-webclaw?style=for-the-badge&logo=npm&logoColor=white&label=Installs&color=CB3837" alt="npm installs" /></a> <a href="https://discord.gg/KDfd48EpnW"><img src="https://img.shields.io/badge/Discord-Join-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord" /></a> <a href="https://x.com/webclaw_io"><img src="https://img.shields.io/badge/Follow-@webclaw__io-000000?style=for-the-badge&logo=x&logoColor=white" alt="X / Twitter" /></a> <a href="https://webclaw.io"><img src="https://img.shields.io/badge/Website-webclaw.io-0A0A0A?style=for-the-badge&logo=safari&logoColor=white" alt="Website" /></a> <a href="https://webclaw.io/docs"><img src="https://img.shields.io/badge/Docs-Read-3B82F6?style=for-the-badge&logo=readthedocs&logoColor=white" alt="Docs" /></a>

<img src="assets/demo.gif" alt="Claude Code: web_fetch gets 403, webclaw extracts successfully" width="700" /> Claude Code's built-in web_fetch → 403 Forbidden. webclaw → clean markdown.

Your AI agent calls fetch() and gets a 403. Or 142KB of raw HTML that burns through your token budget. webclaw fixes both.

It extracts clean, structured content from any URL using Chrome-level TLS fingerprinting — no headless browser, no Selenium, no Puppeteer. Output is optimized for LLMs: 67% fewer tokens than raw HTML, with metadata, links, and images preserved.

                     Raw HTML                          webclaw
┌──────────────────────────────────┐    ┌──────────────────────────────────┐
│ <div class="ad-wrapper">         │    │ # Breaking: AI Breakthrough      │
│ <nav class="global-nav">         │    │                                  │
│ <script>window.__NEXT_DATA__     │    │ Researchers achieved 94%         │
│ ={...8KB of JSON...}</script>    │    │ accuracy on cross-domain         │
│ <div class="social-share">       │    │ reasoning benchmarks.            │
│ <button>Tweet</button>           │    │                                  │
│ <footer class="site-footer">     │    │ ## Key Findings                  │
│ <!-- 142,847 characters -->      │    │ - 3x faster inference            │
│                                  │    │ - Open-source weights            │
│         4,820 tokens             │    │         1,590 tokens             │
└──────────────────────────────────┘    └──────────────────────────────────┘

Get Started (30 seconds)

For AI agents (Claude, Cursor, Windsurf, VS Code)

npx create-webclaw

Auto-detects your AI tools, downloads the MCP server, and configures everything. One command.

Homebrew (macOS/Linux)

brew tap 0xMassi/webclaw
brew install webclaw

Prebuilt binaries

Download from GitHub Releases for macOS (arm64, x86_64) and Linux (x86_64, aarch64).

Cargo (from source)

cargo install --git https://github.com/0xMassi/webclaw.git webclaw-cli
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-mcp

Docker

docker run --rm ghcr.io/0xmassi/webclaw https://example.com

Docker Compose (with Ollama for LLM features)

cp env.example .env
docker compose up -d

Why webclaw?

| | webclaw | Firecrawl | Trafilatura | Readability | |---|:---:|:---:|:---:|:---:| | Extraction accuracy | 95.1% | — | 80.6% | 83.5% | | Token efficiency | -67% | — | -55% | -51% | | Speed (100KB page) | 3.2ms | ~500ms | 18.4ms | 8.7ms | | TLS fingerprinting | Yes | No | No | No | | Self-hosted | Yes | No | Yes | Yes | | MCP (Claude/Cursor) | Yes | No | No | No | | No browser required | Yes | No | Yes | Yes | | Cost | Free | $$$$ | Free | Free |

Choose webclaw if you want fast local extraction, LLM-optimized output, and native AI agent integration.

What it looks like

$ webclaw https://stripe.com -f llm

> URL: https://stripe.com
> Title: Stripe | Financial Infrastructure for the Internet
> Language: en
> Word count: 847

# Stripe | Financial Infrastructure for the Internet

Stripe is a suite of APIs powering online payment processing
and commerce solutions for internet businesses of all sizes.

## Products
- Payments — Accept payments online and in person
- Billing — Manage subscriptions and invoicing
- Connect — Build a marketplace or platform
...

$ webclaw https://github.com --brand

{
  "name": "GitHub",
  "colors": [{"hex": "#59636E", "usage": "Primary"}, ...],
  "fonts": ["Mona Sans", "ui-monospace"],
  "logos": [{"url": "https://github.githubassets.com/...", "kind": "svg"}]
}

$ webclaw https://docs.rust-lang.org --crawl --depth 2 --max-pages 50

Crawling... 50/50 pages extracted
---
# Page 1: https://docs.rust-lang.org/
...
# Page 2: https://docs.rust-lang.org/book/
...

MCP Server — 10 tools for AI agents

webclaw ships as an MCP server that plugs into Claude Desktop, Claude Code, Cursor, Windsurf, OpenCode, Antigravity, Codex CLI, and any MCP-compatible client.

npx create-webclaw    # auto-detects and configures everything

Or manual setup — add to your Claude Desktop config:

{
  "mcpServers": {
    "webclaw": {
      "command": "~/.webclaw/webclaw-mcp"
    }
  }
}

Then in Claude: "Scrape the top 5 results for 'web scraping tools' and compare their pricing" — it just works.

Available tools

| Tool | Description | Requires API key? | |------|-------------|:-:| | scrape | Extract content from any URL | No | | crawl | Recursive site crawl | No | | map | Discover URLs from sitemaps | No | | batch | Parallel multi-URL extraction | No | | extract | LLM-powered structured extraction | No (needs Ollama) | | summarize | Page summarization | No (needs Ollama) | | diff | Content change detection | No | | brand | Brand identity extraction | No | | search | Web search + scrape results | Yes | | research | Deep multi-source research | Yes |

8 of 10 tools work locally — no account, no API key, fully private.

Features

Extraction

Readability scoring — multi-signal content detection (text density, semantic tags, link ratio)
Noise filtering — strips nav, footer, ads, modals, cookie banners (Tailwind-safe)
Data island extraction — catches React/Next.js JSON payloads, JSON-LD, hydration data
YouTube metadata — structured data from any YouTube video
PDF extraction — auto-detected via Content-Type
5 output formats — markdown, text, JSON, LLM-optimized, HTML

Content control

webclaw URL --include "article, .content"       # CSS selector include
webclaw URL --exclude "nav, footer, .sidebar"    # CSS selector exclude
webclaw URL --only-main-content                  # Auto-detect main content

Crawling

webclaw URL --crawl --depth 3 --max-pages 100   # BFS same-origin crawl
webclaw URL --crawl --sitemap                    # Seed from sitemap
webclaw URL --map                                # Discover URLs only

LLM features (Ollama / OpenAI / Anthropic)

webclaw URL --summarize                          # Page summary
webclaw URL --extract-prompt "Get all prices"    # Natural language extraction
webclaw URL --extract-json '{"type":"object"}'   # Schema-enforced extraction

Change tracking

webclaw URL -f json > snap.json                  # Take snapshot
webclaw URL --diff-with snap.json                # Compare later

Brand extraction

webclaw URL --brand                              # Colors, fonts, logos, OG image

Proxy rotation

webclaw URL --proxy http://user:pass@host:port   # Single proxy
webclaw URLs --proxy-file proxies.txt            # Pool rotation

Benchmarks

All numbers from real tests on 50 diverse pages. See benchmarks/ for methodology and reproduction instructions.

Extraction quality

Accuracy      webclaw     ███████████████████ 95.1%
              readability ████████████████▋   83.5%
              trafilatura ████████████████    80.6%
              newspaper3k █████████████▎      66.4%

Noise removal webclaw     ███████████████████ 96.1%
              readability █████████████████▊  89.4%
              trafilatura ██████████████████▏ 91.2%
              newspaper3k ███████████████▎    76.8%

Speed (pure extraction, no network)

10KB page     webclaw     ██                   0.8ms
              readability █████                2.1ms
              trafilatura ██████████           4.3ms

100KB page    webclaw     ██                   3.2ms
              readability █████                8.7ms
              trafilatura ██████████           18.4ms

Token efficiency (feeding to Claude/GPT)

| Format | Tokens | vs R

Related Skills

himalaya

341.8k

CLI to manage emails via IMAP/SMTP. Use `himalaya` to list, read, write, reply, forward, search, and organize emails from the terminal. Supports multiple accounts and message composition with MML (MIME Meta Language).

node-connect

341.8k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

84.6k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

Hook Development

84.6k

This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.

0xMassi

View profile

View on GitHub

GitHub Stars241

CategoryDevelopment

Updated19m ago

Forks17

0xMassi/webclaw

Languages

Rust

Security Score

100/100

Audited on Mar 30, 2026

No findings