Webfurl

A browser-use AI agent that uses a compressed, unfoldable representation of HTML for high token efficiency

Install / Use

/learn @WeaveMindAI/Webfurl

README

WebFurl

A Rust browser agent that uses a recursively compressible semantic representation of web pages to minimize LLM context usage. The representation is dynamically cached across users, query-aware, and fully linked to the live DOM for real browser interactions.

How it works

WebFurl compresses a full web page (often 200k+ tokens of raw HTML) into a hierarchical semantic tree (typically 20-50 tokens at the top level). The agent can then unfold parts of the tree on demand, spending context budget only on what matters.

Compression pipeline:

  1. Raw HTML is chunked at semantic boundaries (header, nav, main, sections, grids)
  2. Leaf chunks are compressed in parallel via LLM calls
  3. Parent nodes get structural summaries from child summaries (bottom-up)
  4. Interactive elements (links, buttons, inputs) are extracted from the raw DOM with stable CSS selectors
  5. Everything is cached by content hash in MongoDB, so unchanged subtrees are never recompressed
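The bottom-up pass in steps 2-3 can be sketched as follows. This is an illustrative sketch, not the actual webfurl-core API: `Chunk`, `compress`, and the stubbed `llm_compress` are hypothetical names.

```rust
// Hypothetical shape of the bottom-up compression pass.
struct Chunk {
    tag: String,          // semantic boundary: "nav", "main", "section", ...
    html: String,         // raw HTML for leaf chunks
    children: Vec<Chunk>, // non-empty for structural (parent) nodes
}

// Stand-in for the (parallel) LLM call that compresses one leaf chunk.
fn llm_compress(html: &str) -> String {
    format!("[summary of {} bytes]", html.len())
}

// Leaves are compressed directly; parents get structural summaries
// built from their children's summaries (bottom-up).
fn compress(chunk: &Chunk) -> String {
    if chunk.children.is_empty() {
        llm_compress(&chunk.html)
    } else {
        let parts: Vec<String> = chunk.children.iter().map(compress).collect();
        format!("<{}: {}>", chunk.tag, parts.join(" | "))
    }
}

fn main() {
    let page = Chunk {
        tag: "body".into(),
        html: String::new(),
        children: vec![
            Chunk { tag: "nav".into(), html: "<a href=...>".into(), children: vec![] },
            Chunk { tag: "main".into(), html: "<div>listings...</div>".into(), children: vec![] },
        ],
    };
    let summary = compress(&page);
    assert!(summary.starts_with("<body:"));
}
```

In the real pipeline the leaf calls run in parallel and the result is cached by content hash; the sketch only shows the recursion order.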

What makes it different:

  • Recursive compression: a page is a tree, not a flat summary. You can zoom into any branch.
  • Cross-user cache: the static parts of airbnb.com are compressed once and reused by everyone. Only dynamic content (prices, availability) gets recompressed.
  • Query-driven unfolding: when the user asks "find me a cheap listing", the tree auto-unfolds the most relevant nodes using embedding similarity, so the LLM sees a focused view without wasting budget on irrelevant sections.
  • DOM-linked actions: every interactive element has a pre-computed CSS selector that works against the live browser DOM. The agent can click links, fill forms, and navigate, with automatic handling of new tabs and page loads.
  • Vision support: images in the tree can be described on demand via a vision model, with descriptions cached.
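Query-driven unfolding amounts to ranking node summaries by embedding similarity. A minimal sketch, assuming each node carries a pre-computed summary embedding; `cosine`, `top_k_nodes`, and the `(id, embedding)` layout are illustrative, not the real webfurl-core types:

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Pick the `k` node ids whose summary embeddings are most similar
// to the query embedding; these are the nodes to unfold.
fn top_k_nodes(query: &[f32], nodes: &[(u32, Vec<f32>)], k: usize) -> Vec<u32> {
    let mut scored: Vec<(u32, f32)> = nodes
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    let query = vec![1.0, 0.0];
    let nodes = vec![
        (1, vec![0.9, 0.1]), // e.g. a listings section: close to the query
        (2, vec![0.0, 1.0]), // e.g. a footer: orthogonal to the query
    ];
    assert_eq!(top_k_nodes(&query, &nodes, 1), vec![1]);
}
```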

Prerequisites

  • Rust toolchain (the start script runs cargo build --release)
  • Docker (MongoDB runs in a container)
  • Google Chrome (auto-detected, or set via CHROME_PATH)
  • An OpenRouter API key

Quick start

# Clone
git clone <repo-url>
cd Webfurl

# Configure
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run (builds, starts MongoDB, launches browser + agent)
./start.sh

That's it. The start script:

  1. Builds the Rust workspace (cargo build --release)
  2. Starts MongoDB in Docker (creates container if needed)
  3. Launches the API server on :3001
  4. Opens an interactive agent session with a visible Chrome browser

Usage

Once running, you'll see an interactive prompt:

/url https://www.airbnb.com/s/Mountain-View--CA/homes

The agent compresses the page into a semantic tree, then you can chat naturally:

> Find me the cheapest listing with good reviews

The agent will:

  1. Pre-unfold relevant nodes (query-driven, using embeddings)
  2. Read the compressed page context
  3. Click on listings, navigate pages, fill search forms
  4. Report back with findings

Commands

| Command | Description |
|---------|-------------|
| /url <url> | Navigate to a URL |
| /unfold <node_id> | Manually expand a tree node |
| /fold <node_id> | Collapse a node back |
| /search <query> | Semantic search, unfolds the most relevant nodes |
| /tree | Print the current tree structure |
| /screenshot | Full-page screenshot |
| /screenshot <selector> | Element screenshot |
| /browser | Open the current page in your default browser |
| /quit | Exit |

Or just type naturally. The agent handles navigation, clicking, and form filling autonomously.

Configuration

All configuration is in .env:

| Variable | Description | Example |
|----------|-------------|---------|
| OPENROUTER_API_KEY | Required. Your OpenRouter API key | sk-or-v1-... |
| WEBFURL_COMPRESSION_MODEL | LLM for page compression | openai/gpt-oss-120b |
| WEBFURL_AGENT_MODEL | LLM for the agent | anthropic/claude-sonnet-4.6 |
| WEBFURL_VISION_MODEL | Vision model for images | google/gemini-2.5-flash |
| WEBFURL_TOKEN_BUDGET | Initial context budget per page (tokens) | 5000 |
| WEBFURL_MAX_BUDGET | Hard ceiling the agent can expand to (tokens) | 128000 |
| CHROME_PATH | Chrome binary path (auto-detected) | /usr/bin/google-chrome |
| WEBFURL_HEADLESS | Set to 1 for headless mode | 1 |
| MONGODB_URI | MongoDB connection string | mongodb://localhost:27017 |
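A minimal .env, built from the example values in the table above (whether those examples are the actual defaults is not stated, so treat them as illustrative; the placeholder key must be replaced with a real one):

```sh
# Required
OPENROUTER_API_KEY=sk-or-v1-...

# Optional overrides
WEBFURL_COMPRESSION_MODEL=openai/gpt-oss-120b
WEBFURL_AGENT_MODEL=anthropic/claude-sonnet-4.6
WEBFURL_TOKEN_BUDGET=5000
MONGODB_URI=mongodb://localhost:27017
```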

Architecture

Webfurl/
├── crates/
│   ├── webfurl-core/        # Compression pipeline, tree, cache, unfold, serialize
│   ├── webfurl-agent/       # Browser agent, Chrome CDP, interactive CLI
│   └── webfurl-server/      # Axum API server (REST endpoints)
├── start.sh                 # One-command launcher
├── stop.sh                  # Cleanup
└── .env.example

Core modules (webfurl-core):

  • pipeline.rs: HTML → SemanticTree (DOM chunking, parallel LLM compression, interactive element extraction)
  • tree.rs: SemanticNode / SemanticTree data structures
  • unfold.rs: Budget-based unfolding, semantic query unfold with ancestor chain resolution
  • serialize.rs: Tree → [WEBFURL] text block for LLM context
  • cache.rs: MongoDB content-hash cache (cross-user, chunk-level)
  • embeddings.rs: OpenRouter embedding client (Qwen3-Embedding-8B)
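A rough sketch of how tree.rs, unfold.rs, and serialize.rs fit together: folded nodes are emitted as one-line summaries, unfolded nodes recurse into their children, and serialization stops once the token budget is spent. All names here are hypothetical stand-ins, not the actual types.

```rust
// Hypothetical semantic-tree node; the real SemanticNode lives in tree.rs.
struct SemanticNode {
    id: u32,
    summary: String,            // compressed one-line summary
    token_cost: usize,          // estimated tokens if this node is shown
    unfolded: bool,
    children: Vec<SemanticNode>,
}

// Emit a node as its summary; recurse into unfolded nodes while
// there is still token budget left.
fn serialize(node: &SemanticNode, budget: &mut usize, out: &mut String) {
    if *budget < node.token_cost {
        return; // over budget: skip this subtree entirely
    }
    *budget -= node.token_cost;
    out.push_str(&format!("[{}] {}\n", node.id, node.summary));
    if node.unfolded {
        for child in &node.children {
            serialize(child, budget, out);
        }
    }
}

fn main() {
    let tree = SemanticNode {
        id: 0, summary: "airbnb search results".into(), token_cost: 10, unfolded: true,
        children: vec![
            SemanticNode { id: 1, summary: "nav".into(), token_cost: 5, unfolded: false, children: vec![] },
            SemanticNode { id: 2, summary: "18 listings".into(), token_cost: 5, unfolded: false, children: vec![] },
        ],
    };
    let mut budget = 20;
    let mut out = String::new();
    serialize(&tree, &mut budget, &mut out);
    assert!(out.contains("[2] 18 listings"));
}
```

Unfolding a node is then just flipping its flag and re-serializing, which is what makes /unfold and /fold cheap.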

Agent (webfurl-agent):

  • agent.rs: Conversation loop, action execution, query-driven pre-unfolding
  • browser.rs: Chrome CDP session (navigation, click, fill, tab management, page load detection)

How the cache works

Every chunk of HTML is hashed by content. When any user visits a page:

  • Static chunks (nav, footer, layout) → cache hit, zero LLM calls
  • Dynamic chunks (prices, dates, user-specific content) → recompressed

This means the first user to visit airbnb.com pays the full compression cost; subsequent users who hit the same page layout pay only for the dynamic parts. The cache is stored in MongoDB and persists across sessions.
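The hit/miss logic can be sketched with an in-memory map standing in for the MongoDB collection; `ChunkCache`, `content_hash`, and the summary format are all hypothetical names, not the real cache.rs API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hash a chunk by its content, so identical chunks share one cache entry.
fn content_hash(html: &str) -> u64 {
    let mut h = DefaultHasher::new();
    html.hash(&mut h);
    h.finish()
}

// In-memory stand-in for the MongoDB chunk cache.
struct ChunkCache {
    store: HashMap<u64, String>,
    llm_calls: usize,
}

impl ChunkCache {
    fn compress(&mut self, html: &str) -> String {
        let key = content_hash(html);
        if let Some(summary) = self.store.get(&key) {
            return summary.clone(); // cache hit: zero LLM calls
        }
        self.llm_calls += 1; // cache miss: pay for one LLM call
        let summary = format!("[summary #{key}]"); // stand-in for the LLM
        self.store.insert(key, summary.clone());
        summary
    }
}

fn main() {
    let mut cache = ChunkCache { store: HashMap::new(), llm_calls: 0 };
    cache.compress("<nav>static layout</nav>"); // first user: miss
    cache.compress("<div>price: $120</div>");   // first user: miss
    cache.compress("<nav>static layout</nav>"); // second user: hit
    cache.compress("<div>price: $135</div>");   // changed content: miss
    assert_eq!(cache.llm_calls, 3);
}
```

Because the key is the content itself, no invalidation is needed: a changed chunk simply hashes to a new key.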

License

AGPL-3.0, see LICENSE
