
mcp-webgate


Web search that doesn't wreck your AI's memory.

mcp-webgate is an MCP server that gives your AI clean, bounded web content — across all major AI clients:

  • IDEs: Claude Desktop, Claude Code, Zed, Cursor, Windsurf, VSCode
  • CLI Agents: Gemini CLI, Claude CLI, custom agents

🌱 A Gentle Introduction

What is mcp-webgate? When your AI uses a standard "fetch URL" tool, it gets the raw HTML of the page — ads, menus, scripts, cookie banners and all. A single news article can dump 200,000 tokens of garbage into the AI's memory, wiping out your entire conversation.

mcp-webgate is a protective filter that sits between your AI and the web:

  1. Strips the junk — menus, scripts, ads, footers are removed with surgical HTML parsing; only readable text passes through
  2. Hard-caps every response — no page can ever blow up your context window, no matter how big the original was
  3. Optionally summarizes — route results through a secondary local LLM that produces a compact Markdown report with citations; your primary AI gets a polished briefing instead of a wall of text

The result: clean, bounded, useful web content — always.
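The first two steps can be sketched in a few lines of Python. This is a simplified illustration using the standard library's `html.parser` — the real server uses lxml and more careful heuristics, and the tag list here is an assumption:

```python
from html.parser import HTMLParser

# Tags whose contents are boilerplate rather than article text (illustrative list).
JUNK_TAGS = {"script", "style", "nav", "footer", "header", "aside"}

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.junk_depth = 0      # > 0 while inside a junk tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in JUNK_TAGS:
            self.junk_depth += 1

    def handle_endtag(self, tag):
        if tag in JUNK_TAGS and self.junk_depth:
            self.junk_depth -= 1

    def handle_data(self, data):
        if self.junk_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_and_cap(html: str, max_chars: int = 32_000) -> str:
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return text[:max_chars]      # hard cap: output can never exceed the budget

page = "<nav>Home | About</nav><p>Real article text.</p><script>track()</script>"
print(clean_and_cap(page))       # → Real article text.
```

However large the input page, the slice at the end guarantees the bounded output that step 2 promises.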

🔬 Real example: what happens under the hood

Searching for "mcp model context protocol" with LLM features on:

Query → LLM expands to 5 search variants → 20 pages found, 13 fetched in parallel

Raw HTML downloaded     5.16 MB   (~1,290,000 tokens)
After cleaning          52.1 KB   (   ~13,000 tokens)  — 99% noise stripped
After LLM summary        5.8 KB   (    ~1,450 tokens)  — structured report with citations

13 sources distilled into ~1,450 tokens. A single naive fetch of just one of those pages (e.g. a security blog at 563 KB) would dump ~140,000 tokens of raw HTML into your AI's context. webgate processes all 13 and delivers a clean briefing that fits in a footnote.

This is an intensive case (5 queries × 5 results). A typical search with 3–5 results still saves 95%+ of context compared to raw fetching — and your AI gets structured, ranked content instead of a wall of HTML soup.

🚀 Quick Start

1. Make sure you have uvx

pip install uv

uvx runs Python tools without installing them permanently. You only need to do this once.

2. Set up a search backend

The easiest option is SearXNG — free, no account, runs locally:

docker run -d -p 8080:8080 --name searxng searxng/searxng

No Docker? Use a cloud backend instead (Brave, Tavily, Exa, SerpAPI) — see Backends.

3. Add webgate to your AI client

See the Integrations table for your specific client. As a quick example, for Claude Desktop:

Open the config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add this:

{
  "mcpServers": {
    "webgate": {
      "command": "uvx",
      "args": ["mcp-webgate"],
      "env": {
        "WEBGATE_DEFAULT_BACKEND": "searxng",
        "WEBGATE_SEARXNG_URL": "http://localhost:8080"
      }
    }
  }
}

Restart the client after editing.

4. Ask your AI to search!

Search the web for: latest news on AI regulation

The AI will use webgate_query automatically. You're done.

🔍 How it works

Your question
    ↓
Search backend  (SearXNG / Brave / Tavily / Exa / SerpAPI)
    ↓  [deduplicate URLs, block binary files, filter domains]
Fetch pages in parallel  (streaming — hard size cap per page)
    ↓  [optional: retry failed pages from reserve pool]
Strip HTML junk  (menus, ads, scripts, footers — lxml)
    ↓
Clean up text  (invisible chars, unicode junk, BiDi tricks)
    ↓
BM25 reranking  (best-matching results first — always active)
    ↓  [optional: LLM reranking]
Cap total output to budget
    ↓  [optional: LLM summarization → compact Markdown report]
Clean result lands in your AI's context
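The "BM25 reranking" stage above can be illustrated with a minimal scorer — a textbook BM25 sketch, not webgate's actual implementation:

```python
import math
from collections import Counter

# Minimal BM25: rank documents by relevance to the query, best match first.
# k1 and b are the standard BM25 free parameters (values here are conventional defaults).
def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    q_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    # indices of docs, best-matching first
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

docs = ["setting up a vpn on linux",
        "cooking pasta at home",
        "linux vpn configuration guide"]
print(bm25_rank("vpn linux", docs))      # the pasta doc ranks last
```

Because BM25 needs only term statistics, this stage runs locally with no model call — which is why it is always active, while LLM reranking stays optional.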

🛠️ Tools

webgate gives your AI three tools:

webgate_fetch — read a single page

Use this when you already know the URL you want. The AI passes the URL and gets back the cleaned text — up to max_query_budget characters (default 32,000).

{ "url": "https://example.com/article", "max_chars": 32000 }
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "text": "cleaned text...",
  "truncated": true,
  "char_count": 12450
}

webgate_query — search + fetch + clean

Runs a full search cycle. Pass one query (or several) and get back cleaned, ranked results.

{ "queries": "how to set up a VPN on Linux", "num_results_per_query": 5 }

Multiple queries run in parallel and are merged:

{
  "queries": ["VPN Linux setup", "best VPN Linux 2024"],
  "num_results_per_query": 5
}
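The merge step can be sketched as a URL-deduplication pass over the per-query result lists. This is a hypothetical helper, not webgate's code; the field names mirror the result objects shown in this README:

```python
# Merge results from parallel queries, keeping the first hit for each URL.
def merge_results(per_query_results: list[list[dict]]) -> list[dict]:
    seen, merged = set(), []
    for results in per_query_results:
        for r in results:
            if r["url"] not in seen:
                seen.add(r["url"])
                merged.append(r)
    return merged

a = [{"url": "https://a.example", "title": "A"}]
b = [{"url": "https://a.example", "title": "A"},
     {"url": "https://b.example", "title": "B"}]
print([r["url"] for r in merge_results([a, b])])
# → ['https://a.example', 'https://b.example']
```

Deduplicating before fetching means overlapping queries cost nothing extra: each page is downloaded and cleaned at most once.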

Output without LLM — returns cleaned page content for each result:

{
  "sources": [
    { "id": 1, "title": "...", "url": "...", "content": "cleaned text...", "truncated": false }
  ],
  "snippet_pool": [ { "id": 6, "title": "...", "url": "...", "snippet": "..." } ],
  "stats": { "fetched": 5, "total_chars": 18200, "per_page_limit": 6400 }
}

Output with LLM summarization — returns a compact Markdown report:

{
  "summary": "## How to set up a VPN on Linux\n\nTo install...[1][2]",
  "citations": [{ "id": 1, "title": "...", "url": "..." }],
  "stats": { "fetched": 5, "total_chars": 58000 }
}

Output when LLM fails — error reason shown, full sources returned as fallback:

{
  "llm_summary_error": "ReadTimeout: LLM did not respond in time",
  "sources": [ "..." ],
  "stats": { "..." : "..." }
}

snippet_pool contains extra results from the search that were not fetched (search-engine snippet only). The AI can use these to decide if more fetches are worthwhile.
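A client consuming webgate_query output has to handle all three shapes above. A possible dispatch, assuming the key names shown in this README (the source dicts follow the no-LLM output shape):

```python
# Handle the three webgate_query output shapes: LLM summary,
# plain cleaned sources, and the fallback when the LLM fails.
def read_query_output(result: dict) -> str:
    if "summary" in result:
        return result["summary"]                   # compact Markdown report
    if "llm_summary_error" in result:
        print("LLM failed:", result["llm_summary_error"])
    # fallback: concatenate the cleaned source texts
    return "\n\n".join(s["content"] for s in result.get("sources", []))

ok = {"summary": "## Report\n...[1]", "citations": []}
failed = {"llm_summary_error": "ReadTimeout", "sources": [{"content": "text"}]}
print(read_query_output(ok))       # → ## Report ...
print(read_query_output(failed))   # → text
```

The point of the fallback shape is exactly this: even when summarization fails, the caller still gets bounded, cleaned content rather than an error with nothing attached.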

webgate_onboarding — how-to guide

Returns a JSON guide explaining how to use webgate effectively. The AI should call this once at the start of a session if in doubt about which tool to use.

🔧 Using webgate with local or smaller models

Most frontier models follow MCP tool instructions automatically. Smaller or local models sometimes ignore the server-provided guidance and fall back to a built-in fetch tool instead — returning raw HTML that floods the context with noise.

If you notice this happening, add an explicit instruction block to your system prompt:

You have access to webgate tools for web search and page retrieval.
Follow these rules in every session:
- To search the web: use webgate_query — never use a built-in fetch, browser, or HTTP tool
- To retrieve a URL: use webgate_fetch — never fetch URLs directly
- Built-in fetch tools return raw HTML that floods your context; webgate returns clean, bounded text
At the start of each session, call webgate_onboarding to read the full operational guide.

This works because instructions in the user's system prompt take precedence over MCP server-level guidance, making the constraint explicit at the highest-priority layer the model sees.

Tip: if your client supports named system prompts or prompt templates, save the block above as a reusable preset so you don't have to paste it every time.

🎛️ Tuning

This section explains what the key parameters do and when to change them. The defaults work well for most cases — only tweak if you have a specific reason.

What is a "character budget"?

webgate measures text in characters (not tokens). A rough conversion for English text:

4 characters ≈ 1 token

| Characters | Approximate tokens |
|------------|--------------------|
| 8,000      | ~2,000             |
| 32,000     | ~8,000             |
| 96,000     | ~24,000            |
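The rule of thumb is easy to apply in code. This helper just encodes the 4-chars-per-token approximation for English text — a rough estimate, not a tokenizer:

```python
# Rough character → token estimate (English text, ~4 chars per token).
def approx_tokens(chars: int) -> int:
    return chars // 4

for budget in (8_000, 32_000, 96_000):
    print(f"{budget:>6} chars ≈ {approx_tokens(budget):>6} tokens")
```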

webgate_fetch budget

When you fetch a single URL, the ceiling is max_query_budget (default 32,000 chars). The tool parameter max_chars can request less, but never more than this ceiling.

Why max_query_budget and not max_result_length? Because you're fetching one page — the "total output" IS that one page, so the right limit is the overall context budget, not the per-page cap designed for multi-source queries.
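The effective fetch limit described above amounts to a clamp. Parameter names follow the README; the exact clamping logic is assumed:

```python
# webgate_fetch ceiling: max_chars may request less than
# max_query_budget, but never more.
def effective_fetch_limit(max_chars: int, max_query_budget: int = 32_000) -> int:
    return min(max_chars, max_query_budget)

print(effective_fetch_limit(10_000))    # → 10000 (request below the ceiling)
print(effective_fetch_limit(500_000))   # → 32000 (clamped to the ceiling)
```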

webgate_query budget — without LLM

With no LLM, the cleaned sources go directly to your AI's context. webgate distributes max_query_budget across all fetched pages so the total never exceeds the budget:

Per-page limit = max_query_budget ÷ number of results (capped at max_result_length)

| Results fetched | Per-page limit | Total output |
|-----------------|----------------|--------------|
| 1               | 8,000 (cap)    | ≤ 8,000      |
| 5               | 6,400          | ≤ 32,000     |
| 10              | 3,200          | ≤ 32,000     |
| 20              | 1,600          | ≤ 32,000     |

The total output is always at most max_query_budget, regardless of how many results you request — the per-page share automatically shrinks to compensate.
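The distribution formula is small enough to write out. Parameter names and defaults come from this README; the integer-division detail is an assumption:

```python
# Per-page limit without LLM:
# max_query_budget divided across results, capped at max_result_length.
def per_page_limit(n_results: int,
                   max_query_budget: int = 32_000,
                   max_result_length: int = 8_000) -> int:
    return min(max_query_budget // n_results, max_result_length)

for n in (1, 5, 10, 20):
    print(n, per_page_limit(n))   # reproduces the table above
```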

webgate_query budget — with LLM summarization

When a secondary LLM is summarizing, it compresses the content before passing the result to your primary AI. This means it's safe — and beneficial — to give it more raw material to work from.

webgate scales up the input using input_budget_factor (default 3):

LLM input budget = max_query_budget × input_budget_factor
Default: 32,000 × 3 = 96,000 chars

| Results fetched | LLM input / page | Total LLM input | Output to your AI |
|-----------------|------------------|-----------------|-------------------|
| 1               | 96,000           | 96,000          | compact report    |
| 5               | 19,200           | 96,000          | compact report    |
| 10              | 9,600            | 96,000          | compact report    |
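The scaled-up input budget follows directly from the formula. A sketch, with parameter names from this README and the arithmetic assumed:

```python
# LLM input budget with summarization on:
# total input = max_query_budget * input_budget_factor, split across pages.
def llm_input_per_page(n_results: int,
                       max_query_budget: int = 32_000,
                       input_budget_factor: int = 3) -> int:
    return (max_query_budget * input_budget_factor) // n_results

print(llm_input_per_page(1))    # → 96000
print(llm_input_per_page(5))    # → 19200
print(llm_input_per_page(10))   # → 9600
```

Whatever the raw input size, the output to your primary AI stays a compact report — the summarizing LLM absorbs the extra material instead of your context window.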
