WebMCP 🧪

Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows.

First published August 13, 2025

Brandon Walderman <brwalder@microsoft.com>, Leo Lee <leo.lee@microsoft.com>, Andrew Nolan <annolan@microsoft.com>, David Bokan <bokan@google.com>, Khushal Sagar <khushalsagar@google.com>, Hannah Van Opstal <hvanopstal@google.com>

TL;DR

We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.
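To make the idea of a "tool" concrete, here is a minimal sketch of a JavaScript function paired with a natural language description and a structured input schema. It is not the WebMCP API: the object shape, the `invokeTool` and `validateInput` helpers, and the to-do example are all hypothetical and exist only to illustrate the concept.

```javascript
// Hypothetical sketch: these names are illustrative, not the WebMCP API.

// State the page already maintains for its own UI.
const todos = [];

// A "tool": a function plus a description and a schema for its input.
const addTodoTool = {
  name: "add-todo",
  description: "Add an item to the user's to-do list.",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  execute({ text }) {
    todos.push(text);
    return { count: todos.length };
  },
};

// Deliberately minimal validation: checks required keys and primitive
// types via typeof only. A real implementation would need a full
// JSON Schema validator.
function validateInput(schema, input) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in input)) errors.push(`missing required field: ${key}`);
  }
  for (const [key, spec] of Object.entries(schema.properties ?? {})) {
    if (key in input && typeof input[key] !== spec.type) {
      errors.push(`field ${key} should be of type ${spec.type}`);
    }
  }
  return errors;
}

// What an agent-side caller might do: validate, then run the function.
function invokeTool(tool, input) {
  const errors = validateInput(tool.inputSchema, input);
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, result: tool.execute(input) };
}
```

In this sketch, the description tells the agent when the tool is useful, and the schema tells it how to build a valid call. The function itself runs in the page, against the same state the user sees.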

For the technical details of the proposal, code examples, API shape, etc. see proposal.md.

Terminology Used

Agent

An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today, these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based chat interfaces.

Browser's Agent

An autonomous assistant as described above but provided by or through the browser. This could be an agent built directly into the browser or hosted by it, for example, via an extension or plug-in.

AI Platform

Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini.

Backend Integration

A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with an MCP server provided by the service.

Actuation

An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc.

Background and Motivation

The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state.

As AI agents become more prevalent, the potential for even greater user value is within reach. AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations”: each service registers itself with the chosen platform(s), and the platform communicates with the service via an API (MCP, OpenAPI, etc.). In this document, we call this style of tool a “backend integration”: users make use of the tools and services by chatting with an AI, and the AI platform communicates with the service on the user's behalf.

Many of the challenges faced by assistive technologies also apply to AI agents, which struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable.

The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone.

AI agents can integrate in the backend via protocols like MCP in order to fulfill a user's task. For web developers to expose their site's functionality this way, they need to write a server, usually in Python or Node.js, rather than the frontend JavaScript that may be more familiar to them.

There are several advantages to using the web to connect agents to services:

Businesses near-universally already offer their services via the web.

WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent. This is especially true when the logic is already heavily client-side.

Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.

Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase. The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short or no sleeves and no embellishments"), and then take back over to browse among the agent-selected options.

Allows authors to serve humans and agents from one source.

The human-use web is not going away. Integrating agents into it lets authors avoid fragmenting their service and keep ownership of their interface, branding, and connection with their users.

WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place via app-controlled UI, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly.

A diagram showing an agent communicating with a third-party service via WebMCP running in a live web page

In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually:

A diagram showing an agent communicating with a third-party service directly via MCP

Goals

Enable human-in-the-loop workflows: Support cooperative scenarios where users delegate tasks to AI agents or assistive technologies while maintaining visibility and control over the web page(s).
Simplify AI agent integration: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation.
Minimize developer burden: Any task that a user can accomplish through a page's UI can be made into a tool by re-using much of the page's existing JavaScript code.
Improve accessibility: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees, which are not widely implemented.
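The "minimize developer burden" goal can be sketched as follows: one piece of existing application logic serves both the page's own UI handler and a tool definition. Everything here is hypothetical, including the `filterTemplates` function, the sample template data, and the tool object shape. The point is only that the tool delegates to code the page already has rather than duplicating it.

```javascript
// Hypothetical sketch: names and data are illustrative, not the WebMCP API.

// Sample data standing in for what the page has already loaded.
const templates = [
  { id: 1, title: "Spring Yard Sale", colors: ["green", "yellow"] },
  { id: 2, title: "Garage Clearance", colors: ["red"] },
  { id: 3, title: "Yard Sale Weekend", colors: ["red", "white"] },
];

// Existing application logic, assumed to already back the page's filter panel.
function filterTemplates(items, { keyword, color } = {}) {
  return items.filter(
    (t) =>
      (!keyword || t.title.toLowerCase().includes(keyword.toLowerCase())) &&
      (!color || t.colors.includes(color))
  );
}

// UI path: what the filter panel's change handler might call.
function onFilterPanelChange(formValues) {
  return filterTemplates(templates, formValues);
}

// Agent path: the tool wraps the same function instead of reimplementing it.
const filterTemplatesTool = {
  name: "filter-templates",
  description: "Filter the visible design templates by keyword and color.",
  inputSchema: {
    type: "object",
    properties: {
      keyword: { type: "string" },
      color: { type: "string" },
    },
  },
  // Returns only ids so the agent gets a compact, structured result.
  execute: (args) => filterTemplates(templates, args).map((t) => t.id),
};
```

Because both paths call `filterTemplates`, a fix or feature added to the filtering logic reaches users and agents at the same time.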

Non-Goals

Headless browsing scenarios: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios raise many open questions, such as how browsers are launched and how profiles are handled.
Autonomous agent workflows: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. This task is likely better suited to existing protocols like A2A.
Replacement of backend integrations: WebMCP works alongside existing protocols like MCP and does not replace them.
Replace human interfaces: The human web interface remains primary; agent tools augment rather than replace user interaction.
Enable / influence discoverability of sites to agents

Use Cases

The use cases for WebMCP are ones in which the user is collaborating with the agent, rather than completely delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated.

Example - Creative

Jen wants to create an invitation to her upcoming yard sale, so she uses her browser to navigate to http://easely.example, her favorite graphic design platform. However, she's rather new to it and sometimes struggles to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design and opens up a "templates" panel to look for a premade design she likes. There are so many templates that she's not sure which to choose, so she asks her browser agent for help.

Jen: Show me templates that ar
