Selenium MCP Server

Model Context Protocol (MCP) server for Selenium WebDriver that enables AI agents and LLMs to control real browsers for automation.

This project exposes Selenium WebDriver as an MCP (Model Context Protocol) server, allowing AI agents to control a real browser through structured tools.

It enables LLMs and autonomous agents to perform tasks like:

Opening browsers
Navigating websites
Discovering UI elements
Clicking buttons and links
Typing into inputs
Extracting page text
Taking screenshots
More capabilities are planned (in progress)

This makes it possible to build AI-powered browser automation systems and autonomous QA agents.

Table of Contents

Why This Project Exists
Architecture
Features
Installation
Running the Server
MCP Server Version
Available MCP Tools
Browser Session Flow
Example Agent Workflow
System Prompt for AI Agents
Prompt Customization
Logging
Configure Your MCP Client
Requirements
Use Cases
Contributing
License
Author

WHY THIS PROJECT EXISTS

Modern AI agents need a way to interact with real applications.

While traditional automation tools like Selenium exist, they are not directly usable by LLM agents.

This project bridges that gap by exposing Selenium functionality through MCP tools so that agents can:

Understand web pages
Discover UI elements
Perform actions
Validate results

ARCHITECTURE

flowchart TD
    A[LLM Agent] --> B[MCP Protocol]
    B --> C[Selenium MCP Server]

    C --> D[Browser Tools]
    C --> E[Navigation Tools]
    C --> F[Interaction Tools]
    C --> G[Element Tools]
    C --> H[Debug Tools]

    D --> I[Selenium WebDriver]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[Browser]

FEATURES

MCP-compatible Selenium automation server
Browser session management
Navigation controls
UI element discovery
Accessibility-aware interaction
Screenshot capture
Page text extraction
Headless browser support
Multi-tab browser management (open, switch, close, track active tab)
Improved interactive element detection for modern UI frameworks (React, Angular, dynamic DOM)

INSTALLATION

Run the following command:

pip install selenium-mcp

RUNNING THE SERVER

Start the MCP server

You can start the Selenium MCP server using different transport modes depending on your use case.

Default (STDIO)

selenium-mcp run

Uses stdio transport
Best for local agent integrations
No network exposure

HTTP Mode (Recommended)

selenium-mcp run --transport http --host 127.0.0.1 --port 3345

Starts server at: http://127.0.0.1:3345

MCP endpoint: http://127.0.0.1:3345/mcp

Best for:

API integrations
Postman / curl testing
production-style usage
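
As a rough illustration of what a client sends to the MCP endpoint above, the sketch below builds (but does not send) a JSON-RPC 2.0 `tools/list` request using only the Python standard library. It assumes the server is running in HTTP mode at the address shown. The helper name `build_mcp_request` is hypothetical and not part of selenium-mcp; a real client would normally perform the MCP `initialize` handshake first and may need to pass along a session header returned by the server.

```python
import json
import urllib.request

MCP_ENDPOINT = "http://127.0.0.1:3345/mcp"  # endpoint from the HTTP mode example above


def build_mcp_request(method, params=None, request_id=1, endpoint=MCP_ENDPOINT):
    """Build (without sending) a JSON-RPC 2.0 POST request for an MCP endpoint.

    Hypothetical helper for illustration; it is not part of selenium-mcp.
    """
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # MCP over HTTP may answer with plain JSON or with an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_mcp_request("tools/list")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a running server):
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or paste into Postman or curl.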

SSE Mode (Streaming)

selenium-mcp run --transport sse --host 127.0.0.1 --port 3345

Starts server at: http://127.0.0.1:3345/sse

Best for:

streaming-based agents
real-time interactions

Note: SSE endpoints are streaming and may not show output directly in the browser.

Expose Server on Network:

selenium-mcp run --transport http --host 0.0.0.0 --port 3345

Makes server accessible from:

other devices on the same network
Docker / VM environments

Notes:

Default port: 3336

Supported transports:

stdio (default)
http
sse

Ensure port is within range: 1–65535
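
The constraints in these notes can be checked before launching the server. The following is a minimal sketch, assuming the transports and default port listed above; `validate_run_options` is a hypothetical helper and does not reflect how the CLI validates its arguments internally.

```python
SUPPORTED_TRANSPORTS = ("stdio", "http", "sse")  # transports listed in the notes above
DEFAULT_PORT = 3336  # default port stated in the notes above


def validate_run_options(transport="stdio", port=DEFAULT_PORT):
    """Return (transport, port) if both are acceptable, else raise ValueError.

    Hypothetical helper for illustration; not part of selenium-mcp.
    """
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(
            f"unsupported transport {transport!r}; expected one of {SUPPORTED_TRANSPORTS}"
        )
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"port must be an integer in 1-65535, got {port!r}")
    return transport, port


print(validate_run_options("http", 3345))
```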

MCP SERVER VERSION

To check the current version of the Selenium MCP server, run the following command:

selenium-mcp version

AVAILABLE MCP TOOLS

Run the following command to list the tools supported by the MCP server:

selenium-mcp tools

The tools are grouped into the categories below.

BROWSER CONTROL

open_browser – Launch a new browser session
close_browser – Close the browser session
maximize_browser – Maximize browser window
fullscreen_browser – Switch browser to fullscreen

NAVIGATION

open_url – Navigate to a specific URL
navigate_back – Navigate back in browser history
navigate_forward – Navigate forward in history
refresh_page – Reload the page
wait_for_page – Wait for page to load
get_page_title – Get the current page title

TAB MANAGEMENT

get_tabs – Retrieve all open tabs in the current session
switch_tab – Switch to a specific tab by index
open_new_tab – Open a new tab and optionally navigate to a URL
close_tab – Close a specific tab by index
get_current_tab – Retrieve the currently active tab
name_tab – Assign a custom name to a tab for easier identification

These tools allow agents to manage multiple tabs within a single browser session.

ELEMENT DISCOVERY

get_interactive_elements – Discover visible interactive elements on the page
get_accessibility_tree – Retrieve a simplified accessibility tree for the page

These tools allow agents to understand the UI structure before interacting with it.

Notes

Element detection is optimized for modern web applications (React, Angular, dynamic UI frameworks).
Elements are identified using interaction signals such as roles, click handlers, and focusability.
Only visible and meaningful elements are returned to reduce noise.
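
To make the idea of interaction signals concrete, here is a minimal sketch of such a filter over plain dictionaries. It is an illustration only: the field names (`tag`, `role`, `has_click_handler`, `tabindex`, `visible`, `label`) and the tag and role sets are assumptions, and the server's actual detection logic and output format may differ.

```python
# Assumed signal sets for illustration; the real server may use different ones.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
INTERACTIVE_ROLES = {"button", "link", "checkbox", "tab", "menuitem", "textbox"}


def is_interactive(element):
    """Return True if a visible element shows at least one interaction signal."""
    if not element.get("visible", False):
        return False
    tabindex = element.get("tabindex")
    return (
        element.get("tag") in INTERACTIVE_TAGS
        or element.get("role") in INTERACTIVE_ROLES
        or element.get("has_click_handler", False)
        or (tabindex is not None and tabindex >= 0)  # focusable via keyboard
    )


def index_interactive_elements(elements):
    """Keep interactive elements and number them for later click/type calls."""
    kept = [element for element in elements if is_interactive(element)]
    return [{**element, "index": i} for i, element in enumerate(kept)]


page = [
    {"tag": "button", "label": "Search", "visible": True},
    {"tag": "div", "role": "button", "label": "Menu", "visible": True},
    {"tag": "div", "has_click_handler": True, "label": "Card", "visible": True},
    {"tag": "button", "label": "Hidden", "visible": False},
    {"tag": "p", "label": "Plain text", "visible": True},
]
for item in index_interactive_elements(page):
    print(item["index"], item["label"])
```

Checking roles and click handlers in addition to tag names is what lets a filter like this pick up `div`-based controls that are common in React and Angular applications.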

INTERACTION TOOLS

click_element – Click an element by index
type_into_element – Enter text into an input field

Elements must first be discovered using get_interactive_elements, which supplies the index these tools expect.

PAGE ANALYSIS

get_page_text – Extract visible text from the page

Useful for:

validation
reasoning
information extraction

VISUAL DEBUGGING

take_screenshot – Capture a screenshot of the current browser window

Screenshot Storage Location

When screenshots are captured, they are automatically saved in a hidden folder inside your home directory.

macOS / Linux

Screenshots are stored at:

~/.selenium-mcp/screenshot

Example full path (macOS):

/Users/<your-username>/.selenium-mcp/screenshot

On macOS, you can open the folder from Terminal:

open ~/.selenium-mcp/screenshot

Windows

Screenshots are stored at:

C:\Users\<your-username>\.selenium-mcp\screenshot

Example:

C:\Users\John\.selenium-mcp\screenshot

You can open it from File Explorer by entering the following in the address bar:

%USERPROFILE%\.selenium-mcp\screenshot

Custom Screenshot Directory (Optional)

You can override the default screenshot location using the environment variable: SELENIUM_MCP_SCREENSHOT_DIR

macOS / Linux

export SELENIUM_MCP_SCREENSHOT_DIR=~/my-screenshots

Windows (PowerShell)

$env:SELENIUM_MCP_SCREENSHOT_DIR="C:\my-screenshots"

All screenshots will then be saved to the specified directory.

Notes

The folder is created automatically the first time a screenshot is taken.
The .selenium-mcp directory is hidden by default because it starts with a dot (.).
You can safely delete screenshots anytime.
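
The lookup order described above (environment variable first, then the default folder in the home directory) can be sketched as follows. This is an assumption about how the resolution might work, not the server's actual code; `resolve_screenshot_dir` and its parameters are hypothetical.

```python
import os
from pathlib import Path

ENV_VAR = "SELENIUM_MCP_SCREENSHOT_DIR"  # variable name from the section above


def resolve_screenshot_dir(env=None, home=None, create=False):
    """Return the screenshot directory, preferring the environment override.

    Hypothetical helper for illustration; not part of selenium-mcp.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get(ENV_VAR)
    if override:
        directory = Path(override).expanduser()
    else:
        directory = home / ".selenium-mcp" / "screenshot"  # documented default
    if create:
        # Mirrors the note above: the folder is created on first use.
        directory.mkdir(parents=True, exist_ok=True)
    return directory


print(resolve_screenshot_dir())
```

Passing `env` and `home` explicitly keeps the function easy to test without touching the real environment or home directory.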

BROWSER SESSION FLOW

Each browser session is identified by a session_id.

Typical workflow for agents:

open_browser
open_url
wait_for_page
get_interactive_elements
(optional) get_tabs / switch_tab if multiple tabs are present
click_element or type_into_element

MULTI-TAB WORKFLOW

Agents can work with multiple tabs within the same browser session.

Example workflow:

open_browser
open_url
open_new_tab("https://example.com")
get_tabs
switch_tab(index)
perform actions
close_tab(index)

Notes

Each tab is tracked using an internal index.
The active tab is automatically managed and updated.
All actions are performed on the currently active tab.
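
The index-based tracking described in these notes can be pictured with a small in-memory model. The sketch below is hypothetical: it assumes zero-based indexes, that a newly opened tab becomes the active one, and that the last remaining tab cannot be closed. The server's real behavior may differ on each of these points.

```python
class TabRegistry:
    """Track open tabs by index and remember which one is active.

    Hypothetical model for illustration; not the server's implementation.
    """

    def __init__(self, first_url="about:blank"):
        self.tabs = [first_url]
        self.active = 0

    def open_new_tab(self, url="about:blank"):
        self.tabs.append(url)
        self.active = len(self.tabs) - 1  # assumption: new tab becomes active
        return self.active

    def switch_tab(self, index):
        self._check(index)
        self.active = index

    def close_tab(self, index):
        self._check(index)
        if len(self.tabs) == 1:
            raise ValueError("cannot close the last remaining tab")
        del self.tabs[index]
        if index < self.active:
            self.active -= 1  # tabs after the closed one shift left
        elif index == self.active:
            self.active = min(index, len(self.tabs) - 1)

    def get_current_tab(self):
        return {"index": self.active, "url": self.tabs[self.active]}

    def _check(self, index):
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")


registry = TabRegistry("https://google.com")
registry.open_new_tab("https://example.com")
registry.switch_tab(0)
registry.close_tab(1)
print(registry.get_current_tab())
```

Because indexes shift when a tab is closed, an agent would be safer calling get_tabs again after close_tab instead of reusing earlier indexes.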

EXAMPLE AGENT WORKFLOW

Example task:

Open the Chrome browser.
Navigate to Google.com.
Type the text "Selenium MCP" in the search box.
Press the search button.

Agent steps:

open_browser
open_url("https://google.com")
wait_for_page
get_interactive_elements
type_into_element(index, "Selenium MCP")
click_element(index)
wait_for_page
get_page_text
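
For readers who want to see what these steps might look like on the wire, the sketch below expresses them as MCP `tools/call` requests (JSON-RPC 2.0). The tool names come from this README, but the argument names (`session_id`, `url`, `index`, `text`) are assumptions; check the output of `selenium-mcp tools` for the real schemas. In a real run the session id comes from the open_browser result and the element indexes from get_interactive_elements, so they are passed in here as placeholders.

```python
import json
from itertools import count


def tool_call(request_id, name, arguments=None):
    """Build one MCP tools/call request as a plain dictionary."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": dict(arguments or {})},
    }


def build_search_workflow(session_id, search_box_index, search_button_index):
    """Return the example workflow as an ordered list of tools/call requests.

    Argument names are hypothetical; indexes must come from a prior
    get_interactive_elements result in a real agent run.
    """
    ids = count(1)
    session = {"session_id": session_id}
    steps = [
        ("open_browser", {}),
        ("open_url", {**session, "url": "https://google.com"}),
        ("wait_for_page", session),
        ("get_interactive_elements", session),
        ("type_into_element", {**session, "index": search_box_index, "text": "Selenium MCP"}),
        ("click_element", {**session, "index": search_button_index}),
        ("wait_for_page", session),
        ("get_page_text", session),
    ]
    return [tool_call(next(ids), name, args) for name, args in steps]


for request in build_search_workflow("demo-session", 0, 1):
    print(json.dumps(request))
```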

SYSTEM PROMPT FOR AI AGENTS

This repository includes a production-grade system prompt designed specifically for browser automation agents that interact with this Selenium MCP server.

The prompt contains detailed operational guidelines that instruct the AI agent on how to:

initialize and control the browser
discover and interact with UI elements
analyze page structure using the accessibility tree
avoid hallucinating element indexes
handle navigation and page reloads
recover from stale elements
follow a deterministic execution loop (PLAN → ACT → OBSERVE → UPDATE PLAN)
enforce safety limits on tool usage
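
The execution loop and safety limit named in the list above can be sketched in a few lines. This is a simplified illustration, not the logic in the bundled prompt: `run_agent_loop`, `update_plan`, and `max_tool_calls` are hypothetical names, and the tools here are stand-in functions, not real MCP calls.

```python
def run_agent_loop(plan, tools, update_plan=None, max_tool_calls=10):
    """Run planned steps one at a time: PLAN -> ACT -> OBSERVE -> UPDATE PLAN.

    plan:        list of (tool_name, kwargs) tuples
    tools:       dict mapping tool names to callables
    update_plan: optional callable(remaining_steps, tool_name, observation)
                 returning the revised list of remaining steps
    """
    history = []
    remaining = list(plan)
    while remaining:
        if len(history) >= max_tool_calls:
            raise RuntimeError("tool-call limit reached; stopping for safety")
        name, kwargs = remaining.pop(0)            # PLAN: pick the next step
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        observation = tools[name](**kwargs)        # ACT: call the tool
        history.append((name, observation))        # OBSERVE: record the result
        if update_plan is not None:                # UPDATE PLAN: revise if needed
            remaining = list(update_plan(remaining, name, observation))
    return history


# Stand-in tools so the sketch runs without a browser.
fake_tools = {
    "get_interactive_elements": lambda: ["search box", "search button"],
    "click_element": lambda index: f"clicked {index}",
}
steps = [("get_interactive_elements", {}), ("click_element", {"index": 1})]
print(run_agent_loop(steps, fake_tools))
```

A hard cap on tool calls keeps a confused agent from looping indefinitely, for example by re-planning the same failing click over and over.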

Prompt location

prompts/system_prompt.md

How to use

Whenever you build an AI agent that interacts with this MCP server, this prompt should be provided as the system prompt for the model.

Why this prompt

Browser automation agents can easily make incorrect decisions if not guided properly. This system prompt provides strict operational rules and guardrails that help the agent:

use MCP tools correctly
avoid incorrect element interactions
minimize hallucinations
perform reliable b
