Selenium MCP Server
Selenium WebDriver MCP server for AI agents and LLM-powered browser automation
Install / Use
/learn @nayakprashant/Selenium MCP Server
A Model Context Protocol (MCP) server for Selenium WebDriver that lets AI agents and LLMs control real browsers for automation.
This project exposes Selenium WebDriver as an MCP server, so AI agents can control a real browser through structured tools.
It enables LLMs and autonomous agents to perform tasks like:
- Opening browsers
- Navigating websites
- Discovering UI elements
- Clicking buttons and links
- Typing into inputs
- Extracting page text
- Taking screenshots
- More capabilities are in progress
This makes it possible to build AI-powered browser automation systems and autonomous QA agents.
Table of Contents
- Why This Project Exists
- Architecture
- Features
- Installation
- Running the Server
- MCP Server Version
- Available MCP Tools
- Browser Session Flow
- Example Agent Workflow
- System Prompt for AI Agents
- Prompt Customization
- Logging
- Configure Your MCP Client
- Requirements
- Use Cases
- Contributing
- License
- Author
WHY THIS PROJECT EXISTS
Modern AI agents need a way to interact with real applications.
While traditional automation tools like Selenium exist, they are not directly usable by LLM agents.
This project bridges that gap by exposing Selenium functionality through MCP tools so that agents can:
- Understand web pages
- Discover UI elements
- Perform actions
- Validate results
ARCHITECTURE
flowchart TD
A[LLM Agent] --> B[MCP Protocol]
B --> C[Selenium MCP Server]
C --> D[Browser Tools]
C --> E[Navigation Tools]
C --> F[Interaction Tools]
C --> G[Element Tools]
C --> H[Debug Tools]
D --> I[Selenium WebDriver]
E --> I
F --> I
G --> I
H --> I
I --> J[Browser]
FEATURES
- MCP-compatible Selenium automation server
- Browser session management
- Navigation controls
- UI element discovery
- Accessibility-aware interaction
- Screenshot capture
- Page text extraction
- Headless browser support
- Multi-tab browser management (open, switch, close, track active tab)
- Improved interactive element detection for modern UI frameworks (React, Angular, dynamic DOM)
INSTALLATION
Run the following command:
pip install selenium-mcp
RUNNING THE SERVER
Start the MCP server
You can start the Selenium MCP server using different transport modes depending on your use case.
Default (STDIO)
selenium-mcp run
- Uses stdio transport
- Best for local agent integrations
- No network exposure
HTTP Mode (Recommended)
selenium-mcp run --transport http --host 127.0.0.1 --port 3345
Starts server at: http://127.0.0.1:3345
MCP endpoint: http://127.0.0.1:3345/mcp
Best for:
- API integrations
- Postman / curl testing
- production-style usage
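For quick testing against the HTTP endpoint, the sketch below builds (but does not send) a JSON-RPC `initialize` request of the kind an MCP client issues first. It assumes the `http` transport follows the MCP Streamable HTTP convention and that the server accepts protocol revision `2025-03-26`; neither is confirmed by this README. The client name `example-client` is hypothetical.

```python
import json
import urllib.request

MCP_ENDPOINT = "http://127.0.0.1:3345/mcp"  # endpoint from the HTTP mode example


def build_initialize_request(endpoint=MCP_ENDPOINT):
    """Build, without sending, a JSON-RPC `initialize` request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: the server accepts this protocol revision.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE replies.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, the request can be sent with `urllib.request.urlopen(request)`. The reply may arrive either as a JSON body or as an event stream, so a real client should check the response's `Content-Type` before parsing.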
SSE Mode (Streaming)
selenium-mcp run --transport sse --host 127.0.0.1 --port 3345
Starts server at: http://127.0.0.1:3345/sse
Best for:
- streaming-based agents
- real-time interactions
Note: SSE endpoints are streaming and may not show output directly in the browser.
Expose Server on Network:
selenium-mcp run --transport http --host 0.0.0.0 --port 3345
Makes server accessible from:
- other devices on the same network
- Docker / VM environments
Notes:
- Default port: 3336
- Supported transports: stdio (default), http, sse
- Ensure the port is within the range 1–65535
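The transport and port rules above can be checked before launching the server. The following is a minimal Python sketch and not part of selenium-mcp itself: `build_run_command` is a hypothetical helper that assembles only the flags shown in this README (`--transport`, `--host`, `--port`) and rejects values outside the documented sets.

```python
SUPPORTED_TRANSPORTS = ("stdio", "http", "sse")  # as listed in the notes above


def build_run_command(transport="stdio", host="127.0.0.1", port=None):
    """Return the argv list for `selenium-mcp run` (hypothetical helper)."""
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    command = ["selenium-mcp", "run"]
    if transport == "stdio":
        # stdio is the default transport, so no extra flags are added.
        return command
    command += ["--transport", transport, "--host", host]
    if port is not None:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range 1-65535: {port}")
        command += ["--port", str(port)]
    return command


# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(build_run_command("http", port=3345)))
```

The resulting list can be passed to `subprocess.run` once selenium-mcp is installed. Leaving `port` as `None` omits the flag so the server falls back to its own default.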
MCP SERVER VERSION
To check the installed version of the Selenium MCP server, run the following command:
selenium-mcp version
AVAILABLE MCP TOOLS
Run the following command to list the tools supported by the MCP server:
selenium-mcp tools
BROWSER CONTROL
- open_browser – Launch a new browser session
- close_browser – Close the browser session
- maximize_browser – Maximize browser window
- fullscreen_browser – Switch browser to fullscreen
NAVIGATION
- open_url – Navigate to a specific URL
- navigate_back – Navigate back in browser history
- navigate_forward – Navigate forward in history
- refresh_page – Reload the page
- wait_for_page – Wait for page to load
- get_page_title – Get the current page title
TAB MANAGEMENT
- get_tabs – Retrieve all open tabs in the current session
- switch_tab – Switch to a specific tab using index
- open_new_tab – Open a new tab and optionally navigate to a URL
- close_tab – Close a specific tab by index
- get_current_tab – Retrieve the currently active tab
- name_tab – Assign a custom name to a tab for easier identification
These tools allow agents to manage multiple tabs within a single browser session.
ELEMENT DISCOVERY
- get_interactive_elements – Discover visible interactive elements on the page
- get_accessibility_tree – Retrieve simplified accessibility tree for the page
These tools allow agents to understand the UI structure before interacting with it.
Notes
- Element detection is optimized for modern web applications (React, Angular, dynamic UI frameworks).
- Elements are identified using interaction signals such as roles, click handlers, and focusability.
- Only visible and meaningful elements are returned to reduce noise.
INTERACTION TOOLS
- click_element – Click an element by index
- type_into_element – Enter text into an input field
Elements must first be discovered using: get_interactive_elements
PAGE ANALYSIS
get_page_text – Extract visible text from the page
Useful for:
- validation
- reasoning
- information extraction
VISUAL DEBUGGING
take_screenshot – Capture a screenshot of the current browser window
Screenshot Storage Location
When screenshots are captured, they are automatically saved in a hidden folder inside your home directory.
macOS / Linux
Screenshots are stored at:
~/.selenium-mcp/screenshot
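Example full path (macOS; on Linux the home directory is usually /home/<your-username>):
/Users/<your-username>/.selenium-mcp/screenshot
On macOS, you can open the folder from Terminal (on Linux, xdg-open is the usual equivalent of open):
open ~/.selenium-mcp/screenshot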
Windows
Screenshots are stored at:
C:\Users\<your-username>\.selenium-mcp\screenshot
Example:
C:\Users\John\.selenium-mcp\screenshot
You can open it from File Explorer by entering the following in the address bar:
%USERPROFILE%\.selenium-mcp\screenshot
Custom Screenshot Directory (Optional)
You can override the default screenshot location using the environment variable: SELENIUM_MCP_SCREENSHOT_DIR
macOS / Linux
export SELENIUM_MCP_SCREENSHOT_DIR=~/my-screenshots
Windows (PowerShell)
$env:SELENIUM_MCP_SCREENSHOT_DIR="C:\my-screenshots"
All screenshots will then be saved to the specified directory.
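A client script that needs to find saved screenshots can mirror the lookup described above. This is a minimal sketch of the documented behavior, not the server's actual code: `resolve_screenshot_dir` is a hypothetical helper, and it assumes the environment variable, when set, takes precedence over the default location.

```python
import os
from pathlib import Path

ENV_VAR = "SELENIUM_MCP_SCREENSHOT_DIR"  # name taken from this README


def resolve_screenshot_dir(environ=None):
    """Return the directory where screenshots are expected to be saved."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:
        # expanduser handles values such as "~/my-screenshots".
        return Path(override).expanduser()
    # Default documented above: a hidden folder in the home directory.
    return Path.home() / ".selenium-mcp" / "screenshot"


print(resolve_screenshot_dir())
```

Passing a plain dictionary as `environ` makes the helper easy to exercise without touching the real environment.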
Notes
- The folder is created automatically the first time a screenshot is taken.
- The .selenium-mcp directory is hidden by default because it starts with a dot (.).
- You can safely delete screenshots anytime.
BROWSER SESSION FLOW
Each browser session is identified by a session_id.
Typical workflow for agents:
- open_browser
- open_url
- wait_for_page
- get_interactive_elements
- (optional) get_tabs / switch_tab if multiple tabs are present
- click_element or type_into_element
MULTI-TAB WORKFLOW
Agents can work with multiple tabs within the same browser session.
Example workflow:
- open_browser
- open_url
- open_new_tab("https://example.com")
- get_tabs
- switch_tab(index)
- perform actions
- close_tab(index)
Notes
- Each tab is tracked using an internal index.
- The active tab is automatically managed and updated.
- All actions are performed on the currently active tab.
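To make the index and active-tab behavior concrete, here is a toy model in Python. It is not the server's implementation: the `TabTracker` class is hypothetical, and it assumes zero-based indexes, that a newly opened tab becomes active, and that indexes shift down when an earlier tab is closed. The real server may behave differently on each point.

```python
class TabTracker:
    """Toy model of index-based tab tracking (not the server's code)."""

    def __init__(self, first_url="about:blank"):
        self.tabs = [first_url]
        self.active = 0

    def open_new_tab(self, url="about:blank"):
        self.tabs.append(url)
        # Assumption: the newly opened tab becomes the active one.
        self.active = len(self.tabs) - 1
        return self.active

    def switch_tab(self, index):
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        self.active = index

    def close_tab(self, index):
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        if len(self.tabs) == 1:
            raise ValueError("cannot close the last remaining tab")
        del self.tabs[index]
        # Keep the active index pointing at a valid tab after removal.
        if self.active > index or self.active == len(self.tabs):
            self.active -= 1


tracker = TabTracker("https://google.com")
tracker.open_new_tab("https://example.com")
tracker.switch_tab(0)
tracker.close_tab(1)
print(tracker.tabs, tracker.active)
```

Because indexes can shift after a tab closes, an agent is safer calling get_tabs again before the next switch_tab than reusing an index it saw earlier.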
EXAMPLE AGENT WORKFLOW
Example task:
- Open the Chrome browser.
- Navigate to google.com.
- Type "Selenium MCP" into the search box.
- Click the search button.
Agent steps:
open_browser
open_url("https://google.com")
wait_for_page
get_interactive_elements
type_into_element(index, "Selenium MCP")
click_element(index)
wait_for_page
get_page_text
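On the wire, each agent step above becomes an MCP `tools/call` request. The sketch below builds those requests in order as plain JSON-RPC dictionaries. The tool names come from this README, but the argument names (`url`, `index`, `text`) are assumptions, and any `session_id` argument the tools may require is omitted; run `selenium-mcp tools` to see the real schemas.

```python
import json


def tool_call(request_id, name, arguments):
    """Build one MCP `tools/call` JSON-RPC request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_workflow(search_box_index, search_button_index):
    """Return the example workflow as an ordered list of requests."""
    # Argument names below are assumptions made for illustration.
    steps = [
        ("open_browser", {}),
        ("open_url", {"url": "https://google.com"}),
        ("wait_for_page", {}),
        ("get_interactive_elements", {}),
        ("type_into_element", {"index": search_box_index, "text": "Selenium MCP"}),
        ("click_element", {"index": search_button_index}),
        ("wait_for_page", {}),
        ("get_page_text", {}),
    ]
    return [
        tool_call(request_id, name, arguments)
        for request_id, (name, arguments) in enumerate(steps, start=1)
    ]


# Placeholder indexes; real values come from get_interactive_elements.
for request in build_workflow(search_box_index=0, search_button_index=1):
    print(json.dumps(request))
```

In practice the requests cannot all be prepared up front, because the element indexes are only known after get_interactive_elements returns. An agent sends the first four calls, reads the result, and then fills in the indexes for the remaining steps.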
SYSTEM PROMPT FOR AI AGENTS
This repository includes a production-grade system prompt designed specifically for browser automation agents that interact with this Selenium MCP server.
The prompt contains detailed operational guidelines that instruct the AI agent on how to:
- initialize and control the browser
- discover and interact with UI elements
- analyze page structure using the accessibility tree
- avoid hallucinating element indexes
- handle navigation and page reloads
- recover from stale elements
- follow a deterministic execution loop (PLAN → ACT → OBSERVE → UPDATE PLAN)
- enforce safety limits on tool usage
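The execution loop and the tool-usage limit in the list above can be pictured with a short Python sketch. It illustrates only the shape of such a loop and does not reproduce the contents of the system prompt: `run_agent_loop`, its callables, and the default cap of 20 calls are all hypothetical.

```python
def run_agent_loop(plan, act, observe, update_plan, max_tool_calls=20):
    """Run PLAN -> ACT -> OBSERVE -> UPDATE PLAN until the plan is empty."""
    calls = 0
    history = []
    while plan:
        if calls >= max_tool_calls:
            # Safety limit: stop instead of looping indefinitely.
            raise RuntimeError("tool-call limit reached")
        step = plan[0]                             # PLAN: pick the next step
        result = act(step)                         # ACT: one tool call
        calls += 1
        observation = observe(result)              # OBSERVE: interpret result
        history.append((step, observation))
        plan = update_plan(plan[1:], observation)  # UPDATE PLAN
    return history


# Stub callables standing in for a model and an MCP client.
history = run_agent_loop(
    plan=["open_browser", "open_url", "get_interactive_elements"],
    act=lambda step: f"{step}: ok",
    observe=lambda result: result.endswith("ok"),
    update_plan=lambda remaining, succeeded: remaining if succeeded else [],
)
print(history)
```

Keeping the cap inside the loop means a plan that keeps regenerating itself, for example after repeated stale-element errors, ends with an explicit error rather than running without bound.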
Prompt location
prompts/system_prompt.md
How to use
Whenever you build an AI agent that interacts with this MCP server, provide this prompt as the model's system prompt.
Why this prompt
Browser automation agents can easily make incorrect decisions if not guided properly. This system prompt provides strict operational rules and guardrails that help the agent:
- use MCP tools correctly
- avoid incorrect element interactions
- minimize hallucinations
- perform reliable b
