Selenium MCP Server

Selenium WebDriver MCP server for AI agents and LLM-powered browser automation

Model Context Protocol (MCP) server for Selenium WebDriver that enables AI agents and LLMs to control real browsers for automation

This project exposes Selenium WebDriver as an MCP (Model Context Protocol) server, allowing AI agents to control a real browser through structured tools.

It enables LLMs and autonomous agents to perform tasks like:

  • Opening browsers
  • Navigating websites
  • Discovering UI elements
  • Clicking buttons and links
  • Typing into inputs
  • Extracting page text
  • Taking screenshots
  • Additional capabilities (in progress)

This makes it possible to build AI-powered browser automation systems and autonomous QA agents.

WHY THIS PROJECT EXISTS

Modern AI agents need a way to interact with real applications.

While traditional automation tools like Selenium exist, they are not directly usable by LLM agents.

This project bridges that gap by exposing Selenium functionality through MCP tools so that agents can:

  • Understand web pages
  • Discover UI elements
  • Perform actions
  • Validate results

ARCHITECTURE

flowchart TD
    A[LLM Agent] --> B[MCP Protocol]
    B --> C[Selenium MCP Server]

    C --> D[Browser Tools]
    C --> E[Navigation Tools]
    C --> F[Interaction Tools]
    C --> G[Element Tools]
    C --> H[Debug Tools]

    D --> I[Selenium WebDriver]
    E --> I
    F --> I
    G --> I
    H --> I
    
    I --> J[Browser]

FEATURES

  • MCP-compatible Selenium automation server
  • Browser session management
  • Navigation controls
  • UI element discovery
  • Accessibility-aware interaction
  • Screenshot capture
  • Page text extraction
  • Headless browser support
  • Multi-tab browser management (open, switch, close, track active tab)
  • Improved interactive element detection for modern UI frameworks (React, Angular, dynamic DOM)

INSTALLATION

Run the following command:

pip install selenium-mcp

RUNNING THE SERVER

Start the MCP server

You can start the Selenium MCP server using different transport modes depending on your use case.

Default (STDIO)

selenium-mcp run

  • Uses stdio transport
  • Best for local agent integrations
  • No network exposure

HTTP Mode (Recommended)

selenium-mcp run --transport http --host 127.0.0.1 --port 3345

Starts server at: http://127.0.0.1:3345

MCP endpoint: http://127.0.0.1:3345/mcp

Best for:

  • API integrations
  • Postman / curl testing
  • production-style usage
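
For curl or Postman testing against the HTTP endpoint, MCP requests are JSON-RPC 2.0 messages. The sketch below only builds the body of a `tools/call` request and does not send it. The `url` argument name for open_url is an assumption (the output of `selenium-mcp tools` is the place to check the real schema), and a real client must also complete the MCP `initialize` handshake before calling tools.

```python
import json

# Endpoint taken from the HTTP mode example; not contacted by this sketch.
MCP_ENDPOINT = "http://127.0.0.1:3345/mcp"


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "url" is an assumed argument name, not confirmed by this README.
payload = build_tool_call(1, "open_url", {"url": "https://example.com"})
print(json.dumps(payload, indent=2))
```

The printed JSON can be pasted into Postman or passed to curl as the POST body for the MCP endpoint.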

SSE Mode (Streaming)

selenium-mcp run --transport sse --host 127.0.0.1 --port 3345

Starts server at: http://127.0.0.1:3345/sse

Best for:

  • streaming-based agents
  • real-time interactions

Note: SSE endpoints are streaming and may not show output directly in the browser.

Expose Server on Network:

selenium-mcp run --transport http --host 0.0.0.0 --port 3345

Makes server accessible from:

  • other devices on the same network
  • Docker / VM environments

Notes:

  • Default port: 3336
  • Supported transports: stdio (default), http, sse
  • Ensure port is within range: 1–65535
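
When launching the server from a script, it can help to validate the transport and port before spawning the process. The following is a minimal sketch using only the flags documented above. The helper name `build_run_command` is hypothetical, and the sketch assumes that `--host` and `--port` only matter for the http and sse transports.

```python
import shlex

SUPPORTED_TRANSPORTS = ("stdio", "http", "sse")


def build_run_command(transport="stdio", host="127.0.0.1", port=3336):
    """Return the argv list for `selenium-mcp run` after basic validation."""
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    cmd = ["selenium-mcp", "run"]
    if transport != "stdio":
        # Assumption: host/port are only relevant for network transports.
        cmd += ["--transport", transport, "--host", host, "--port", str(port)]
    return cmd


print(shlex.join(build_run_command("http", "127.0.0.1", 3345)))
```

The returned list can be passed directly to `subprocess.Popen` or `subprocess.run`.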

MCP SERVER VERSION

To check the current version of the Selenium MCP server, run the following command:

selenium-mcp version

AVAILABLE MCP TOOLS

Run the following command to list the tools supported by the MCP server:

selenium-mcp tools

BROWSER CONTROL

  1. open_browser – Launch a new browser session
  2. close_browser – Close the browser session
  3. maximize_browser – Maximize browser window
  4. fullscreen_browser – Switch browser to fullscreen

NAVIGATION

  1. open_url – Navigate to a specific URL
  2. navigate_back – Navigate back in browser history
  3. navigate_forward – Navigate forward in history
  4. refresh_page – Reload the page
  5. wait_for_page – Wait for page to load
  6. get_page_title – Get the current page title

TAB MANAGEMENT

  1. get_tabs – Retrieve all open tabs in the current session
  2. switch_tab – Switch to a specific tab using index
  3. open_new_tab – Open a new tab and optionally navigate to a URL
  4. close_tab – Close a specific tab by index
  5. get_current_tab – Retrieve the currently active tab
  6. name_tab – Assign a custom name to a tab for easier identification

These tools allow agents to manage multiple tabs within a single browser session.

ELEMENT DISCOVERY

  1. get_interactive_elements – Discover visible interactive elements on the page
  2. get_accessibility_tree – Retrieve a simplified accessibility tree for the page

These tools allow agents to understand the UI structure before interacting with it.

Notes

  • Element detection is optimized for modern web applications (React, Angular, dynamic UI frameworks).
  • Elements are identified using interaction signals such as roles, click handlers, and focusability.
  • Only visible and meaningful elements are returned to reduce noise.
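
To illustrate the kind of filtering the notes above describe, here is a rough sketch that keeps only visible elements showing at least one interaction signal and then assigns indexes. This is not the server's implementation: the dictionary keys (`visible`, `role`, `has_click_handler`, `focusable`), the role list, and zero-based indexing are all assumptions made for the example.

```python
# Assumed set of roles that count as interactive; the real list is not documented here.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def is_interactive(el):
    """Return True if a visible element shows at least one interaction signal."""
    if not el.get("visible", False):
        return False
    return bool(
        el.get("role") in INTERACTIVE_ROLES
        or el.get("has_click_handler", False)
        or el.get("focusable", False)
    )


def index_elements(elements):
    """Keep interactive elements and assign the indexes an agent would act on."""
    kept = [el for el in elements if is_interactive(el)]
    return [{"index": i, **el} for i, el in enumerate(kept)]


sample = [
    {"tag": "div", "visible": True},                             # no signal: dropped
    {"tag": "button", "role": "button", "visible": True},        # role signal: kept
    {"tag": "div", "has_click_handler": True, "visible": True},  # handler signal: kept
    {"tag": "a", "role": "link", "visible": False},              # hidden: dropped
]
result = index_elements(sample)
print(result)
```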

INTERACTION TOOLS

  1. click_element – Click an element by index
  2. type_into_element – Enter text into an input field

Elements must first be discovered using: get_interactive_elements

PAGE ANALYSIS

get_page_text – Extract visible text from the page

Useful for:

  • validation
  • reasoning
  • information extraction

VISUAL DEBUGGING

take_screenshot – Capture a screenshot of the current browser window

Screenshot Storage Location

When screenshots are captured, they are automatically saved in a hidden folder inside your home directory.

macOS / Linux

Screenshots are stored at:

~/.selenium-mcp/screenshot

Example full path:

/Users/<your-username>/.selenium-mcp/screenshot

You can open the folder from Terminal on macOS (on Linux, use xdg-open instead of open):

open ~/.selenium-mcp/screenshot

Windows

Screenshots are stored at:

C:\Users\<your-username>\.selenium-mcp\screenshot

Example:

C:\Users\John\.selenium-mcp\screenshot

You can open it from File Explorer by entering the following in the address bar:

%USERPROFILE%\.selenium-mcp\screenshot

Custom Screenshot Directory (Optional)

You can override the default screenshot location using the environment variable: SELENIUM_MCP_SCREENSHOT_DIR

macOS / Linux

export SELENIUM_MCP_SCREENSHOT_DIR=~/my-screenshots

Windows (PowerShell)

$env:SELENIUM_MCP_SCREENSHOT_DIR="C:\my-screenshots"

All screenshots will then be saved to the specified directory.

Notes

  • The folder is created automatically the first time a screenshot is taken.
  • The .selenium-mcp directory is hidden by default because it starts with a dot (.).
  • You can safely delete screenshots anytime.
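
A script that needs to locate saved screenshots can mirror the lookup order described above: the environment variable first, then the default folder under the home directory. This is a sketch of that documented behavior rather than the server's own code, and the function names are hypothetical.

```python
import os
from pathlib import Path

ENV_VAR = "SELENIUM_MCP_SCREENSHOT_DIR"


def resolve_screenshot_dir(environ=None):
    """Return the override directory if set, else ~/.selenium-mcp/screenshot."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path.home() / ".selenium-mcp" / "screenshot"


def ensure_screenshot_dir(environ=None):
    """Create the directory on first use, as the notes above describe."""
    path = resolve_screenshot_dir(environ)
    path.mkdir(parents=True, exist_ok=True)
    return path


print(resolve_screenshot_dir())
```

Passing a plain dictionary as `environ` makes the lookup easy to test without touching the real environment.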

BROWSER SESSION FLOW

Each browser session is identified by a session_id.

Typical workflow for agents:

  1. open_browser
  2. open_url
  3. wait_for_page
  4. get_interactive_elements
  5. (optional) get_tabs / switch_tab if multiple tabs are present
  6. click_element or type_into_element

MULTI-TAB WORKFLOW

Agents can work with multiple tabs within the same browser session.

Example workflow:

  1. open_browser
  2. open_url
  3. open_new_tab("https://example.com")
  4. get_tabs
  5. switch_tab(index)
  6. perform actions
  7. close_tab(index)

Notes

  • Each tab is tracked using an internal index.
  • The active tab is automatically managed and updated.
  • All actions are performed on the currently active tab.
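
The index-based tracking described in these notes can be pictured with a small in-memory model. The `TabTracker` class below is hypothetical and purely illustrative. In particular, how the real server renumbers tabs and picks the active tab after a close is not documented here, so the choices in `close_tab` are assumptions.

```python
class TabTracker:
    """Toy model of index-based tab tracking with a single active tab."""

    def __init__(self, first_url="about:blank"):
        self.tabs = [{"url": first_url, "name": None}]
        self.active = 0

    def open_new_tab(self, url="about:blank"):
        # Assumption: a newly opened tab becomes the active tab.
        self.tabs.append({"url": url, "name": None})
        self.active = len(self.tabs) - 1
        return self.active

    def switch_tab(self, index):
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        self.active = index

    def close_tab(self, index):
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        if len(self.tabs) == 1:
            raise ValueError("refusing to close the last tab in this sketch")
        del self.tabs[index]
        # Assumption: later tabs shift down by one and the active index follows.
        if self.active > index:
            self.active -= 1
        self.active = min(self.active, len(self.tabs) - 1)


tracker = TabTracker("https://google.com")
tracker.open_new_tab("https://example.com")
tracker.switch_tab(0)
tracker.close_tab(1)
print(tracker.active, len(tracker.tabs))
```

Because indexes can shift after a close, calling get_tabs again before the next switch_tab is the safer pattern for an agent.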

EXAMPLE AGENT WORKFLOW

Example task:

  1. Open Chrome browser.
  2. Navigate to Google.com.
  3. Type the text "Selenium MCP" in the search box.
  4. Click the search button.

Agent steps:

open_browser
open_url("https://google.com")
wait_for_page
get_interactive_elements
type_into_element(index, "Selenium MCP")
click_element(index)
wait_for_page
get_page_text
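
The same steps can be sketched in Python with the element indexes looked up from the discovery result instead of being guessed. `FakeMcpClient` is a stand-in that records calls and returns canned data. The argument names (`session_id`, `url`, `index`, `text`) and the shape of the element list are assumptions for illustration, not the server's documented schema.

```python
class FakeMcpClient:
    """Stand-in for a real MCP client; records calls and returns canned data."""

    def __init__(self):
        self.calls = []

    def call_tool(self, name, **arguments):
        self.calls.append((name, arguments))
        if name == "open_browser":
            return {"session_id": "demo-session"}
        if name == "get_interactive_elements":
            return [
                {"index": 0, "role": "textbox", "label": "Search"},
                {"index": 1, "role": "button", "label": "Google Search"},
            ]
        return {}


def find_index(elements, role, label):
    """Pick an index from discovered elements; never invent one."""
    for el in elements:
        if el["role"] == role and label.lower() in el["label"].lower():
            return el["index"]
    raise LookupError(f"no {role} matching {label!r}")


def run_search(client, query):
    sid = client.call_tool("open_browser")["session_id"]
    client.call_tool("open_url", session_id=sid, url="https://google.com")
    client.call_tool("wait_for_page", session_id=sid)
    elements = client.call_tool("get_interactive_elements", session_id=sid)
    box = find_index(elements, "textbox", "search")
    button = find_index(elements, "button", "search")
    client.call_tool("type_into_element", session_id=sid, index=box, text=query)
    client.call_tool("click_element", session_id=sid, index=button)
    client.call_tool("wait_for_page", session_id=sid)
    return client.call_tool("get_page_text", session_id=sid)


client = FakeMcpClient()
run_search(client, "Selenium MCP")
print([name for name, _ in client.calls])
```

On a live page, typing can change the DOM, so a real agent may need to call get_interactive_elements again before clicking.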

SYSTEM PROMPT FOR AI AGENTS

This repository includes a production-grade system prompt designed specifically for browser automation agents that interact with this Selenium MCP server.

The prompt contains detailed operational guidelines that instruct the AI agent on how to:

  • initialize and control the browser
  • discover and interact with UI elements
  • analyze page structure using the accessibility tree
  • avoid hallucinating element indexes
  • handle navigation and page reloads
  • recover from stale elements
  • follow a deterministic execution loop (PLAN → ACT → OBSERVE → UPDATE PLAN)
  • enforce safety limits on tool usage
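
The execution loop and safety limit named above can be pictured as a small driver function. This sketch is not taken from prompts/system_prompt.md. The function `run_agent`, its callback signatures, and the default budget of 20 tool calls are assumptions chosen for illustration.

```python
def run_agent(plan, act, observe, update_plan, max_tool_calls=20):
    """Drive PLAN -> ACT -> OBSERVE -> UPDATE PLAN under a tool-call budget."""
    steps = list(plan)                           # PLAN
    history = []
    while steps:
        if len(history) >= max_tool_calls:
            raise RuntimeError("tool-call budget exhausted")
        step = steps.pop(0)
        result = act(step)                       # ACT
        observation = observe(step, result)      # OBSERVE
        history.append((step, observation))
        steps = update_plan(steps, observation)  # UPDATE PLAN
    return history


# Demo: the first click hits a stale element, so the plan is extended
# with a fresh discovery step followed by a retry of the click.
attempts = {"click_element": 0}


def act(step):
    if step == "click_element":
        attempts["click_element"] += 1
        if attempts["click_element"] == 1:
            return "stale"
    return "ok"


def observe(step, result):
    return {"step": step, "status": result}


def update_plan(remaining, observation):
    if observation["status"] == "stale":
        return ["get_interactive_elements", observation["step"]] + remaining
    return remaining


history = run_agent(["open_browser", "open_url", "click_element"], act, observe, update_plan)
print([step for step, _ in history])
```

Keeping the budget check inside the loop means a plan that keeps growing (for example, endless stale-element retries) still terminates.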

Prompt location

prompts/system_prompt.md

How to use

Whenever you build an AI agent that interacts with this MCP server, provide this prompt as the model's system prompt.

Why this prompt

Browser automation agents can easily make incorrect decisions if not guided properly. This system prompt provides strict operational rules and guardrails that help the agent:

  • use MCP tools correctly
  • avoid incorrect element interactions
  • minimize hallucinations
  • perform reliable b