BrowserGym banner

🛠️ Setup - 🏋 Usage - 💻 Demo - 🌐 Ecosystem - 🚀 AgentLab - 🌟 Contributors - 📄 Paper - 📝 Citation

pip install browsergym


[!WARNING] BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research. It is not meant to be a consumer product. Use with caution!

[!TIP] 🚀 Check out AgentLab✨ ! A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85

Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row).

BrowserGym includes the following benchmarks by default:

Designing new web benchmarks with BrowserGym is easy: you only need to inherit from the AbstractBrowserTask class.
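As a rough illustration of what such a subclass might look like, the sketch below uses a stand-in base class so that it runs without BrowserGym or a browser. The method names (`setup`, `validate`, `teardown`), their signatures and return values, and the `FakePage` helper are all assumptions made for illustration; check the actual AbstractBrowserTask definition in browsergym-core before relying on any of them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StandInBrowserTask(ABC):
    """Stand-in for AbstractBrowserTask (assumed interface, not the real class)."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    @abstractmethod
    def setup(self, page) -> Tuple[str, Dict]:
        """Prepare the page and return (goal, info)."""

    @abstractmethod
    def validate(self, page, chat_messages: List) -> Tuple[float, bool, str, Dict]:
        """Return (reward, done, message_to_user, info)."""

    def teardown(self) -> None:
        """Release any resources held by the task."""


class FakePage:
    """Hypothetical page object; a real task would receive a browser page."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url


class ReachUrlTask(StandInBrowserTask):
    """Toy task: succeed once the page is on the target URL."""

    def __init__(self, seed: int, target_url: str) -> None:
        super().__init__(seed)
        self.target_url = target_url

    def setup(self, page):
        goal = f"Navigate to {self.target_url}"
        return goal, {}

    def validate(self, page, chat_messages):
        done = page.url == self.target_url
        reward = 1.0 if done else 0.0
        return reward, done, "", {}
```

The point of the sketch is the division of labour: `setup` states the goal, and `validate` inspects the page after each step to decide the reward and whether the episode is over.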

🛠️ Setup

To use browsergym, install one of the following packages:

pip install browsergym  # (recommended) everything below
pip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core  # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob  # core + miniwob
pip install browsergym-webarena  # core + webarena
pip install browsergym-webarena-verified  # core + webarena_verified
pip install browsergym-visualwebarena  # core + visualwebarena
pip install browsergym-workarena  # core + workarena
pip install browsergym-assistantbench  # core + assistantbench
pip install weblinx-browsergym  # core + weblinx
pip install browsergym-timewarp  # core + timewarp

Then set up Playwright by running:

playwright install chromium

Finally, each benchmark has its own setup, which requires additional steps:

for MiniWoB++, see miniwob/README.md
for WebArena, see webarena/README.md
for WebArenaVerified, see webarena_verified/README.md
for VisualWebArena, see visualwebarena/README.md
for WorkArena, see WorkArena
for AssistantBench, see assistantbench/README.md
for OpenApps, see OpenApps docs
for TimeWarp, see timewarp/README.md
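To see which of the packages listed in the install commands are present in the current environment, a small check along these lines may help. It relies only on the standard library; the helper name `installed_versions` is made up for this sketch, and the distribution names are copied from the install commands.

```python
from importlib import metadata

# Distribution names copied from the pip install commands in this section.
BROWSERGYM_PACKAGES = [
    "browsergym",
    "browsergym-experiments",
    "browsergym-core",
    "browsergym-miniwob",
    "browsergym-webarena",
    "browsergym-webarena-verified",
    "browsergym-visualwebarena",
    "browsergym-workarena",
    "browsergym-assistantbench",
    "weblinx-browsergym",
    "browsergym-timewarp",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(BROWSERGYM_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports which packages pip has installed; it does not verify the benchmark-specific setup described in the READMEs listed above.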

🏗️ Development setup

To install browsergym locally for development, use the following commands:

git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install

Contributions are welcome! 😊

🏋 Usage

Boilerplate code to run an agent on an interactive, open-ended task:

import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
# release the environment
env.close()
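The loop above leaves the agent unspecified. To make the control flow concrete, here is a minimal sketch that runs the same reset/step/close cycle against a stand-in environment, so it works without a browser. `StubEnv`, `simple_agent`, and `run_episode` are hypothetical names, and the string actions `"noop"` and `"stop"` are placeholders rather than real BrowserGym actions.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the reset/step/close calls used above."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0
        self.closed = False

    def reset(self):
        self.steps = 0
        return {"goal": "say hello"}, {}

    def step(self, action: str):
        self.steps += 1
        terminated = action == "stop"  # the task ends when the agent stops
        truncated = self.steps >= self.max_steps and not terminated  # step budget hit
        reward = 1.0 if terminated else 0.0
        obs = {"goal": "say hello", "last_action": action}
        return obs, reward, terminated, truncated, {}

    def close(self) -> None:
        self.closed = True


def simple_agent(obs: dict) -> str:
    """Toy policy: act once, then stop. A real agent would inspect obs."""
    return "stop" if "last_action" in obs else "noop"


def run_episode(env, agent):
    """Run one episode and return (total_reward, number_of_steps)."""
    total_reward, steps = 0.0, 0
    obs, info = env.reset()
    try:
        while True:
            action = agent(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            steps += 1
            if terminated or truncated:
                break
    finally:
        env.close()  # release the environment even if the agent raises
    return total_reward, steps
```

Wrapping the loop in `try`/`finally` is a design choice of this sketch: it ensures `env.close()` runs even when the agent code fails partway through an episode.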

MiniWoB

import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))

WorkArena

import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))

WebArena

import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))

VisualWebArena

import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))

AssistantBench

import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))

OpenApps

from open_apps.apps.start_page.main import app  # need to import apps to serve
from open_apps.launcher import OpenAppsLauncher

config = ...  # configure a namespace with task, agent, environment, and server configs

launcher = OpenAppsLauncher(config)
launcher.launch()

TimeWarp

import gymnasium as gym
import browsergym.timewarp  # register timewarp tasks as gym environments

# start a timewarp task
env = gym.make("browsergym/timewarp.1")
...

# list all the available timewarp tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/timewarp")]
print("\n".join(env_ids))
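Each snippet above filters the registry for a single benchmark prefix. When several benchmark packages are imported at once, grouping the ids by benchmark can give a quicker overview. The sketch below assumes that environment ids follow the `browsergym/<benchmark>.<task>` pattern seen in the examples; `group_by_benchmark` is a hypothetical helper, and the sample ids are taken from this section.

```python
from collections import defaultdict


def group_by_benchmark(env_ids, prefix="browsergym/"):
    """Group environment ids by the benchmark name that follows the prefix."""
    groups = defaultdict(list)
    for env_id in env_ids:
        if not env_id.startswith(prefix):
            continue  # skip environments registered by other packages
        name = env_id[len(prefix):]
        benchmark = name.split(".", 1)[0]  # text before the first dot
        groups[benchmark].append(env_id)
    return {benchmark: sorted(ids) for benchmark, ids in sorted(groups.items())}


# Sample ids from this section, plus one unrelated id to show the filtering.
example_ids = [
    "browsergym/openended",
    "browsergym/miniwob.choose-list",
    "browsergym/workarena.servicenow.order-ipad-pro",
    "browsergym/webarena.310",
    "browsergym/visualwebarena.721",
    "browsergym/assistantbench.validation.3",
    "browsergym/timewarp.1",
    "CartPole-v1",
]

for benchmark, ids in group_by_benchmark(example_ids).items():
    print(f"{benchmark}: {len(ids)} task(s)")
```

With the benchmark packages imported, the same helper can be called on `gym.envs.registry.keys()` in place of `example_ids`.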

💻 Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:

# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium

Our demo agent uses OpenAI as a backend, so be sure to set your OPENAI_API_KEY environment variable.

Launch the demo agent as follows:

# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398

You can customize your experience by changing the model_name to your preferred LLM (it uses gpt-4o-mini by default), adding screenshots for your VLMs with use_screenshot, and much more!

python demo_agent/run_demo.py --help
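To launch the demo from a Python script rather than the shell, the commands above can be assembled programmatically. This is a minimal sketch: `build_demo_command` and `missing_api_key` are hypothetical helpers, only the `--task_name` and `--start_url` flags shown above are used, and the script path assumes the current directory is the repository root.

```python
import os
import sys


def build_demo_command(task_name, start_url=None):
    """Build the argument list for demo_agent/run_demo.py (run from the repo root)."""
    command = [sys.executable, "demo_agent/run_demo.py", "--task_name", task_name]
    if start_url is not None:
        command += ["--start_url", start_url]
    return command


def missing_api_key(environ=None):
    """Return True when OPENAI_API_KEY is unset or empty."""
    environ = os.environ if environ is None else environ
    return not environ.get("OPENAI_API_KEY")


if __name__ == "__main__":
    if missing_api_key():
        print("OPENAI_API_KEY is not set; the demo agent will not be able to call its backend.")
    # Print the command instead of running it; nothing is launched here.
    print(" ".join(build_demo_command("openended", "https://www.google.com")))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the key is set.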

🌐 Ecosystem

AgentLab: Seamlessly run agents on benchmarks, collect and analyse traces.
WorkArena(++): A benchmark for web agents on the ServiceNow platform.
WebArena: A benchmark of realistic web tasks on self-hosted domains.
VisualWebArena: A benchmark of realistic visual web tasks on self-hosted domains.
MiniWoB(++): A collection of over 100 web tasks on synthetic web pages.
WebLINX: A dataset of real-world web interaction traces.
AssistantBench: A benchmark of realistic and time-consuming tasks on the open web.
DoomArena: A framework for AI agent security testing that supports injecting attacks into web pages from BrowserGym environments.

🌟 Contributors
