<h2 align="center"> <a href="https://appworld.dev/"><img src="/images/banner-light.svg" alt="AppWorld Banner" width="100%"></a> </h2> <div> <h3 align="center">A Controllable World of Apps and People for <br> Benchmarking Function Calling &amp; Interactive Coding Agents <br><br>🏆 ACL'24 Best Resource Paper 🏆</h3> <p align="center"> :link: <a href="https://appworld.dev/">Website</a>    :earth_americas: <a href="https://appworld.dev/task-explorer">Task Explorer</a>    :hammer_and_wrench: <a href="https://appworld.dev/api-explorer">API Explorer</a>    :bar_chart: <a href="https://appworld.dev/leaderboard">Leaderboard</a> <br> :movie_camera: <a href="https://appworld.dev/video">Videos</a>    :bird: <a href="https://x.com/harsh3vedi/status/1818311843976233198">Tweet</a>    :speech_balloon: <a href="https://towardsdatascience.com/appworld-a-controllable-world-of-apps-and-people-for-benchmarking-interactive-coding-agents-37517dd9d498">Blog</a>    :page_facing_up: <a href="https://arxiv.org/abs/2407.18901">Paper</a>    </p> </div> <h2 align="center"> <a href="https://github.com/StonyBrookNLP/appworld/actions/workflows/ci.yml"> <img src="https://github.com/StonyBrookNLP/appworld/actions/workflows/ci.yml/badge.svg" alt="AppWorld CI"> </a> <a href="https://www.python.org/"> <img src="https://img.shields.io/badge/Python-3.11+-1f425f.svg?color=purple" alt="Python version"> </a> <a href="https://github.com/charliermarsh/ruff"> <img src="https://img.shields.io/badge/linter-ruff-green" alt="Ruff"> </a> <a href="https://mypy-lang.org"> <img src="https://img.shields.io/badge/type%20checked-mypy-039dfc" alt="MyPy"> </a> <a href="https://opensource.org/licenses/Apache-2.0"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="Apache 2.0 License"> </a> <a href="https://pypi.org/project/appworld/"> <img src="https://badge.fury.io/py/appworld.svg" alt="PyPI Version"> </a> <a href="https://pepy.tech/projects/appworld"> <img 
src="https://img.shields.io/pepy/dt/appworld" alt="PyPI Downloads"> </a> <a href="http://github.com/stonybrooknlp/appworld"> <img src="https://img.shields.io/badge/PRs-welcome-black.svg" alt="PRs welcome"> </a> <a href="https://github.com/stonybrooknlp/appworld#electric_plug-introducing-appworld-mcp-server-and-client"> <img src="https://badge.mcpx.dev?type=server" title="MCP Server"/> </a> <a href="https://github.com/stonybrooknlp/appworld#electric_plug-introducing-appworld-mcp-server-and-client"> <img src="https://badge.mcpx.dev?type=client" title="MCP Client"/> </a> </h2>

- :information_source: About
- :racing_car: TLDR
- :floppy_disk: Installation
- :rotating_light: Release Disclaimer
- :earth_africa: AppWorld Walkthrough
  - :wave: Primer
  - :globe_with_meridians: Task Worlds
  - :repeat: Interactive Coding and API Calls
  - :runner: A Minimal Agent in Action
  - :bar_chart: Evaluating the Agent
  - :no_entry_sign: Agent Development Restrictions
  - :lock: Code Execution Safety
  - :satellite: Serving AppWorld Environment/APIs with/out Docker
- :electric_plug: Introducing AppWorld MCP Server and Client (⭐ NEW ⭐)
  - :link: Starting MCP Server
  - :robot: Connecting MCP Client
    - :desktop_computer: Connecting via an MCP GUI Client
    - :gear: Connecting via an MCP-supported Agent Framework
    - :artificial_satellite: Connecting via AppWorld's In-built MCP Client
    - :feather: Connecting via our Standalone MCP Client
- :test_tube: Agent Experiments
  - :floppy_disk: Installation
  - :mag_right: Experiment Options (⭐ NEW ⭐)
  - :robot: Running and Evaluating Agents
  - :inbox_tray: Downloading our Experiment Outputs
  - :mechanic: Adding your Base LLM (⭐ NEW ⭐)
  - :control_knobs: Customizing Agents (⭐ NEW ⭐)
- :books: Guides (⭐ NEW ⭐)
  - :magic_wand: Generating Base Database and Tasks
  - :bulb: Developing New Apps
  - :cyclone: Developing New Task Generators
  - :computer: Evaluating Terminal Agents (Codex, Gemini, etc.) via AppWorld MCP
  - :zap: Parallelizing Worlds (RL rollouts, or faster runs)
- :trophy: Leaderboard
- :lock_with_ink_pen: License
- :octopus: Contribution
- :page_facing_up: Citation

:information_source: About

This work introduces the AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps that is operable via 457 APIs and populated with the digital activities of ~100 people living in a simulated world. It comes with an associated benchmark of natural, diverse, and challenging autonomous agent tasks that require rich, interactive coding.

:racing_car: TLDR

In a rush? The snippets below cover most of what you need to use AppWorld. See this notebook for a minimal working ReAct agent, and the other sections for details.

# Install and download:
pip install appworld
appworld install
appworld download data

# Solve tasks via your own agent code:
from appworld import AppWorld, load_task_ids

for task_id in load_task_ids("test_challenge"): # Or train, dev, test_normal
    with AppWorld(task_id=task_id, experiment_name="sample") as world:
        print(world.task.instruction)  # To see the task instruction.
        world.execute("""
        # ...
        response = apis.spotify.login(...)
        print(response)
        """)
        # => {"access_token": ...}
        # ...
        # can reuse past variables, or use previously printed information.
        world.execute("""
        library = apis.spotify.show_playlist_library(
            access_token=response["access_token"]
        )""")
        # ...
        # indicate task completion:
        world.execute("apis.supervisor.complete_task()")
        # world.evaluate().report()  # optional

# Or solve them via our tool-use, MCP, or coding agent implementations:
# Install appworld-agents and run any of 5+ agents with 100+ models.
pip install -e 'experiments[simplified]'  # from repo or 'pip install appworld-agents' once published.
appworld run auto --agent-name {AGENT_NAME} --model-name {MODEL_NAME} --dataset-name test_challenge
# Replace any/all {...} with "options" to see choices.

# Evaluate:
appworld evaluate sample test_challenge
#    experiment name ^     ^ dataset name

# Explore AppWorld CLI for many more possibilities via --help.
# E.g., `appworld explore` to explore the dataset and
# `appworld play` for an interactive playground. More 👇

appworld --help
#  Usage: appworld [OPTIONS] COMMAND [ARGS]...
#  AppWorld Command Line Interface 🚀
# ╭─ Commands ───────────────────────────────────────────────────────────────────────────────╮
# │ install    [   Setup   ] Unpack encrypted portion of the AppWorld code.                  │
# │ download   [   Setup   ] Download AppWorld data or baseline agents' experiment outputs.  │
# │ verify     [   Setup   ] Verify AppWorld installation.                                   │
# │ explore    [Development] Explore task instructions in the AppWorld dataset.              │
# │ serve      [Development] Serve AppWorld Environment, APIs or MCP with or without Docker. │
# │ play       [Development] Start an interactive coding playground to explore task worlds.  │
# │ run        [Development] Run AppWorld agent from the appworld-agents library.            │
# │ evaluate   [Development] Run experiment evaluation.                                      │
# │ pack       [Leaderboard] Pack your experiment output for the leaderboard submission.     │
# │ unpack     [Leaderboard] Unpack an experiment output from the leaderboard submission.    │
# │ make       [Leaderboard] Make a leaderboard entry from your agent's experiment outputs.  │
# ╰──────────────────────────────────────────────────────────────────────────────────────────╯
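
The interaction pattern in the snippets above (propose code, run it with `world.execute`, read the output, repeat until `apis.supervisor.complete_task()` is called) can be sketched as a small loop. This is only an illustrative sketch, not the agent shipped with AppWorld: `FakeWorld`, `run_episode`, and `scripted_policy` are hypothetical names introduced here, `FakeWorld` stands in for a real world so the example runs without the package or data, and detecting completion by looking for the `complete_task()` call in the submitted code is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# One (code, output) pair per interaction step.
History = list[tuple[str, str]]


@dataclass
class FakeWorld:
    """Hypothetical stand-in for a task world; mimics only execute(code) -> str."""

    executed: list[str] = field(default_factory=list)

    def execute(self, code: str) -> str:
        # A real world would run the code against the apps; here we only record it.
        self.executed.append(code)
        return f"ran {len(code)} characters of code"


def run_episode(
    world,
    instruction: str,
    propose_code: Callable[[str, History], str],
    max_steps: int = 10,
) -> History:
    """Alternate between proposing code and executing it, up to max_steps times."""
    history: History = []
    for _ in range(max_steps):
        code = propose_code(instruction, history)
        output = world.execute(code)
        history.append((code, output))
        # Assumption: the episode ends once the agent submits the completion call.
        if "apis.supervisor.complete_task()" in code:
            break
    return history


def scripted_policy(instruction: str, history: History) -> str:
    """Toy policy: one exploratory step, then declare the task complete."""
    if not history:
        return "print('exploring')"
    return "apis.supervisor.complete_task()"


demo_history = run_episode(FakeWorld(), "demo instruction", scripted_policy)
print(len(demo_history))
```

With the real package, the same loop would presumably receive the `world` object from `with AppWorld(task_id=task_id, experiment_name="sample") as world:` and `world.task.instruction` as the instruction, with `propose_code` backed by an LLM call.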

:floppy_disk: Installation

Install the appworld package in your Python 3.11+ environment.

<details><summary> ::Click:: Example environment setup with <code>conda</code> </summary> <hr/>

conda create -n appworld python=3.11.0 -y && conda activate appworld

pip install appworld  # installs appworld into your site-packages directory
appworld install  # unpacks encrypted code in your site-packages directory
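
Before running the commands above, it can help to confirm that the active interpreter meets the Python 3.11+ requirement and, afterwards, that the package is importable. The following is a minimal sketch using only the standard library; `find_setup_problems` is a hypothetical helper written for this illustration, not part of AppWorld, and it does not replace the `appworld verify` command listed in the CLI help.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the installation instructions.
MIN_PYTHON = (3, 11)


def find_setup_problems(
    version_info: tuple[int, int] = sys.version_info[:2],
    package: str = "appworld",
) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if tuple(version_info) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info)
        problems.append(f"Python 3.11+ is required, found {found}.")
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        problems.append(f"Package '{package}' is not installed in this environment.")
    return problems


for problem in find_setup_problems():
    print(problem)
```

An empty result only shows that the interpreter version is acceptable and the package can be found; it says nothing about whether `appworld install` has already unpacked the encrypted code.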

<details><summary> ::Click:: Install from source</summary>
