Appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents, ACL'24 Best Resource Paper.
<h2 align="center">
<a href="https://appworld.dev/"><img src="/images/banner-light.svg" alt="AppWorld Banner" width="100%"></a>
</h2>
<div>
<h3 align="center">A Controllable World of Apps and People for <br> Benchmarking Function Calling & Interactive Coding Agents <br><br>🏆 ACL'24 Best Resource Paper 🏆</h3>
<p align="center">
:link: <a href="https://appworld.dev/">Website</a>
:earth_americas: <a href="https://appworld.dev/task-explorer">Task Explorer</a>
:hammer_and_wrench: <a href="https://appworld.dev/api-explorer">API Explorer</a>
:bar_chart: <a href="https://appworld.dev/leaderboard">Leaderboard</a>
<br>
:movie_camera: <a href="https://appworld.dev/video">Videos</a>
:bird: <a href="https://x.com/harsh3vedi/status/1818311843976233198">Tweet</a>
:speech_balloon: <a href="https://towardsdatascience.com/appworld-a-controllable-world-of-apps-and-people-for-benchmarking-interactive-coding-agents-37517dd9d498">Blog</a>
:page_facing_up: <a href="https://arxiv.org/abs/2407.18901">Paper</a>
</p>
</div>
<h2 align="center">
<a href="https://github.com/StonyBrookNLP/appworld/actions/workflows/ci.yml">
<img src="https://github.com/StonyBrookNLP/appworld/actions/workflows/ci.yml/badge.svg" alt="AppWorld CI">
</a>
<a href="https://www.python.org/">
<img src="https://img.shields.io/badge/Python-3.11+-1f425f.svg?color=purple" alt="Python version">
</a>
<a href="https://github.com/charliermarsh/ruff">
<img src="https://img.shields.io/badge/linter-ruff-green" alt="Ruff">
</a>
<a href="https://mypy-lang.org">
<img src="https://img.shields.io/badge/type%20checked-mypy-039dfc" alt="MyPy">
</a>
<a href="https://opensource.org/licenses/Apache-2.0">
<img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="Apache 2.0 License">
</a>
<a href="https://pypi.org/project/appworld/">
<img src="https://badge.fury.io/py/appworld.svg" alt="PyPI Version">
</a>
<a href="https://pepy.tech/projects/appworld">
<img src="https://img.shields.io/pepy/dt/appworld" alt="PyPI Downloads">
</a>
<a href="http://github.com/stonybrooknlp/appworld">
<img src="https://img.shields.io/badge/PRs-welcome-black.svg" alt="PRs welcome">
</a>
<a href="https://github.com/stonybrooknlp/appworld#electric_plug-introducing-appworld-mcp-server-and-client">
<img src="https://badge.mcpx.dev?type=server" title="MCP Server"/>
</a>
<a href="https://github.com/stonybrooknlp/appworld#electric_plug-introducing-appworld-mcp-server-and-client">
<img src="https://badge.mcpx.dev?type=client" title="MCP Client"/>
</a>
</h2>
- :information_source: About
- :racing_car: TLDR
- :floppy_disk: Installation
- :rotating_light: Release Disclaimer
- :earth_africa: AppWorld Walkthrough
- :wave: Primer
- :globe_with_meridians: Task Worlds
- :repeat: Interactive Coding and API Calls
- :runner: A Minimal Agent in Action
- :bar_chart: Evaluating the Agent
- :no_entry_sign: Agent Development Restrictions
- :lock: Code Execution Safety
- :satellite: Serving AppWorld Environment/APIs with/out Docker
- :electric_plug: Introducing AppWorld MCP Server and Client (⭐ NEW ⭐)
- :link: Starting MCP Server
- :robot: Connecting MCP Client
- :desktop_computer: Connecting via an MCP GUI Client
- :gear: Connecting via an MCP-supported Agent Framework
- :artificial_satellite: Connecting via AppWorld's In-built MCP Client
- :feather: Connecting via our Standalone MCP Client
- :test_tube: Agent Experiments
- :floppy_disk: Installation
- :mag_right: Experiment Options (⭐ NEW ⭐)
- :robot: Running and Evaluating Agents
- :inbox_tray: Downloading our Experiment Outputs
- :mechanic: Adding your Base LLM (⭐ NEW ⭐)
- :control_knobs: Customizing Agents (⭐ NEW ⭐)
- :books: Guides (⭐ NEW ⭐)
- :magic_wand: Generating Base Database and Tasks
- :bulb: Developing New Apps
- :cyclone: Developing New Task Generators
- :computer: Evaluating Terminal Agents (Codex, Gemini, etc.) via AppWorld MCP
- :zap: Parallelizing Worlds (RL rollouts, or faster runs)
- :trophy: Leaderboard
- :lock_with_ink_pen: License
- :octopus: Contribution
- :page_facing_up: Citation
:information_source: About
This work introduces AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of ~100 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging autonomous agent tasks requiring rich and interactive coding.
<div align="center"> <img src="/images/overview-engine-benchmark.jpg" width="100%"> </div>
:racing_car: TLDR
In a rush? These code snippets show most of how to use AppWorld. See this notebook for a minimal working ReAct agent and the other sections for details.
# Install and download:
pip install appworld
appworld install
appworld download data
# Solve tasks via your own agent code:
from appworld import AppWorld, load_task_ids
for task_id in load_task_ids("test_challenge"):  # Or train, dev, test_normal
    with AppWorld(task_id=task_id, experiment_name="sample") as world:
        world.task.instruction  # To see task instruction.
        world.execute("""
        # ...
        response = apis.spotify.login(...)
        print(response)
        """)
        # => {"access_token": ...}
        # ...
        # can reuse past variables, or use previously printed information.
        world.execute("""
        library = apis.spotify.show_playlist_library(
            access_token=response["access_token"]
        )""")
        # ...
        # indicate task completion:
        world.execute("apis.supervisor.complete_task()")
        # world.evaluate().report()  # optional
# Or solve them via our tool-use, mcp, coding agent implementations:
# Install appworld-agents and run any of 5+ agents with 100+ models.
pip install -e 'experiments[simplified]' # from repo or 'pip install appworld-agents' once published.
appworld run auto --agent-name {AGENT_NAME} --model-name {MODEL_NAME} --dataset-name test_challenge
# Replace any/all {...} with "options" to see choices.
# Evaluate:
appworld evaluate sample test_challenge
#  experiment name ^      ^ dataset name
# Explore AppWorld CLI for many more possibilities via --help.
# E.g., `appworld explore` to explore the dataset and
# `appworld play` for an interactive playground. More 👇
appworld --help
# Usage: appworld [OPTIONS] COMMAND [ARGS]...
# AppWorld Command Line Interface 🚀
# ╭─ Commands ───────────────────────────────────────────────────────────────────────────────╮
# │ install [ Setup ] Unpack encrypted portion of the AppWorld code. │
# │ download [ Setup ] Download AppWorld data or baseline agents' experiment outputs. │
# │ verify [ Setup ] Verify AppWorld installation. │
# │ explore [Development] Explore task instructions in the AppWorld dataset. │
# │ serve [Development] Serve AppWorld Environment, APIs or MCP with or without Docker. │
# │ play [Development] Start an interactive coding playground to explore task worlds. │
# │ run [Development] Run AppWorld agent from the appworld-agents library. │
# │ evaluate [Development] Run experiment evaluation. │
# │ pack [Leaderboard] Pack your experiment output for the leaderboard submission. │
# │ unpack [Leaderboard] Unpack an experiment output from the leaderboard submission. │
# │ make [Leaderboard] Make a leaderboard entry from your agent's experiment outputs. │
# ╰──────────────────────────────────────────────────────────────────────────────────────────╯
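The Python snippet in the TLDR drives a task world through repeated `world.execute` calls, reusing earlier variables and printed output, until `apis.supervisor.complete_task()` is called. The sketch below shows one way that interaction loop could be organized. It is illustrative only: `WorldLike`, `Step`, `run_episode`, `EchoWorld`, and `scripted_agent` are hypothetical names that are not part of AppWorld, and it assumes `execute` returns the execution output as a string. The stand-in world lets the loop run without AppWorld installed.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class WorldLike(Protocol):
    """Hypothetical interface: anything with execute(code) -> str."""

    def execute(self, code: str) -> str: ...


@dataclass
class Step:
    """One interaction: the code the agent sent and the output it got back."""

    code: str
    output: str


def run_episode(
    world: WorldLike,
    propose_code: Callable[[list[Step]], str],
    max_steps: int = 10,
) -> list[Step]:
    """Alternate between proposing code and executing it in the world.

    Stops once the proposed code calls apis.supervisor.complete_task(),
    or after max_steps interactions, whichever comes first.
    """
    history: list[Step] = []
    for _ in range(max_steps):
        code = propose_code(history)  # the agent sees all past code and outputs
        output = world.execute(code)
        history.append(Step(code=code, output=output))
        if "apis.supervisor.complete_task()" in code:
            break
    return history


class EchoWorld:
    """Stand-in world (an assumption) used only to exercise the loop."""

    def execute(self, code: str) -> str:
        return f"executed {len(code.splitlines())} line(s)"


def scripted_agent(history: list[Step]) -> str:
    """Hypothetical policy: log in on the first step, then declare completion."""
    if not history:
        return "response = apis.spotify.login(...)\nprint(response)"
    return "apis.supervisor.complete_task()"


if __name__ == "__main__":
    for step in run_episode(EchoWorld(), scripted_agent):
        print(step.output)
```

With a real task world, the `world` object from the `with AppWorld(...)` block would take the place of `EchoWorld()`, and `propose_code` would wrap a model call that reads `history` before writing the next piece of code.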
:floppy_disk: Installation
Install the appworld package in your Python 3.11+ environment.
<details><summary> ::Click:: Create a Python 3.11 environment with conda</summary>
<hr/>
conda create -n appworld python=3.11.0 -y && conda activate appworld
<hr/>
</details>
pip install appworld # installs appworld into your site-packages directory
appworld install # unpacks encrypted code in your site-packages directory
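Before going further, it can help to confirm that the active interpreter meets the Python 3.11+ requirement and that the `appworld` package is importable. The check below is a minimal sketch using only the standard library. `check_environment` is a hypothetical helper, not part of AppWorld, and it only detects the `pip install` step. It cannot tell whether `appworld install` has unpacked the encrypted code.

```python
import importlib.util
import sys


def check_environment(
    package: str = "appworld",
    min_python: tuple[int, int] = (3, 11),
) -> dict[str, bool]:
    """Report whether the interpreter version and the package are in place."""
    return {
        # Compare (major, minor) of the running interpreter to the minimum.
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when a top-level package is not installed.
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    status = check_environment()
    print(status)
    if not all(status.values()):
        sys.exit("Environment not ready: revisit the installation steps above.")
```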
<details><summary> ::Click:: Install from source</summary>
