
BenchMARL

BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). It lets you quickly compare different MARL algorithms, tasks, and models while staying systematically grounded in its two core tenets: reproducibility and standardization.

Install / Use

/learn @facebookresearch/BenchMARL

README

BenchMARL

python benchmarl/run.py algorithm=mappo task=vmas/balance

Examples

What is BenchMARL 🧐?

BenchMARL is a Multi-Agent Reinforcement Learning (MARL) training library created to enable reproducibility and benchmarking across different MARL algorithms and environments. Its mission is to present a standardized interface that allows easy integration of new algorithms and environments to provide a fair comparison with existing solutions. BenchMARL uses TorchRL as its backend, which grants it high performance and state-of-the-art implementations. It also uses hydra for flexible and modular configuration, and its data reporting is compatible with marl-eval for standardised and statistically strong evaluations.

BenchMARL's core design tenets are:

  • Reproducibility through systematic grounding and standardization of configuration
  • Standardised and statistically-strong plotting and reporting
  • Experiments that are independent of the algorithm, environment, and model choices
  • Breadth over the MARL ecosystem
  • Easy implementation of new algorithms, environments, and models
  • Leveraging the know-how and infrastructure of TorchRL, without reinventing the wheel

Why would I BenchMARL 🤔?

Why would you BenchMARL, you ask? You can BenchMARL to compare different algorithms, environments, and models, to check how your new research stacks up against existing work, or simply to get a quick picture of the landscape when approaching the domain.

How to use

Notebooks

  • Running BenchMARL experiments (Colab notebook).
  • Creating a VMAS scenario and training it in BenchMARL (Colab notebook). We will create a scenario where multiple robots with different embodiments need to navigate to their goals while avoiding each other (as well as obstacles), and train it using MAPPO and MLP/GNN policies.

Install

Install TorchRL

You can install TorchRL from PyPI.

pip install torchrl

For more details, or for installing nightly versions, see the TorchRL installation guide.

Install BenchMARL

You can install it from PyPI

pip install benchmarl

Or clone it locally to access the configs and scripts

git clone https://github.com/facebookresearch/BenchMARL.git
pip install -e BenchMARL

Install environments

All environment dependencies are optional in BenchMARL and can be installed separately.

  • VMAS: pip install vmas
  • PettingZoo: pip install "pettingzoo[all]"
  • MeltingPot: pip install dm-meltingpot
  • MAgent2: pip install git+https://github.com/Farama-Foundation/MAgent2
  • SMACv2: follow the instructions on the environment repository, which show how to install it on Linux.

Run

Experiments are launched with a default configuration that can be overridden in many ways. To learn how to customize and override configurations, please refer to the configuring section.

Command line

To launch an experiment from the command line you can do

python benchmarl/run.py algorithm=mappo task=vmas/balance

Example
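
Hyperparameters from the default experiment configuration can also be overridden inline using hydra's dotted syntax. A minimal sketch, assuming the lr and max_n_frames fields exposed by the default experiment config (field names may vary across versions):

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.lr=0.003 experiment.max_n_frames=12000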

Thanks to hydra, you can run benchmarks as multi-runs like:

python benchmarl/run.py -m algorithm=mappo,qmix,masac task=vmas/balance,vmas/sampling seed=0,1

Example

The default implementation for hydra multi-runs is sequential, but parallel and slurm launchers are also available.
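
For instance, a sketch of a locally parallel sweep, assuming the hydra-joblib-launcher plugin is installed so that the joblib launcher can be selected:

python benchmarl/run.py -m algorithm=mappo,qmix task=vmas/balance seed=0,1 hydra/launcher=joblib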

Script

You can also load and launch your experiments from within a script

# Import paths follow the library layout and may vary slightly across versions.
from benchmarl.algorithms import MappoConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),
    algorithm_config=MappoConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=ExperimentConfig.get_from_yaml(),
)
experiment.run()

Example
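
Since the loaded configurations are plain dataclasses, you can also tweak their fields before launching. A minimal sketch reusing the imports above; max_n_frames and loggers are assumed field names from the default experiment configuration and may differ across versions:

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.max_n_frames = 12_000  # assumed field: stop training after this many frames
experiment_config.loggers = ["csv"]      # assumed field: log metrics to CSV only

experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),
    algorithm_config=MappoConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=experiment_config,
)
experiment.run()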

You can also run multiple experiments in a Benchmark.

from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark  # import path may vary across versions
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

benchmark = Benchmark(
    algorithm_configs=[
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()

Example

Concept

The goal of BenchMARL is to bring different MARL environments and algorithms under the same interfaces to enable fair and reproducible comparison and benchmarking. BenchMARL is a full-pipeline unified training library with the goal of enabling users to run any comparison they want across our algorithms and tasks in just one line of code. To achieve this, BenchMARL interconnects components from TorchRL, which provides an efficient and reliable backend.

The library has a default configuration for each of its components. While parts of this configuration are meant to be changed (for example, experiment configurations), other parts (such as tasks) should not be changed, to allow for reproducibility. To aid in this, each version of BenchMARL is paired with a default configuration.

Let's now introduce each component in the library.

Experiment. An experiment is a training run in which an algorithm, a task, and a model are fixed. Experiments are configured by passing these values alongside a seed and the experiment hyperparameters. The experiment hyperparameters cover both on-policy and off-policy algorithms, discrete and continuous actions, and probabilistic and deterministic policies (as they are agnostic of the algorithm or task used). An experiment can be launched from the command line or from a script. See the run section for more information.

Benchmark. In the library, a benchmark is a collection of experiments that can vary in task, algorithm, or model. A benchmark shares the same experiment configuration across all of its experiments. Benchmarks allow you to compare different MARL components in a standardized way. A benchmark can be launched from the command line or from a script. See the run section for more information.

Algorithms. Algorithms are an ensemble of components (e.g., loss, replay buffer) which determine the training strategy. Here is a table with the currently implemented algorithms (including MAPPO, IPPO, MADDPG, IDDPG, MASAC, ISAC, QMIX, VDN, and IQL).
