Mava
🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX
Welcome to Mava! 🦁
<div align="center"> <h3>Installation | Getting started</h3> </div>

Mava allows researchers to experiment with multi-agent reinforcement learning (MARL) at lightning speed. The single-file JAX implementations are built for rapid research iteration: hack, modify, and test new ideas fast. Our [state-of-the-art algorithms][sable] scale seamlessly across devices. Created for researchers, by the Research Team at InstaDeep.
Highlights 🦁
- 🥑 Implementations of MARL algorithms: distributed implementations of current state-of-the-art MARL algorithms that make effective use of available accelerators.
- 🍬 Environment Wrappers: we provide first-class support for a few JAX-based MARL environment suites through the use of wrappers; new environments can easily be added by using the existing wrappers as a guide.
- 🧪 Statistically robust evaluation: Mava natively supports logging to JSON files which adhere to the standard suggested by [Gorsane et al. (2022)][toward_standard_eval]. This enables easy downstream plotting and aggregation of experiments using the tools found in the [MARL-eval][marl_eval] library.
- 🖥️ JAX Distribution Architectures for Reinforcement Learning: Mava supports both [Podracer][anakin_paper] architectures for scaling RL systems. The first, Anakin, can be used when environments are written in JAX and enables end-to-end JIT compilation of the full MARL training loop, giving fast experiment run times on hardware accelerators. The second, Sebulba, can be used when environments are not written in JAX and is particularly useful when a hardware accelerator needs to interact with many CPU cores at a time.
- ⚡ Blazingly fast experiments: all of the above combine to give very fast experiment runtimes, especially compared to other non-JAX-based MARL libraries.
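The Anakin pattern described above can be sketched in a few lines of JAX: because the environment step is itself a JAX function, the whole rollout can be wrapped in `jax.lax.scan` and JIT-compiled end to end. The toy environment, random policy, and shapes below are invented for illustration only; they are not Mava's actual API.

```python
import jax
import jax.numpy as jnp

NUM_AGENTS, NUM_ENVS, NUM_STEPS = 3, 8, 16

def env_step(state, actions):
    # Toy JAX "environment": the reward is the mean action and the state is a
    # step counter. A real Anakin setup would call a JAX env step function here.
    new_state = state + 1
    reward = jnp.mean(actions, axis=-1)
    return new_state, reward

def policy(key, state):
    # Random policy standing in for a learned one.
    return jax.random.uniform(key, (NUM_ENVS, NUM_AGENTS))

@jax.jit
def rollout(key, state):
    # Scan the (policy -> env_step) loop so the whole rollout is one XLA program.
    def step(carry, _):
        key, state = carry
        key, subkey = jax.random.split(key)
        actions = policy(subkey, state)
        state, reward = env_step(state, actions)
        return (key, state), reward

    (_, final_state), rewards = jax.lax.scan(step, (key, state), None, length=NUM_STEPS)
    return final_state, rewards

final_state, rewards = rollout(jax.random.PRNGKey(0), jnp.zeros((NUM_ENVS,)))
print(rewards.shape)  # (NUM_STEPS, NUM_ENVS) = (16, 8)
```

Because nothing inside `rollout` leaves the accelerator, the compiled loop runs without Python overhead per step, which is where the speed-ups described above come from.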
Installation 🎬
At the moment Mava is not meant to be installed as a library, but rather to be used as a research tool. We recommend cloning the Mava repo and installing dependencies using uv as follows:
```bash
# Clone the repository
git clone https://github.com/instadeepai/Mava.git
cd Mava

# Create a virtual environment and install all dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate
```
To install Mava with a GPU- or TPU-aware version of JAX:

```bash
uv sync --extra cuda12  # GPU-aware JAX
uv sync --extra tpu     # TPU-aware JAX
```
Alternatively, with pip, create a virtual environment and then:

```bash
pip install -e ".[cuda12]"  # GPU-aware JAX (leave out [cuda12] if you don't have a GPU or are on Mac)
```
We have tested Mava on Python 3.11 and 3.12, but earlier versions may also work. Specifically, we use Python 3.10 for the Quickstart notebook on Google Colab since Colab uses Python 3.10 by default. For more in-depth installation guides including Docker builds and virtual environments, please see our detailed installation guide.
Getting started ⚡
To get started with training your first Mava system, simply run one of the system files:
```bash
python mava/systems/ppo/anakin/ff_ippo.py
```
Mava makes use of Hydra for config management. Our default system configs can be found in the mava/configs/ directory. A benefit of Hydra is that configs can either be set in yaml config files or overridden from the terminal on the fly. For example, to run a system on the Level-based Foraging environment, the above command can simply be adapted as follows:
```bash
python mava/systems/ppo/anakin/ff_ippo.py env=lbf
```
Different scenarios can also be run by making the following config updates from the terminal:
```bash
python mava/systems/ppo/anakin/ff_ippo.py env=rware env/scenario=tiny-4ag
```
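Conceptually, Hydra overrides like the ones above select and merge values into a nested config before the system runs. The behaviour can be illustrated with a small dict-merge sketch; this mimics the `key=value` override semantics only, and the config keys below are made up rather than Mava's real config schema:

```python
import copy

# Invented defaults standing in for a composed Hydra config.
defaults = {
    "env": {"name": "rware", "scenario": "tiny-2ag"},
    "system": {"rollout_length": 128, "total_timesteps": 1_000_000},
}

def apply_overrides(config, overrides):
    """Apply 'path=value' overrides, where the path may use '.' or '/'."""
    config = copy.deepcopy(config)
    for override in overrides:
        path, value = override.split("=")
        *parents, leaf = path.replace("/", ".").split(".")
        node = config
        for parent in parents:
            node = node[parent]
        node[leaf] = value  # terminal overrides arrive as strings
    return config

cfg = apply_overrides(defaults, ["env.name=lbf", "env/scenario=tiny-4ag"])
print(cfg["env"])  # {'name': 'lbf', 'scenario': 'tiny-4ag'}
```

In real Hydra, `env=lbf` additionally swaps in a whole config group file from mava/configs/, so one terminal token can replace many nested keys at once.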
We also have a [Quickstart notebook][quickstart] that can be used to quickly create and train your first multi-agent system.
<h2>Algorithms</h2>Mava has implementations of multiple on- and off-policy multi-agent algorithms that follow the independent learners (IL), centralised training with decentralised execution (CTDE) and heterogeneous agent learning paradigms. Aside from MARL learning paradigms, we also include implementations that follow the Anakin and Sebulba architectures to enable scalable training by default. Which architecture is relevant for a given problem depends on whether the environment being used is written in JAX or not. For more information on these paradigms, please see [here][anakin_paper].
| Algorithm  | Variants       | Continuous | Discrete | Anakin | Sebulba | Paper | Docs |
|------------|----------------|------------|----------|--------|---------|-------|------|
| PPO        | ff_ippo.py     | ✅         | ✅       | ✅     | ✅      | Link  | Link |
|            | ff_mappo.py    | ✅         | ✅       | ✅     |         | Link  | Link |
|            | rec_ippo.py    | ✅         | ✅       | ✅     |         | Link  | Link |
|            | rec_mappo.py   | ✅         | ✅       | ✅     |         | Link  | Link |
| Q Learning | rec_iql.py     |            | ✅       | ✅     |         | Link  | Link |
|            | rec_qmix.py    |            | ✅       | ✅     |         | Link  | Link |
| SAC        | ff_isac.py     | ✅         |          | ✅     |         | Link  | Link |
|            | ff_masac.py    | ✅         |          | ✅     |         |       | Link |
|            | ff_hasac.py    | ✅         |          | ✅     |         | Link  | Link |
| MAT        | mat.py         | ✅         | ✅       | ✅     |         | Link  | Link |
| Sable      | ff_sable.py    | ✅         | ✅       | ✅     |         | Link  | Link |
|            | rec_sable.py   | ✅         | ✅       | ✅     |         | Link  | Link |
| GPO        | rec_magpo.py   | ✅         | ✅       | ✅     |         | Link  | Link |
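The difference between the IL and CTDE paradigms listed above is mostly about what the critic sees during training. A shape-level sketch (the agent count and observation size are arbitrary, not taken from any Mava system):

```python
import numpy as np

num_agents, obs_dim = 3, 4
observations = np.random.rand(num_agents, obs_dim)  # one observation per agent

# Independent learner (e.g. IPPO): each agent's critic sees only its own observation.
il_critic_inputs = [observations[i] for i in range(num_agents)]
assert il_critic_inputs[0].shape == (obs_dim,)

# CTDE (e.g. MAPPO): during training, a centralised critic sees the joint observation.
ctde_critic_input = observations.reshape(-1)  # concatenate all agents' observations
assert ctde_critic_input.shape == (num_agents * obs_dim,)

# Execution is decentralised in both cases: each agent acts from its own observation.
```

The extra information available to the centralised critic is only needed at training time, which is why both paradigms can execute with fully decentralised policies.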
These are the environments which Mava supports out of the box. To add a new environment, please use the existing wrapper implementations as an example. We also indicate whether each environment is implemented in JAX or not: JAX-based environments can be used with algorithms that follow the Anakin distribution architecture, while non-JAX environments can be used with algorithms following the Sebulba architecture.
| Environment                     | Action space | JAX | Non-JAX | Paper | JAX Source | Non-JAX Source |
|---------------------------------|--------------|-----|---------|-------|------------|----------------|
| Multi-Robot Warehouse           | Discrete     | ✅  | ✅      | Link  | Link       | Link           |
| Level-based Foraging            | Discrete     | ✅  | ✅      | Link  | Link       | Link           |
| StarCraft Multi-Agent Challenge | Discrete     | ✅  | ✅      | [Link](https://arxiv.org/abs/1902. |  |  |
