
<h1 align="center"> <a href="https://github.com/EMI-Group/evox"> <picture> <source media="(prefers-color-scheme: dark)" srcset="docs/_static/evox_logo_dark.svg"> <source media="(prefers-color-scheme: light)" srcset="docs/_static/evox_logo_light.svg"> <img alt="EvoX Logo" height="50" src="docs/_static/evox_logo_light.svg"> </picture> </a> </h1> <p align="center"> <img src="https://github.com/google/brax/raw/main/docs/img/humanoid_v2.gif" width=160 height=160/> <img src="https://github.com/kenjyoung/MinAtar/raw/master/img/breakout.gif" width=160 height=160> <img src="https://raw.githubusercontent.com/instadeepai/jumanji/main/docs/env_anim/bin_pack.gif" width=160 height=160> </p> <h2 align="center"> <p>🌟 EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning 🌟</p> <a href="https://arxiv.org/abs/2501.15129"> <img src="https://img.shields.io/badge/paper-arxiv-red?style=for-the-badge" alt="EvoRL Paper on arXiv"> </a> </h2>

Introduction

EvoRL is a fully GPU-accelerated framework for Evolutionary Reinforcement Learning. It is implemented in JAX and provides end-to-end GPU-accelerated training pipelines, including the following processes:

  • Reinforcement Learning (RL)
  • Evolutionary Computation (EC)
  • Environment Simulation

EvoRL provides a highly efficient and user-friendly platform to develop and evaluate RL, EC and EvoRL algorithms.

[!NOTE] EvoRL is a sister project of EvoX.

Highlight

  • End-to-end training pipelines: The training pipelines for RL, EC and EvoRL are executed entirely on GPUs, eliminating the dense CPU–GPU communication found in traditional implementations and fully utilizing the parallel computing capabilities of modern GPU architectures.
    • Most algorithms have a Workflow.step() function that is compatible with jax.jit and jax.vmap, supporting parallel training and JIT compilation of the full computation graph.
  • Easy integration between EC and RL: Thanks to the modular design, EC components can be easily plugged into workflows and cooperate with RL components.
  • Implementation of EvoRL algorithms: Currently, we provide two popular paradigms in Evolutionary Reinforcement Learning: Evolution-guided Reinforcement Learning (ERL): ERL, CEM-RL; and Population-based AutoRL: PBT.
  • Unified Environment API: Supports multiple GPU-accelerated RL environment packages (e.g., Brax, gymnax, ...). Multiple Env Wrappers are also provided.
  • Object-oriented functional programming model: Classes define the static execution logic, and their running states are stored externally.
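The jit/vmap point above can be sketched in plain JAX. This is a toy example under assumed semantics, not EvoRL's actual Workflow API: a pure step function over externally held state can be vectorized across seeds and compiled as one graph.

```python
# A minimal sketch of a jit/vmap-able step function
# (toy example, not EvoRL's actual Workflow.step()).
import jax
import jax.numpy as jnp

def step(state):
    # Toy "training step": nudge a parameter vector toward zero.
    return state - 0.1 * jnp.sign(state)

# One state per seed, stacked along the leading axis.
states = jnp.stack([jnp.full((3,), float(s)) for s in range(4)])

# vmap parallelizes across seeds; jit compiles the whole graph once.
parallel_step = jax.jit(jax.vmap(step))
states = parallel_step(states)
print(states.shape)  # (4, 3)
```

Because the state is data rather than hidden object attributes, the same `step` can be transformed freely by `jax.jit`, `jax.vmap`, or `jax.pmap`.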

Update

  • 2025-07-14: Our paper "EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning" was accepted by ACM TELO.

  • 2025-04-01: Added support for MuJoCo Playground environments.

Documentation

  • For comprehensive guidance, please visit our Documentation, where you'll find detailed installation steps, tutorials, practical examples, and complete API references.

  • EvoRL is also indexed by DeepWiki, providing an AI assistant for beginners. Feel free to ask any question about this repo at https://deepwiki.com/EMI-Group/evorl.

Overview of Key Concepts in EvoRL

  • Workflow defines the training logic of algorithms.
  • Agent defines the behavior of a learning agent, and its optional loss functions.
  • Env provides a unified interface for different environments.
  • SampleBatch is a data structure for contiguous trajectories or shuffled transition batches.
  • The EC module provides EC components such as Evolutionary Algorithms (EAs) and related operators.
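The "object-oriented functional programming model" mentioned above can be illustrated with a toy class (hypothetical names, not EvoRL's real classes): the class holds only static logic, while its running state is a pytree that is passed in and returned explicitly, keeping methods pure and `jax.jit`-compatible.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp

class CounterState(NamedTuple):
    """Running state, stored externally as a pytree."""
    total: jnp.ndarray

class Counter:
    """Static execution logic only; no mutable attributes."""
    def init(self) -> CounterState:
        return CounterState(total=jnp.zeros(()))

    def step(self, state: CounterState, x: jnp.ndarray) -> CounterState:
        # Pure function of (state, input) -> new state.
        return CounterState(total=state.total + x)

counter = Counter()
state = counter.init()
state = jax.jit(counter.step)(state, jnp.ones(()))
print(float(state.total))  # 1.0
```

Keeping state external is what lets a framework checkpoint, vectorize, or replicate training states without touching the class itself.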

Installation

EvoRL is developed on top of JAX, so JAX should be installed first; please follow the official JAX installation guide. Since EvoRL is currently under development, we recommend installing the package from source.

# Install the evorl package from source
git clone https://github.com/EMI-Group/evorl.git
cd evorl
pip install -e .

For developers, see Contributing to EvoRL.

Quickstart

Training

EvoRL uses hydra to manage configs and run algorithms. Users can use scripts/train.py or scripts/train_dist.py to run algorithms from the CLI.

# hierarchy of folder `configs/`
configs
├── agent
│   ├── ppo.yaml
│   ├── ...
...
├── config.yaml
├── env
│   ├── brax
│   │   ├── ant.yaml
│   │   ├── ...
│   ├── envpool
│   └── gymnax
└── logging.yaml

Specify the agent and env fields based on the related config file path (*.yaml) in the configs folder. For example, to train the PPO agent (config file configs/agent/ppo.yaml) on the Brax environment Ant (config file configs/env/brax/ant.yaml), use:

python scripts/train.py agent=ppo env=brax/ant

# Parallel training: two seeds on each GPU.
CUDA_VISIBLE_DEVICES=0,5 python scripts/train_dist.py -m hydra/launcher=joblib \
    agent=exp/ppo/brax/ant env=brax/ant seed=114,514

If multiple GPUs are detected, most algorithms will be automatically trained in distributed mode.
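Device discovery in JAX is what makes this automatic: a framework can enumerate the visible accelerators and shard training accordingly. A quick way to check what your setup exposes:

```python
# Inspect which devices JAX can see; a framework can use this count
# to decide between single-device and distributed training.
import jax

print(jax.device_count())   # total number of visible devices
print(jax.local_devices())  # e.g. [CpuDevice(id=0)] on a CPU-only machine
```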

For more advanced usage, see our documentation: Training.

Logging

When not using multi-run mode (without -m), the outputs will be stored in ./outputs. When using multi-run mode (-m), the outputs will be stored in ./multirun. Specifically, when launching algorithms from the training scripts, the log file and checkpoint files will be stored in ./outputs|multirun/train|train_dist/<timestamp>/<exp-name>/.

By default, the training script will enable two recorders for logging: LogRecorder and WandbRecorder. LogRecorder will save logs (*.log) in the above path, and WandbRecorder will upload the data to WandB, which provides beautiful visualizations.

Screenshot of the WandB dashboard:

Env Rendering

We provide some example visualization scripts for brax and playground environments: visualize_mjx.ipynb.

Algorithms

Currently, EvoRL supports four types of algorithms:

| Type | Algorithms |
| ----------------------- | ------------------------------------------------------------------------ |
| RL | A2C, PPO, IMPALA, DQN, DDPG, TD3, SAC, TD7 |
| EA | OpenES, VanillaES, ARS, CMA-ES, algorithms from EvoX (PSO, NSGA-II, ...) |
| Evolution-guided RL | ERL-GA, ERL-ES, ERL-EDA, CEMRL, CEMRL-OpenES |
| Population-based AutoRL | PBT family (e.g., PBT-PPO, PBT-SAC, PBT-CSO-PPO) |

RL Environments

By default, pip install evorl automatically installs the Brax environments. If you want to use other supported environments, please install the additional environment packages. We provide useful extras for different environments.

For example:

# ===== GPU-accelerated Environments =====
# Mujoco playground Envs:
pip install -e ".[mujoco-playground]"
# gymnax Envs:
pip install -e ".[gymnax]"
# Jumanji Envs:
pip install -e ".[jumanji]"
# JaxMARL Envs:
pip install -e ".[jaxmarl]"

# ===== CPU-based Environments =====
# EnvPool Envs (also requires Python < 3.12):
pip install -e ".[envpool]"
# Gymnasium Envs:
pip install -e ".[gymnasium]"

[!WARNING] These additional environments have limited support, and some algorithms are incompatible with them.

Current Supported Environments

| Environment Library | Description |
| ------------------------ | --------------------------------------- |
| Brax | Robotic control |
| MuJoCo Playground | Robotic control |
| gymnax (experimental) | Classic control, bsuite, MinAtar |
| JaxMARL (experimental) | Multi-agent envs |
| Jumanji (experimental) | Games, combinatorial optimization |
| EnvPool (experimental) | High-performance CPU-based environments |
| Gymnasium (experimental) | Standard CPU-based environments |

PRs for other environment libraries are welcome.
