Reaver: Modular Deep Reinforcement Learning Framework
Project status: No longer maintained!
Unfortunately, I am no longer able to further develop or provide support to the project.
Introduction
Reaver is a modular deep reinforcement learning framework with a focus on various StarCraft II based tasks, following in the footsteps of DeepMind, who are pushing the state-of-the-art of the field through the lens of playing a modern video game with human-like interface and limitations. This includes observing visual features similar (though not identical) to what a human player would perceive and choosing actions from a similar pool of options a human player would have. See the StarCraft II: A New Challenge for Reinforcement Learning article for more details.
Though development is research-driven, the philosophy behind the Reaver API is akin to the StarCraft II game itself - it has something to offer both for novices and experts in the field. For hobbyist programmers, Reaver offers all the tools necessary to train DRL agents by modifying only a small and isolated part of the agent (e.g. hyperparameters). For veteran researchers, Reaver offers a simple but performance-optimized codebase with modular architecture: agent, model, and environment are decoupled and can be swapped at will.
While the focus of Reaver is on StarCraft II, it also has full support for other popular environments, notably Atari and MuJoCo. Reaver agent algorithms are validated against reference results; e.g. the PPO agent is able to match the results reported in the Proximal Policy Optimization Algorithms paper. Please see below for more details.
Installation
PIP Package
The easiest way to install Reaver is through the PIP package manager:
pip install reaver
You can also install additional extras (e.g. gym support) through the helper flags:
pip install reaver[gym,atari,mujoco]
Manual Installation
If you plan to modify Reaver codebase you can retain its module functionality by installing from source:
$ git clone https://github.com/inoryy/reaver-pysc2
$ pip install -e reaver-pysc2/
With the -e flag, Python will look for reaver in the specified folder rather than in the site-packages storage.
Windows
Please see the wiki page for detailed instructions on setting up Reaver on Windows.
However, if possible, please consider using Linux instead, for performance and stability reasons.
If you would like to see your agent perform with full graphics enabled you can save a replay of the agent on Linux and open it on Windows.
This is how the video recording listed below was made.
Requirements
- PySC2 >= 3.0.0
- StarCraft II >= 4.1.2 (instructions)
- gin-config >= 0.3.0
- TensorFlow >= 2.0.0
- TensorFlow Probability >= 0.9
Optional Extras
If you would like to use Reaver with other supported environments, you must install the relevant packages as well:
- gym >= 0.10.0
- atari-py >= 0.1.5
- mujoco-py >= 1.50.0
- roboschool >= 1.0 (alternative)
Quick Start
You can train a DRL agent with multiple StarCraft II environments running in parallel with just four lines of code!
import reaver as rvr
env = rvr.envs.SC2Env(map_name='MoveToBeacon')
agent = rvr.agents.A2C(env.obs_spec(), env.act_spec(), rvr.models.build_fully_conv, rvr.models.SC2MultiPolicy, n_envs=4)
agent.run(env)
Moreover, Reaver comes with highly configurable command-line tools, so this task can be reduced to a short one-liner!
python -m reaver.run --env MoveToBeacon --agent a2c --n_envs 4 2> stderr.log
With the line above, Reaver will initialize the training procedure with a set of pre-defined hyperparameters, optimized specifically for the given environment and agent. After a while you will start seeing logs with various useful statistics in your terminal.
| T 118 | Fr 51200 | Ep 212 | Up 100 | RMe 0.14 | RSd 0.49 | RMa 3.00 | RMi 0.00 | Pl 0.017 | Vl 0.008 | El 0.0225 | Gr 3.493 | Fps 433 |
| T 238 | Fr 102400 | Ep 424 | Up 200 | RMe 0.92 | RSd 0.97 | RMa 4.00 | RMi 0.00 | Pl -0.196 | Vl 0.012 | El 0.0249 | Gr 1.791 | Fps 430 |
| T 359 | Fr 153600 | Ep 640 | Up 300 | RMe 1.80 | RSd 1.30 | RMa 6.00 | RMi 0.00 | Pl -0.035 | Vl 0.041 | El 0.0253 | Gr 1.832 | Fps 427 |
...
| T 1578 | Fr 665600 | Ep 2772 | Up 1300 | RMe 24.26 | RSd 3.19 | RMa 29.00 | RMi 0.00 | Pl 0.050 | Vl 1.242 | El 0.0174 | Gr 4.814 | Fps 421 |
| T 1695 | Fr 716800 | Ep 2984 | Up 1400 | RMe 24.31 | RSd 2.55 | RMa 30.00 | RMi 16.00 | Pl 0.005 | Vl 0.202 | El 0.0178 | Gr 56.385 | Fps 422 |
| T 1812 | Fr 768000 | Ep 3200 | Up 1500 | RMe 24.97 | RSd 1.89 | RMa 31.00 | RMi 21.00 | Pl -0.075 | Vl 1.385 | El 0.0176 | Gr 17.619 | Fps 423 |
Reaver should quickly converge to a mean episode reward (RMe) of about 25-26, which matches DeepMind's results for this environment.
Exact training time depends on your hardware: the logs above were produced on a laptop with an Intel i5-7300HQ CPU (4 cores)
and a GTX 1050 GPU, where training took around 30 minutes.
After Reaver has finished training, you can look at how it performs by appending --test and --render flags to the one-liner.
python -m reaver.run --env MoveToBeacon --agent a2c --test --render 2> stderr.log
Google Colab
A companion Google Colab notebook is available to try out Reaver online.
Key Features
Performance
Many modern DRL algorithms rely on being executed in multiple environments at the same time in parallel.
As Python has a GIL (Global Interpreter Lock), this feature must be implemented through multiprocessing.
The majority of open source implementations solve this task with a message-based approach (e.g. Python multiprocessing.Pipe or MPI),
where individual processes communicate by sending data through IPC.
This is a valid and most likely the only reasonable approach for the large-scale distributed setups that companies like DeepMind and OpenAI operate.
However, for a typical researcher or hobbyist a much more common scenario is having access to only a single machine, whether it is a laptop or a node on an HPC cluster. Reaver is optimized specifically for this case by making use of shared memory in a lock-free manner. This approach nets a significant performance boost of up to 1.5x in StarCraft II sampling rate (and up to 100x in the general case), with the bottleneck almost exclusively in the GPU input/output pipeline.
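The lock-free shared-memory idea can be sketched in plain Python. This is an illustrative toy, not Reaver's actual implementation: each worker process writes into its own disjoint slice of one pre-allocated shared buffer, so no locks are needed and no observation data is pickled through pipes.

```python
# Illustrative toy, not Reaver's actual code: workers fill disjoint
# slices of a single shared buffer instead of sending data over Pipes.
from multiprocessing import Array, Process

def worker(idx, obs_size, buf):
    # This worker owns the slice [idx*obs_size, (idx+1)*obs_size),
    # disjoint from all other workers, so lock-free writes are safe.
    for i in range(obs_size):
        buf[idx * obs_size + i] = float(idx)

def collect(n_envs=4, obs_size=3):
    # lock=False: deliberately skip the synchronization wrapper,
    # since the write regions never overlap.
    buf = Array('d', n_envs * obs_size, lock=False)
    procs = [Process(target=worker, args=(i, obs_size, buf))
             for i in range(n_envs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(buf)
```

In a real sampling loop the buffer would hold environment observations and be reused every step; the key point is that the parent reads results directly from shared memory once the workers signal completion.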
Extensibility
The three core Reaver modules - envs, models, and agents - are almost completely detached from each other.
This ensures that functionality added to one module integrates seamlessly with the others.
Configurability
All configuration is handled through gin-config and can be easily shared as .gin files.
This includes all hyperparameters, environment arguments, and model definitions.
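For example, a shared experiment configuration might look like the following .gin file. The binding names below are hypothetical, chosen purely for illustration, and are not taken from Reaver's actual configurables:

```
# experiment.gin -- hypothetical binding names, for illustration only
A2C.learning_rate = 7e-4
A2C.discount = 0.99
SC2Env.map_name = 'MoveToBeacon'
```

Such a file can then be versioned and shared alongside experiment results to make runs reproducible.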
Implemented Agents
- Advantage Actor-Critic (A2C)
- Proximal Policy Optimization (PPO)
Additional RL Features
- Generalized Advantage Estimation (GAE)
- Reward clipping
- Gradient norm clipping
- Advantage normalization
- Baseline (critic) bootstrapping
- Separate baseline network
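For reference, Generalized Advantage Estimation from the list above can be written in a few lines of plain Python. This is an illustrative sketch, not Reaver's code, and omits terminal-state masking for brevity:

```python
# Illustrative sketch of Generalized Advantage Estimation (GAE).
def gae(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """rewards[t] is r_t, values[t] is V(s_t);
    bootstrap is V(s_T) for the state after the last step."""
    advantages = [0.0] * len(rewards)
    next_value, running = bootstrap, 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then an exponentially weighted
        # sum of future TD errors (weight gamma * lam).
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With lam=1 this reduces to Monte Carlo advantages; with lam=0 it reduces to one-step TD errors, so lam trades off bias against variance.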
But Wait! There's more!
When experimenting with novel ideas it is important to get feedback quickly, which is often not realistic with complex environments like StarCraft II.
As Reaver was built with a modular architecture, its agent implementations are not actually tied to StarCraft II at all.
You can make drop-in replacements with many popular game environments (e.g. OpenAI Gym) and verify implementations work there first.












