REDQ
Author's PyTorch implementation of the Randomized Ensembled Double Q-Learning (REDQ) algorithm. Paper link: https://arxiv.org/abs/2101.05982
<a name="table-of-contents"/>Table of Contents
- Table of contents
- Code structure explained
- Implementation video tutorial
- Data and reproducing figures in REDQ
- Train a REDQ agent
- Implement REDQ
- Reproduce the results
- Newest Docker + Singularity setup for Gym + MuJoCo v2 and v4
- (outdated) Environment setup MuJoCo 2.1, v4 tasks, NYU HPC 18.04
- (outdated) Environment setup MuJoCo 2.1, Ubuntu 18.04
- (outdated) Environment setup MuJoCo 2.1, NYU Shanghai HPC
- (outdated) Environment setup
- Acknowledgement
- Previous updates
Feb 20, 2023: An updated Docker + Singularity setup is now available. This is probably the easiest setup yet and lets you start running your DRL experiments with just 3 commands. We have also released new dockerfiles for gym + mujoco v2 and v4 environments (in the newest version of the repo you will find two folders, docker-gym-mujocov2 and docker-gym-mujocov4, each containing a dockerfile).
Code structure explained
The code structure is pretty simple and should be easy to follow.
In experiments/train_redq_sac.py you will find the main training loop. Here we set up the environment, initialize an instance of the REDQSACAgent class, specify all the hyperparameters, and train the agent. You can run this file to train a REDQ agent.
In redq/algos/redq_sac.py we provide code for the REDQSACAgent class. If you are trying to take a look at how the core components of REDQ are implemented, the most important function is the train() function.
In redq/algos/core.py we provide code for some basic classes (Q network, policy network, replay buffer) and some helper functions. These classes and functions are used by the REDQ agent class.
In redq/utils there are some utility classes (such as a logger) and helper functions that largely have nothing to do with REDQ's core components. In redq/utils/bias_utils.py you can find utility functions to get bias estimation (bias estimate is computed roughly as: Monte Carlo return - current Q estimate). In experiments/train_redq_sac.py you can decide whether you want bias evaluation when running the experiment by setting the evaluate_bias flag (this will lead to some minor computation overhead).
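The rough formula above (Monte Carlo return minus current Q estimate) can be sketched in a few lines. This is an illustrative example, not the repo's actual bias_utils API:

```python
def q_bias_estimate(rewards, q_estimate, gamma=0.99):
    """Rough bias estimate: discounted Monte Carlo return - current Q estimate.

    `rewards` is the reward sequence observed when rolling out the current
    policy from a state-action pair, and `q_estimate` is the critic's value
    for that same pair (names here are illustrative, not the repo's API).
    """
    mc_return = 0.0
    for r in reversed(rewards):  # accumulate the discounted sum backwards
        mc_return = r + gamma * mc_return
    return mc_return - q_estimate
```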
In plot_utils there are some utility functions to reproduce the figures we presented in the paper. (See the section on "Data and reproducing figures in REDQ")
Implementation video tutorial
Here is the link to a video tutorial we created that explains the REDQ implementation in detail:
REDQ code explained video tutorial (Google Drive Link)
<a name="reproduce-figures"/>Data and reproducing figures in REDQ
The data used to produce the figures in the REDQ paper can be downloaded here: REDQ DATA download link (Google Drive Link, ~80 MB)
To reproduce the figures, first download the data, and then extract the zip file to REDQ/data. So now a folder called REDQ_ICLR21 should be at this path: REDQ/data/REDQ_ICLR21.
Then you can go into the plot_utils folder and run the plot_REDQ.py program there. You will need seaborn==0.8.1 to run it correctly. We might update the code later so that it works with newer versions, but currently seaborn versions newer than 0.8.1 are not supported. If you don't want to disturb existing conda or Python virtual environments, you can create a new environment, install seaborn 0.8.1 there, and use it to run the program.
If you encounter any problem or cannot access the data (can't use google or can't download), please open an issue to let us know! Thanks!
<a name="setup-old"/>Environment setup (old guide, for the newest guide, see end of this page)
VERY IMPORTANT: because MuJoCo is now free, the setup guide here is slightly outdated (this is the setup we used when we ran our experiments for the REDQ paper). We now provide an updated setup guide that uses the newest MuJoCo; please see the end of this page.
Note: you don't need to follow the tutorial here exactly if you already know how to install Python packages.
First create a conda environment and activate it:
conda create -n redq python=3.6
conda activate redq
Install PyTorch (or you can follow the tutorial on the PyTorch official website). On Ubuntu (might also work on Windows, but that is not fully tested):
conda install pytorch==1.3.1 torchvision==0.4.2 cudatoolkit=10.1 -c pytorch
On OSX:
conda install pytorch==1.3.1 torchvision==0.4.2 -c pytorch
Install gym (0.17.2):
git clone https://github.com/openai/gym.git
cd gym
git checkout b2727d6
pip install -e .
cd ..
Install mujoco_py (2.0.2.1):
git clone https://github.com/openai/mujoco-py
cd mujoco-py
git checkout 379bb19
pip install -e . --no-cache
cd ..
For gym and mujoco_py, depending on your system, you might need to install some additional packages; if you run into such problems, please refer to their official sites for guidance. If you want to test on MuJoCo environments, you will also need to obtain the MuJoCo files and license from the MuJoCo website. Please refer to the MuJoCo website for how to do this correctly.
Clone and install this repository (although even if you don't install it, you might still be able to use the code):
git clone https://github.com/watchernyu/REDQ.git
cd REDQ
pip install -e .
<a name="train-redq"/>
Train a REDQ agent
To train a REDQ agent, run:
python experiments/train_redq_sac.py
On a 2080Ti GPU, running Hopper to 125K environment interactions takes approximately 10-12 hours; running Humanoid to 300K takes approximately 26 hours.
<a name="implement-redq"/>Implement REDQ
As discussed in the paper, we obtain REDQ by making minimal changes to a Soft Actor-Critic (SAC) baseline. You can easily modify your SAC code to get REDQ: (a) use an update-to-data (UTD) ratio > 1, (b) use more than 2 Q networks, (c) when computing the Q target, randomly select a subset of the Q target networks and take their min.
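The three changes above can be sketched as follows. This is a minimal illustrative version of the target computation, shown with scalar values for clarity (in the actual PyTorch code these would be batched tensors); the function and argument names are our own, not the repo's actual API:

```python
import random

def redq_q_target(next_q_values, reward, done, logp_next,
                  gamma=0.99, alpha=0.2, num_in_target=2):
    """REDQ-style target: min over a random subset of the N target Q
    estimates, plugged into the usual SAC entropy-regularized backup.

    `next_q_values` holds the N target networks' estimates Q_i(s', a'),
    `logp_next` is log pi(a'|s'); names here are illustrative.
    """
    # (c) randomly pick a subset (here of size num_in_target) of the N estimates
    sampled = random.sample(next_q_values, num_in_target)
    # take the min over the sampled subset (clipped-double-Q style)
    min_q = min(sampled)
    # standard SAC backup with the entropy term
    return reward + gamma * (1.0 - done) * (min_q - alpha * logp_next)
```

Changes (a) and (b) are then just a matter of maintaining N > 2 critics and running more than one gradient update per environment step.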
If you intend to implement REDQ on your own codebase, please refer to the paper and the video tutorial for guidance. In particular, in Appendix B of the paper, we discuss hyperparameters and some additional implementation details. One important detail is that at the beginning of training, for the first 5000 data points, we sample random actions from the action space and do not perform any updates. Performing a large number of updates with a very small amount of data can lead to severe bias accumulation and can negatively affect performance.
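The warm-up detail above can be sketched as a simplified training loop (shown with a UTD ratio of 1 for brevity; the env/agent interface here is illustrative, not the repo's actual classes):

```python
def warmup_then_train(env, agent, start_steps=5000, total_steps=125000):
    """For the first `start_steps` steps: uniform random actions, no updates."""
    obs = env.reset()
    for t in range(total_steps):
        if t < start_steps:
            # warm-up: sample uniformly from the action space, do not update
            act = env.action_space.sample()
        else:
            act = agent.get_action(obs)
        next_obs, rew, done, _ = env.step(act)
        agent.store(obs, act, rew, next_obs, done)
        obs = env.reset() if done else next_obs
        if t >= start_steps:
            agent.update()  # in REDQ this would be G > 1 updates (UTD ratio)
    return agent
```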
For REDQ-OFE, as mentioned in the paper, adding PyTorch batch norm to OFENet led to divergence for reasons we do not fully understand, so in the end we did not use batch norm in our code.
<a name="reproduce-results"/>Reproduce the results
If you use a different PyTorch version, the code might still work, but it is better if your version is close to the ones we used. For example, we have found that on the Ant environment, PyTorch 1.3 and 1.2 give quite different results; the reason is not entirely clear.
Other factors such as versions of other packages (for example numpy) or environment (mujoco/gym) or even types of hardware (cpu/gpu) can also affect the final results. Thus reproducing exactly the same results can be difficult. However, if the package versions are the same, when averaged over a large number of random seeds, the overall performance should be similar to those reported in the paper.
As of Mar. 29, 2021, we have used the installation guide on this page to re-setup a conda environment and run the code hosted in this repo, and the reproduced results are similar to those in the paper (though not exactly the same; in some environments performance is a bit stronger, in others a bit weaker).
Please open an issue if you find any problems in the code, thanks!
<a name="setup-dockersing"/>Environment setup with MuJoCo and OpenAI Gym v2/V4 tasks, with Docker or Singularity
This is a new 2023 guide based on Docker and Singularity (currently undergoing further testing).
Local setup: simply build a docker container with the dockerfile (either the v2 or the v4 version, depending on your needs) provided in this repo. The dockerfile basically specifies everything needed to install all dependencies starting from an Ubuntu 18 system, and you can easily modify it to your needs.
To get things running quickly (with v2 gym-mujoco environments), simply pull the prebuilt image from my Docker Hub: docker pull cwatcherw/gym-mujocov2:1.0
After you pull the docker container, you can quickly test it:
docker run -it --rm cwatcherw/gym-mujocov2:1.0
Once you are inside the container, run:
cd /workspace/REDQ/experiments/
python train_redq_sac.py
(Alternatively, remove the --rm flag so the container is kept after shutting down, or add --gpus all to use a GPU.)
If you want to modify the REDQ codebase to test new ideas, you can clone (a fork of) REDQ repo to a local directory, and then mount it to /workspace/REDQ. For example:
docker run -it --rm --mount type=bind,source=$(pwd)/REDQ,target=/workspace/REDQ cwatcherw/gym-mujocov2:1.0
Example setup if you want to run on a Slurm HPC with singularity (you might need to make changes, depending on your HPC settings):
First time setup:
mkdir /scratch/$USER/.sing_cache
expor
