
Dual Curriculum Design

License: CC BY-SA 4.0

(Figure: DCD overview diagram)

This codebase contains an extensible framework for implementing various Unsupervised Environment Design (UED) algorithms, including state-of-the-art Dual Curriculum Design (DCD) algorithms with minimax-regret robustness properties, such as ACCEL and Robust PLR.

We also include experiment configurations for the main experiments in the papers on DCD methods cited below.

Citation

The core components of this codebase, as well as the CarRacingBezier and CarRacingF1 environments, were introduced in Jiang et al, 2021. If you use this code to develop your own UED algorithms in academic contexts, please cite

Jiang et al, "Replay-Guided Adversarial Environment Design", 2021.

(Bibtex here)

Additionally, if you use ACCEL or the adversarial BipedalWalker environments in academic contexts, please cite

Parker-Holder et al, "Evolving Curricula with Regret-Based Environment Design", 2022.

(Bibtex here)

Finally, if you use PAIRED with High Entropy (HiEnt), Behavioral Cloning (BC), and/or the Evo method, please cite

Mediratta et al, "Stabilizing Unsupervised Environment Design with a Learned Adversary", 2023.

(Bibtex here)

Setup

To install the necessary dependencies, run the following commands:

conda create --name dcd python=3.8
conda activate dcd
pip install -r requirements.txt
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
cd ..
pip install pyglet==1.5.11

A quick overview of train.py

Choosing a UED algorithm

The exact UED algorithm is specified by a combination of values for --ued_algo, --use_plr, --no_exploratory_grad_updates, and --ued_editor:

| Method | ued_algo | use_plr | no_exploratory_grad_updates | ued_editor |
| ------------- |:-------------|:-------------|:-------------|:-------------|
| DR | domain_randomization | false | false | false |
| PLR | domain_randomization | true | false | false |
| PLR<sup>⊥</sup> | domain_randomization | true | true | false |
| ACCEL | domain_randomization | true | true | true |
| PAIRED | paired | false | false | false |
| REPAIRED | paired | true | true | false |
| Minimax | minimax | false | false | false |
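As a sanity check, the mapping between methods and flag settings can be captured in a small helper that assembles the corresponding flag fragment. This is an illustrative sketch, not part of the codebase; the flag names come from the table above, but the `ued_args` helper and the `key=value` flag style are assumptions.

```python
# Illustrative sketch (not part of the codebase): map each UED method name
# to the flag combination listed in the table above.
UED_FLAGS = {
    "DR":         dict(ued_algo="domain_randomization", use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
    "PLR":        dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=False, ued_editor=False),
    "Robust PLR": dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=False),
    "ACCEL":      dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=True),
    "PAIRED":     dict(ued_algo="paired",  use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
    "REPAIRED":   dict(ued_algo="paired",  use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=False),
    "Minimax":    dict(ued_algo="minimax", use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
}

def ued_args(method):
    """Return the hypothetical train.py flag fragment for a UED method."""
    f = UED_FLAGS[method]
    return (f"--ued_algo={f['ued_algo']} "
            f"--use_plr={str(f['use_plr']).lower()} "
            f"--no_exploratory_grad_updates={str(f['no_exploratory_grad_updates']).lower()} "
            f"--ued_editor={str(f['ued_editor']).lower()}")

print(ued_args("ACCEL"))
```

Consult arguments.py for how these flags are actually parsed; argparse may expect boolean flags in a different form than the `=true/false` style sketched here.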

Full details for the command-line arguments related to PLR and ACCEL can be found in arguments.py. We provide simple configuration JSON files for generating the train.py commands for the best hyperparameters found in experimental settings from prior works.

Logging

By default, train.py generates a folder in the directory specified by the --log_dir argument, named according to --xpid. This folder contains the main training logs, logs.csv, and periodic screenshots of generated levels in the directory screenshots. Each screenshot uses the naming convention update_<number of PPO updates>.png. When ACCEL is turned on, the screenshot naming convention also includes information about whether the level was replayed via PLR and the mutation generation number for the level, i.e. how many mutation cycles led to this level.
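For instance, the PPO-update count can be recovered from a screenshot filename with a one-line parse. This is a hypothetical helper, assuming only the update_<number of PPO updates>.png convention described above:

```python
import re

def screenshot_update(filename):
    """Extract the PPO-update count from a screenshot named per the
    update_<number of PPO updates>.png convention; None if no match."""
    m = re.search(r"update_(\d+)\.png$", filename)
    return int(m.group(1)) if m else None

print(screenshot_update("update_2500.png"))  # 2500
```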

Checkpointing

**Latest checkpoint.** The latest model checkpoint is saved as model.tar, and the model is checkpointed every --checkpoint_interval updates. When --checkpoint_basis=num_updates (the default), the checkpoint interval corresponds to the number of rollout cycles (each of which includes one rollout per student and teacher). When --checkpoint_basis=student_grad_updates, the checkpoint interval instead corresponds to the number of PPO updates performed by the student agent alone. This latter basis allows comparing methods by the number of gradient updates the student actually performs, which can differ from the number of rollout cycles: methods based on Robust PLR, like ACCEL, do not perform student gradient updates every rollout cycle.

**Archived checkpoints.** Separate archived model checkpoints can be saved at specific intervals by specifying a positive value for --archive_interval. For example, setting --archive_interval=1250 with --checkpoint_basis=student_grad_updates saves checkpoints named model_1250.tar, model_2500.tar, and so on. These archived models are saved in addition to model.tar, which always stores the latest checkpoint based on --checkpoint_interval.
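The resulting archive schedule can be sketched as follows. The `archive_names` helper is hypothetical; only the model_<n>.tar naming pattern comes from the description above.

```python
def archive_names(archive_interval, total_updates):
    """List the archived checkpoint filenames produced up to total_updates,
    following the model_<n>.tar naming used with --archive_interval."""
    return [f"model_{n}.tar"
            for n in range(archive_interval, total_updates + 1, archive_interval)]

print(archive_names(1250, 5000))
# ['model_1250.tar', 'model_2500.tar', 'model_3750.tar', 'model_5000.tar']
```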

Evaluating agents with eval.py

Evaluating a single model

The following command evaluates <model>.tar from the experiment results directory <xpid>, under the base log directory <log_dir>, for <num_episodes> episodes in each of the environments named <env_name1>, <env_name2>, and <env_name3>, and writes the results as a .csv to <result_dir>.

python -m eval \
--base_path <log_dir> \
--xpid <xpid> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--result_path <result_dir>

Evaluating multiple models

Similarly, the following command evaluates all models named <model>.tar in experiment results directories matching the prefix <xpid_prefix>. This prefix argument is useful for evaluating models from a set of training runs with the same hyperparameter settings. The resulting .csv will contain a column for each model matched and evaluated this way.

python -m eval \
--base_path <log_dir> \
--prefix <xpid_prefix> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--accumulator mean \
--result_path <result_dir>
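If you prefer launching evaluation from Python, the same invocation can be assembled as an argument list for subprocess. This is an illustrative sketch: the flags are exactly those shown above, but the `eval_cmd` helper and the "EnvA"/"EnvB" environment names are placeholders, not real environments from the codebase.

```python
import sys

def eval_cmd(log_dir, prefix, model, env_names, num_episodes, result_dir):
    """Build the multi-model eval command shown above as an argument list,
    suitable for subprocess.run(...)."""
    return [
        sys.executable, "-m", "eval",
        "--base_path", log_dir,
        "--prefix", prefix,
        "--model_tar", model,
        "--env_names", ",".join(env_names),
        "--num_episodes", str(num_episodes),
        "--accumulator", "mean",
        "--result_path", result_dir,
    ]

cmd = eval_cmd("~/logs", "mg_25b_robust_plr", "model",
               ["EnvA", "EnvB"], 100, "results/")
# subprocess.run(cmd) would launch the evaluation.
print(cmd)
```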

Evaluating on zero-shot benchmarks

Replacing the --env_names=... argument with the --benchmark=<benchmark> argument performs evaluation over a set of benchmark test environments for the domain specified by <benchmark>. The zero-shot benchmarks are described below:

| benchmark | Description |
| ------------- |:-------------|
| maze | Human-designed mazes, including singleton and procedurally-generated designs. |
| f1 | The full CarRacing-F1 benchmark: 20 challenging tracks based on real Formula 1 circuits. |
| bipedal | BipedalWalker-v3, BipedalWalkerHardcore-v3, and isolated challenges for stairs, stumps, pit gaps, and ground roughness. |
| poetrose | Environments based on the extremely challenging level settings discovered by POET, as reported in the red polygons in the top two rows of Figure 5 in Wang et al, 2019. |

Running experiments

We provide configuration JSON files that generate the train.py commands for the specific experiment settings featured in the main results of previous works. To generate the command for one run of the experiment described by the configuration file config.json in train_scripts/grid_configs, run the following and copy-paste the output into your terminal:

python train_scripts/make_cmd.py --json config --num_trials 1

Alternatively, you can run the following to copy the command directly to your clipboard:

python train_scripts/make_cmd.py --json config --num_trials 1 | pbcopy

The JSON files for training each method with the best hyperparameter settings per environment are detailed below.

Environments

🧭 MiniGrid Mazes

Example mazes

The MiniGrid-based mazes from Dennis et al, 2020 and Jiang et al, 2021 require agents to perform partially-observable navigation. Various human-designed singleton and procedurally-generated mazes allow testing of zero-shot transfer performance to out-of-distribution configurations.

Experiments from Jiang et al, 2021

| Method | json config |
| ------------- |:-------------|
| PLR<sup>⊥</sup> | minigrid/25_blocks/mg_25b_robust_plr.json |
| PLR | minigrid/25_blocks/mg_25b_plr.json |
| REPAIRED | minigrid/25_blocks/mg_25b_repaired.json |
| Minimax | minigrid/25_blocks/mg_25b_minimax.json |
| DR | minigrid/25_blocks/mg_25b_dr.json |

Experiments from Parker-Holder et al, 2022

| Method | json config |
| ------------- |:-------------|
| ACCEL (from empty) | `minigrid/60_block
