Dual Curriculum Design

This codebase contains an extensible framework for implementing various Unsupervised Environment Design (UED) algorithms, including state-of-the-art Dual Curriculum Design (DCD) algorithms with minimax-regret robustness properties, such as ACCEL and Robust PLR:
- ACCEL
- Robust PLR
- PLR
- REPAIRED
- PAIRED
- ALP-GMM
- Minimax adversarial training
- Domain randomization (DR)
- PAIRED+HiEnt+BC+Evo
We also include experiment configurations for the main experiments in the following papers on DCD methods:
- Replay-Guided Adversarial Environment Design. Jiang et al, 2021 (NeurIPS 2021)
- Evolving Curricula with Regret-Based Environment Design. Parker-Holder et al, 2022 (ICML 2022)
- Stabilizing Unsupervised Environment Design with a Learned Adversary. Mediratta et al, 2023 (CoLLAs 2023)
Citation
The core components of this codebase, as well as the CarRacingBezier and CarRacingF1 environments, were introduced in Jiang et al, 2021. If you use this code to develop your own UED algorithms in academic contexts, please cite
Jiang et al, "Replay-Guided Adversarial Environment Design", 2021.
Additionally, if you use ACCEL or the adversarial BipedalWalker environments in academic contexts, please cite
Parker-Holder et al, "Evolving Curricula with Regret-Based Environment Design", 2022.
Finally, if you use PAIRED with High Entropy (HiEnt), Behavioral Cloning (BC), and/or the Evo variant, please cite
Mediratta et al, "Stabilizing Unsupervised Environment Design with a Learned Adversary", 2023.
Setup
To install the necessary dependencies, run the following commands:

```shell
conda create --name dcd python=3.8
conda activate dcd
pip install -r requirements.txt
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
cd ..
pip install pyglet==1.5.11
```
A quick overview of train.py
Choosing a UED algorithm
The exact UED algorithm is specified by a combination of values for --ued_algo, --use_plr, --no_exploratory_grad_updates, and --ued_editor:
| Method | ued_algo | use_plr| no_exploratory_grad_updates | ued_editor|
| ------------- |:-------------|:-------------|:-------------|:-------------|
| DR | domain_randomization | false | false | false |
| PLR | domain_randomization | true | false | false |
| PLR<sup>⊥</sup> | domain_randomization | true | true | false |
| ACCEL | domain_randomization | true | true | true |
| PAIRED | paired | false | false | false |
| REPAIRED | paired | true | true | false |
| Minimax | minimax | false | false | false |
Full details for the command-line arguments related to PLR and ACCEL can be found in arguments.py. We provide simple configuration JSON files for generating the train.py commands for the best hyperparameters found in experimental settings from prior works.
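As a quick reference, the flag combinations in the table above can be captured programmatically. The sketch below is illustrative only: the mapping of methods to flag values is taken from the table, but the exact flag spellings and the `--flag=true/false` value style should be checked against arguments.py, which defines the real parser.

```python
# Map each UED method to the flag settings from the table above.
# Flag names and the true/false rendering are assumptions; see arguments.py.
UED_FLAGS = {
    "dr":         dict(ued_algo="domain_randomization", use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
    "plr":        dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=False, ued_editor=False),
    "robust_plr": dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=False),
    "accel":      dict(ued_algo="domain_randomization", use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=True),
    "paired":     dict(ued_algo="paired",  use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
    "repaired":   dict(ued_algo="paired",  use_plr=True,  no_exploratory_grad_updates=True,  ued_editor=False),
    "minimax":    dict(ued_algo="minimax", use_plr=False, no_exploratory_grad_updates=False, ued_editor=False),
}

def build_train_cmd(method: str) -> str:
    """Assemble a train.py invocation for the given UED method."""
    flags = UED_FLAGS[method]
    parts = ["python", "-m", "train", f"--ued_algo={flags['ued_algo']}"]
    # Render the three boolean switches as explicit true/false values.
    for name in ("use_plr", "no_exploratory_grad_updates", "ued_editor"):
        parts.append(f"--{name}={'true' if flags[name] else 'false'}")
    return " ".join(parts)
```

For example, `build_train_cmd("accel")` produces a command containing `--use_plr=true --no_exploratory_grad_updates=true --ued_editor=true`, matching the ACCEL row of the table.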
Logging
By default, train.py generates a folder in the directory specified by the --log_dir argument, named according to --xpid. This folder contains the main training logs, logs.csv, and periodic screenshots of generated levels in the directory screenshots. Each screenshot uses the naming convention update_<number of PPO updates>.png. When ACCEL is turned on, the screenshot naming convention also includes information about whether the level was replayed via PLR and the mutation generation number for the level, i.e. how many mutation cycles led to this level.
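Since logs.csv is a plain CSV file, it can be inspected with standard tooling. A minimal sketch with Python's csv module is shown below; the actual column names depend on what your run logs, so treat any column you index into as an assumption to verify against your own logs.csv.

```python
import csv

def last_logged_row(path: str) -> dict:
    """Return the final row of a training log CSV as a dict of column -> value.

    Values come back as strings, as csv.DictReader does not infer types.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else {}
```

Usage would look like `last_logged_row("<log_dir>/<xpid>/logs.csv")` to check the most recent training statistics.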
Checkpointing
Latest checkpoint
The latest model checkpoint is saved as model.tar. The model is checkpointed every --checkpoint_interval updates. When --checkpoint_basis=num_updates (the default), the checkpoint interval corresponds to the number of rollout cycles (each of which includes one rollout per student and teacher). When --checkpoint_basis=student_grad_updates, it instead corresponds to the number of PPO updates performed by the student agent alone. The latter basis allows comparing methods by the number of gradient updates the student actually performs, which can differ from the number of rollout cycles, since methods based on Robust PLR, like ACCEL, do not perform student gradient updates every rollout cycle.
Archived checkpoints
Separate archived model checkpoints can be saved at specific intervals by specifying a positive value for the argument --archive_interval. For example, setting --archive_interval=1250 and --checkpoint_basis=student_grad_updates will result in saving model checkpoints named model_1250.tar, model_2500.tar, and so on. These archived models are saved in addition to model.tar, which always stores the latest checkpoint, based on --checkpoint_interval.
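To make the naming scheme concrete, this hypothetical helper (not part of the codebase) enumerates the archive filenames produced up to a given update count, following the model_<N>.tar convention described above:

```python
def archived_checkpoint_names(archive_interval: int, total_updates: int) -> list:
    """List archived checkpoint filenames saved at each interval up to total_updates."""
    return [f"model_{step}.tar"
            for step in range(archive_interval, total_updates + 1, archive_interval)]
```

With --archive_interval=1250 and 5000 total student gradient updates, this yields model_1250.tar through model_5000.tar, alongside the always-current model.tar.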
Evaluating agents with eval.py
Evaluating a single model
The following command evaluates a <model>.tar in an experiment results directory, <xpid>, in a base log output directory <log_dir> for <num_episodes> episodes in each of the environments named <env_name1>, <env_name2>, and <env_name3>, and outputs the results as a .csv in <result_dir>.
```shell
python -m eval \
--base_path <log_dir> \
--xpid <xpid> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--result_path <result_dir>
```
Evaluating multiple models
Similarly, the following command evaluates all models named <model>.tar in experiment results directories matching the prefix <xpid_prefix>. This prefix argument is useful for evaluating models from a set of training runs with the same hyperparameter settings. The resulting .csv will contain a column for each model matched and evaluated this way.
```shell
python -m eval \
--base_path <log_dir> \
--prefix <xpid_prefix> \
--model_tar <model> \
--env_names <env_name1>,<env_name2>,<env_name3> \
--num_episodes <num_episodes> \
--accumulator mean \
--result_path <result_dir>
```
Evaluating on zero-shot benchmarks
Replacing the --env_names=... argument with the --benchmark=<benchmark> argument will perform evaluation over a set of benchmark test environments for the domain specified by <benchmark>. The various zero-shot benchmarks are described below:
| benchmark | Description |
| ------------- |:-------------|
| maze | Human-designed mazes, including singleton and procedurally-generated designs. |
| f1 | The full CarRacing-F1 benchmark: 20 challenging tracks based on real Formula 1 circuits. |
| bipedal | BipedalWalker-v3, BipedalWalkerHardcore-v3, and isolated challenges for stairs, stumps, pit gaps, and ground roughness. |
| poetrose | Environments based on the most challenging level settings discovered by POET, as reported in the red polygons in the top two rows of Figure 5 in Wang et al, 2019. |
Running experiments
We provide configuration JSON files to generate the train.py commands for the specific experiment settings featured in the main results of previous works. To generate the command to launch one run of the experiment described by the configuration file config.json in the folder train_scripts/grid_configs, run the following, then copy and paste the output into your command line.
```shell
python train_scripts/make_cmd.py --json config --num_trials 1
```
Alternatively, you can run the following to copy the command directly to your clipboard:
```shell
python train_scripts/make_cmd.py --json config --num_trials 1 | pbcopy
```
The JSON files for training methods using the best hyperparameters settings in each environment are detailed below.
Environments
🧭 MiniGrid Mazes

The MiniGrid-based mazes from Dennis et al, 2020 and Jiang et al, 2021 require agents to perform partially-observable navigation. Various human-designed singleton and procedurally-generated mazes allow testing of zero-shot transfer performance to out-of-distribution configurations.
Experiments from Jiang et al, 2021
| Method | json config |
| ------------- |:-------------|
| PLR<sup>⊥</sup> | minigrid/25_blocks/mg_25b_robust_plr.json|
| PLR| minigrid/25_blocks/mg_25b_plr.json |
| REPAIRED| minigrid/25_blocks/mg_25b_repaired.json|
| Minimax | minigrid/25_blocks/mg_25b_minimax.json|
| DR | minigrid/25_blocks/mg_25b_dr.json|
Experiments from Parker-Holder et al, 2022
| Method | json config |
| ------------- |:-------------|
| ACCEL (from empty) | `minigrid/60_block