Diffusion Policy Policy Optimization (DPPO)

[Paper]  [Website]

Allen Z. Ren<sup>1</sup>, Justin Lidard<sup>1</sup>, Lars L. Ankile<sup>2,3</sup>, Anthony Simeonov<sup>3</sup><br> Pulkit Agrawal<sup>3</sup>, Anirudha Majumdar<sup>1</sup>, Benjamin Burchfiel<sup>4</sup>, Hongkai Dai<sup>4</sup>, Max Simchowitz<sup>3,5</sup>

<sup>1</sup>Princeton University, <sup>2</sup>Harvard University, <sup>3</sup>Massachusetts Institute of Technology<br> <sup>4</sup>Toyota Research Institute, <sup>5</sup>Carnegie Mellon University

<img src="https://github.com/diffusion-ppo/diffusion-ppo.github.io/blob/main/img/overview-full.png" alt="drawing" width="100%"/>

DPPO is an algorithmic framework and set of best practices for fine-tuning diffusion-based policies in continuous control and robot learning tasks.


Installation

  1. Clone the repository:

```console
git clone git@github.com:irom-lab/dppo.git
cd dppo
```

  2. Install core dependencies with a conda environment on a Linux machine with an Nvidia GPU. (If you do not plan to use Furniture-Bench, a higher Python version such as 3.10 can be used instead.)

```console
conda create -n dppo python=3.8 -y
conda activate dppo
pip install -e .
```

  3. Install dependencies for specific environments (Gym / Kitchen / Robomimic / D3IL / Furniture-Bench), or all of them (except for Kitchen, which has dependency conflicts with the other tasks):

```console
pip install -e .[gym] # or [kitchen], [robomimic], [d3il], [furniture]
pip install -e .[all] # except for Kitchen
```

  4. Install MuJoCo for Gym and/or Robomimic. Install D3IL. Install IsaacGym and Furniture-Bench.

  5. Set environment variables for the data and logging directories (defaults are data/ and log/), and set the WandB entity (username or team name):

```console
source script/set_path.sh
```
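If you prefer not to use the script, the same variables can be exported manually. A minimal sketch, assuming the variable names used elsewhere in this repository (the paths are placeholders to adapt):

```shell
# Parent directory for pre-training data (default: data/)
export DPPO_DATA_DIR=/path/to/data
# Parent directory for logs and downloaded checkpoints (default: log/)
export DPPO_LOG_DIR=/path/to/log
# WandB entity (username or team name)
export DPPO_WANDB_ENTITY=<your_wandb_entity>
```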

Usage - Pre-training

Note: You may skip pre-training if you would like to use the default checkpoint (available for download) for fine-tuning.


Pre-training data for all tasks is pre-processed and can be found here. The pre-training script downloads the data (including normalization statistics) automatically to the data directory.


Run pre-training with data

All the configs can be found under cfg/<env>/pretrain/. A new WandB project may be created based on wandb.project in the config file; set wandb=null in the command line to test without WandB logging.

```console
# Gym - hopper/walker2d/halfcheetah
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/gym/pretrain/hopper-medium-v2
# Robomimic - lift/can/square/transport
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/robomimic/pretrain/can
# D3IL - avoid_m1/m2/m3
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/d3il/pretrain/avoid_m1
# Furniture-Bench - one_leg/lamp/round_table_low/med
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/furniture/pretrain/one_leg_low
```

See here for details of the experiments in the paper.

Usage - Fine-tuning


Pre-trained policies used in the paper can be found here. The fine-tuning script will download the default checkpoint automatically to the logging directory.


Fine-tuning pre-trained policy

All the configs can be found under cfg/<env>/finetune/. A new WandB project may be created based on wandb.project in the config file; set wandb=null in the command line to test without WandB logging.

```console
# Gym - hopper/walker2d/halfcheetah
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2
# Robomimic - lift/can/square/transport
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/can
# D3IL - avoid_m1/m2/m3
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/d3il/finetune/avoid_m1
# Furniture-Bench - one_leg/lamp/round_table_low/med
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/furniture/finetune/one_leg_low
```

Note: In Gym, Robomimic, and D3IL tasks, we run 40, 50, and 50 parallelized MuJoCo environments on CPU, respectively. If you would like to use fewer environments (given limited CPU threads, or GPU memory for rendering), you can reduce env.n_envs and increase train.n_steps so that the total number of environment steps collected in each iteration (n_envs x n_steps x act_steps) remains roughly the same. Try to set train.n_steps to a multiple of env.max_episode_steps / act_steps, and be aware that only episodes finished within an iteration are counted for evaluation. Furniture-Bench tasks run IsaacGym on a single GPU.
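The step accounting above can be sanity-checked in a few lines of Python; the concrete numbers here are illustrative, not taken from any shipped config:

```python
def steps_per_iteration(n_envs, n_steps, act_steps):
    """Total environment steps collected per fine-tuning iteration."""
    return n_envs * n_steps * act_steps

# Hypothetical original setting: 40 parallel environments
orig = steps_per_iteration(n_envs=40, n_steps=500, act_steps=4)
# Reduced setting: 4x fewer envs, 4x more steps per iteration
reduced = steps_per_iteration(n_envs=10, n_steps=2000, act_steps=4)
assert orig == reduced  # data collected per iteration stays the same

# n_steps should be a multiple of max_episode_steps / act_steps, so that
# episodes can finish within an iteration (only finished episodes count for eval)
max_episode_steps = 1000
assert 2000 % (max_episode_steps // 4) == 0
```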

To fine-tune your own pre-trained policy instead, override base_policy_path with your own checkpoint, which is saved under checkpoint/ of the pre-training directory. You can set base_policy_path=<path> on the command line when launching fine-tuning.
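For example, an override might look like the following; the checkpoint path is a hypothetical placeholder following the checkpoint/ layout described above:

```shell
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2 \
    base_policy_path=/path/to/pretrain-run/checkpoint/state_1000.pt
```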


See here for details of the experiments in the paper.

Visualization

  • Furniture-Bench tasks can be visualized in the GUI by specifying env.specific.headless=False and env.n_envs=1 in the fine-tuning configs.
  • The D3IL environment can be visualized in the GUI with +env.render=True, env.n_envs=1, and train.render.num=1. A basic script is available at script/test_d3il_render.py.
  • Videos of trials in Robomimic tasks can be recorded by specifying env.save_video=True, train.render.freq=<iterations>, and train.render.num=<num_video> in the fine-tuning configs.
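As a concrete example, the Robomimic video-recording overrides can be combined with a fine-tuning command; the frequency and video count below are illustrative values, not defaults:

```shell
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/can \
    env.save_video=True train.render.freq=10 train.render.num=2
```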

Usage - Evaluation

Pre-trained or fine-tuned policies can now be evaluated without running the fine-tuning script. Some example configs are provided under cfg/{gym/robomimic/furniture}/eval, including the ones below. Set base_policy_path to override the default checkpoint, and make sure ft_denoising_steps matches the fine-tuning config (otherwise the script assumes ft_denoising_steps=0, i.e., it evaluates the pre-trained policy).

```console
python script/run.py --config-name=eval_diffusion_mlp \
    --config-dir=cfg/gym/eval/hopper-v2 ft_denoising_steps=?
python script/run.py --config-name=eval_{diffusion/gaussian}_mlp_{?img} \
    --config-dir=cfg/robomimic/eval/can ft_denoising_steps=?
python script/run.py --config-name=eval_diffusion_mlp \
    --config-dir=cfg/furniture/eval/one_leg_low ft_denoising_steps=?
```

DPPO implementation

Our diffusion implementation is mostly based on Diffuser and lives in model/diffusion/diffusion.py and model/diffusion/diffusion_vpg.py. PPO specifics are implemented in model/diffusion/diffusion_ppo.py. The main training script is agent/finetune/train_ppo_diffusion_agent.py, which follows CleanRL.

Key configurations

  • denoising_steps: number of denoising steps (should always be the same for pre-training and fine-tuning, regardless of the fine-tuning scheme)
  • ft_denoising_steps: number of fine-tuned denoising steps
  • horizon_steps: predicted action chunk size (should be the same as `act_steps`)
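The relationships among these keys can be expressed as a small consistency check. This is a sketch with illustrative values (the dictionary and its numbers are assumptions for demonstration, not a shipped config):

```python
cfg = {
    "denoising_steps": 20,     # same for pre-training and fine-tuning
    "ft_denoising_steps": 10,  # only the last steps of the chain are fine-tuned
    "horizon_steps": 4,        # predicted action chunk size
    "act_steps": 4,            # executed action chunk size
}

# Fine-tuned denoising steps form a subset of the full denoising chain;
# ft_denoising_steps=0 corresponds to evaluating the pre-trained policy.
assert 0 <= cfg["ft_denoising_steps"] <= cfg["denoising_steps"]
# Predicted and executed action chunk sizes should match here
assert cfg["horizon_steps"] == cfg["act_steps"]
```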