PDPP

[CVPR 2023 Highlight] PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

This repository provides the official PyTorch implementation of PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR 2023).

News

We have updated our paper with the following changes:
- We corrected a mistake in the MLP classification result for $CrossTask_{Base}$ when T=3, which should be around 83. The evaluation metrics for $CrossTask_{Base}$ when T=3 therefore dropped slightly, but they still outperform all previous methods.
- We noticed that the initial random noise used for sampling can influence the results, especially on NIV. We therefore now report the mean over multiple sampling runs with different initial random noise. All results are obtained with the DDIM sampling process.
- In the old version of the paper, the batch size in the "Impact of batch size on mIoU" section of the supplement was the total across 8 GPUs. We now report the batch size for a single GPU to avoid misunderstanding.
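The averaging over sampling runs mentioned in the second item can be sketched as below. This is a minimal illustration, not the repository's evaluation code: `evaluate_once` is a hypothetical stand-in for one full sampling-and-scoring pass with a given seed, `fake_evaluate` returns made-up numbers, and the seed list is arbitrary.

```python
import random
import statistics
from typing import Callable, Dict, List


def average_over_seeds(
    evaluate_once: Callable[[int], Dict[str, float]],
    seeds: List[int],
) -> Dict[str, Dict[str, float]]:
    """Run one evaluation per seed and report the mean and std of each metric."""
    if not seeds:
        raise ValueError("at least one seed is required")
    runs = [evaluate_once(seed) for seed in seeds]
    summary: Dict[str, Dict[str, float]] = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = {
            "mean": statistics.mean(values),
            # Population standard deviation; 0.0 when there is a single run.
            "std": statistics.pstdev(values),
        }
    return summary


def fake_evaluate(seed: int) -> Dict[str, float]:
    """Hypothetical placeholder: returns a synthetic score instead of running a model."""
    rng = random.Random(seed)
    return {"SR": 50.0 + rng.uniform(-1.0, 1.0)}


if __name__ == "__main__":
    print(average_over_seeds(fake_evaluate, seeds=[0, 1, 2, 3, 4]))
```

Reporting the standard deviation alongside the mean makes it easier to see how sensitive a dataset is to the initial noise.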

Setup

In a conda environment with CUDA available, run:

pip install -r requirements.txt

Data Preparation

CrossTask

Download the dataset and features:

cd {root}/dataset/crosstask
bash download.sh

Move your data-split files and the action one-hot encoding file to {root}/dataset/crosstask/crosstask_release/:

mv *.json crosstask_release
mv actions_one_hot.npy crosstask_release
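Before training, it can help to confirm that the files landed where expected. The following is a small sketch of such a check; `missing_crosstask_files` is a hypothetical helper that is not part of the repository, and it only assumes what the commands above imply (at least one `.json` file and `actions_one_hot.npy` inside `crosstask_release`).

```python
from pathlib import Path
from typing import List


def missing_crosstask_files(release_dir: Path) -> List[str]:
    """Return a list of problems found in the crosstask_release directory."""
    if not release_dir.is_dir():
        return [f"directory not found: {release_dir}"]
    problems: List[str] = []
    # The data-split files are only known to be *.json; their names are not assumed.
    if not any(release_dir.glob("*.json")):
        problems.append("no data-split .json files found")
    if not (release_dir / "actions_one_hot.npy").is_file():
        problems.append("actions_one_hot.npy not found")
    return problems


if __name__ == "__main__":
    # Run from {root}; adjust the path if your layout differs.
    for problem in missing_crosstask_files(Path("dataset/crosstask/crosstask_release")):
        print(problem)
```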

COIN

Download the dataset and features:

cd {root}/dataset/coin
bash download.sh

NIV

Download the dataset and features:

cd {root}/dataset/NIV
bash download.sh

Train

Train the MLPs for task category prediction (by default, 8 GPUs are used for training). You can modify the dataset, number of training steps, horizon (prediction length), save path for the JSON files, etc. in args.py.

python train_mlp.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

Dimensions for different datasets are listed below:

| Dataset   | observation_dim          | action_dim | class_dim |
| --------- | ------------------------ | ---------- | --------- |
| CrossTask | 1536 (how), 9600 (base)  | 105        | 18        |
| COIN      | 1536                     | 778        | 180       |
| NIV       | 1536                     | 48         | 5         |
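For scripting experiments, the dimensions above can be kept in one lookup table. This is only a convenience sketch: the field names mirror the table's column headers, while the dictionary keys and the `dims_for` helper are illustrative and are not taken from args.py.

```python
from typing import Dict, NamedTuple


class Dims(NamedTuple):
    observation_dim: int
    action_dim: int
    class_dim: int


# Values copied from the table; the keys are illustrative names (assumption).
DATASET_DIMS: Dict[str, Dims] = {
    "crosstask_how": Dims(observation_dim=1536, action_dim=105, class_dim=18),
    "crosstask_base": Dims(observation_dim=9600, action_dim=105, class_dim=18),
    "coin": Dims(observation_dim=1536, action_dim=778, class_dim=180),
    "niv": Dims(observation_dim=1536, action_dim=48, class_dim=5),
}


def dims_for(dataset: str) -> Dims:
    """Look up the dimensions for a dataset, failing loudly on unknown names."""
    try:
        return DATASET_DIMS[dataset.lower()]
    except KeyError:
        known = ", ".join(sorted(DATASET_DIMS))
        raise ValueError(f"unknown dataset {dataset!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(dims_for("COIN"))
```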

The trained MLPs will be saved in {root}/save_max_mlp, and JSON files for the training and testing data will be generated. Then run temp.py to generate JSON files with the predicted task class for testing.

Modify the checkpoint path (L86) and the JSON file path (L111) in temp.py and run:

CUDA_VISIBLE_DEVICES=0 python temp.py --multiprocessing-distributed --num_thread_reader=1 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=32 --batch_size_val=32 --evaluate

Train PDPP: set 'json_path_val' in args.py to the output file of temp.py and run:

python main_distributed.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

Training settings for different datasets are listed below:

| Dataset            | n_diffusion_steps | n_train_steps | epochs | learning-rate |
| ------------------ | ----------------- | ------------- | ------ | ------------- |
| $CrossTask_{Base}$ | 200               | 200           | 60     | 8e-4          |
| $CrossTask_{How}$  | 200               | 200           | 120    | 5e-4          |
| COIN               | 200               | 200           | 800    | 1e-5          |
| NIV                | 50                | 50            | 130    | 3e-4          |
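The settings above can likewise be captured in a small structure when launching several runs. This is a sketch only: the class, the dictionary keys and `total_train_steps` are hypothetical, and the total assumes that `n_train_steps` counts optimizer steps per epoch, which this README does not state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainSettings:
    n_diffusion_steps: int
    n_train_steps: int
    epochs: int
    learning_rate: float

    @property
    def total_train_steps(self) -> int:
        # ASSUMPTION: n_train_steps is the number of optimizer steps per epoch.
        return self.n_train_steps * self.epochs


# Values copied from the table; the keys are illustrative names (assumption).
TRAIN_SETTINGS: Dict[str, TrainSettings] = {
    "crosstask_base": TrainSettings(200, 200, 60, 8e-4),
    "crosstask_how": TrainSettings(200, 200, 120, 5e-4),
    "coin": TrainSettings(200, 200, 800, 1e-5),
    "niv": TrainSettings(50, 50, 130, 3e-4),
}


if __name__ == "__main__":
    for name, cfg in TRAIN_SETTINGS.items():
        print(name, cfg.learning_rate, cfg.total_train_steps)
```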

The learning-rate schedule can be adjusted in helpers.py; schedule details can be found in the supplement. The trained models will be saved in {root}/save_max.

To train the $Deterministic$ and $Noise$ baselines, you need to modify temporal.py to remove the 'time_mlp' modules, and modify diffusion.py to change the initial noise, the 'training' functions and the p_sample_loop process.

Inference

Checkpoints

Note: Numbers may vary from run to run for PDPP and the $Noise$ baseline due to probabilistic sampling.

For Metrics

Set the checkpoint path (L244) in inference.py to the model being evaluated and run:

python inference.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate > output.txt

Results of given checkpoints:

|                         | SR    | mAcc  | mIoU  |
| ----------------------- | ----- | ----- | ----- |
| Crosstask_T=3_diffusion | 37.20 | 64.67 | 66.57 |
| COIN_T=3_diffusion      | 21.33 | 45.62 | 51.82 |
| NIV_T=3_diffusion       | 30.20 | 48.45 | 57.28 |
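For readers unfamiliar with the three metrics, the sketch below shows one common way to compute them for action plans represented as lists of action ids. It is an illustration under assumed definitions, not the code in inference.py: SR counts a plan as successful only if every step matches, mAcc is step-wise accuracy, and mIoU is computed per plan on the sets of actions and then averaged.

```python
from typing import Dict, Sequence


def planning_metrics(
    predictions: Sequence[Sequence[int]],
    targets: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Compute SR, mAcc and mIoU (in percent) over paired action plans."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equally long")
    successes = 0
    correct_steps = 0
    total_steps = 0
    iou_sum = 0.0
    for pred, gt in zip(predictions, targets):
        if not gt or len(pred) != len(gt):
            raise ValueError("each plan must be non-empty and match its target length")
        matches = sum(p == g for p, g in zip(pred, gt))
        successes += int(matches == len(gt))  # success only if all steps match
        correct_steps += matches
        total_steps += len(gt)
        pred_set, gt_set = set(pred), set(gt)
        iou_sum += len(pred_set & gt_set) / len(pred_set | gt_set)  # order-free overlap
    n = len(predictions)
    return {
        "SR": 100.0 * successes / n,
        "mAcc": 100.0 * correct_steps / total_steps,
        "mIoU": 100.0 * iou_sum / n,
    }


if __name__ == "__main__":
    print(planning_metrics([[1, 2, 3], [1, 2, 4]], [[1, 2, 3], [1, 3, 4]]))
```

Computing IoU per plan rather than over a pooled batch keeps the value independent of the evaluation batch size.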

For probabilistic modeling

To evaluate the $Deterministic$ and $Noise$ baselines, you need to modify temporal.py to remove the 'time_mlp' modules, and modify diffusion.py to change the initial noise and the p_sample_loop process. For the $Deterministic$ baseline, num_sampling (L26) in uncertain.py should be set to 1.

Set the checkpoint path (L309) in uncertain.py to the model being evaluated and run:

CUDA_VISIBLE_DEVICES=0 python uncertain.py --multiprocessing-distributed --num_thread_reader=1 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=32 --batch_size_val=32 --evaluate > output.txt

Results of given checkpoints:

|                         | NLL  | KL-Div | ModePrec | ModeRec |
| ----------------------- | ---- | ------ | -------- | ------- |
| Crosstask_T=6_diffusion | 4.06 | 2.76   | 25.61    | 22.68   |
| Crosstask_T=6_noise     | 4.79 | 3.49   | 24.51    | 11.04   |
| Crosstask_T=6_zero      | 5.12 | 3.82   | 25.24    | 6.75    |
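As a rough guide to reading the ModePrec and ModeRec columns, the sketch below computes both for a single start/goal pair. The definitions are assumptions, not taken from uncertain.py: mode precision is taken as the fraction of sampled plans that match some ground-truth plan, and mode recall as the fraction of distinct ground-truth plans that appear among the samples.

```python
from typing import Sequence, Tuple


def mode_precision_recall(
    sampled_plans: Sequence[Sequence[int]],
    ground_truth_plans: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Return (mode precision, mode recall) as fractions in [0, 1]."""
    if not sampled_plans or not ground_truth_plans:
        raise ValueError("both plan collections must be non-empty")
    gt_modes = {tuple(plan) for plan in ground_truth_plans}
    samples = [tuple(plan) for plan in sampled_plans]
    # Precision: how many samples are real plans (duplicates count each time).
    precision = sum(sample in gt_modes for sample in samples) / len(samples)
    # Recall: how many distinct real plans were produced at least once.
    recall = len(gt_modes & set(samples)) / len(gt_modes)
    return precision, recall


if __name__ == "__main__":
    samples = [[1, 2], [1, 2], [3, 4], [5, 6]]
    truths = [[1, 2], [7, 8]]
    print(mode_precision_recall(samples, truths))
```

Under these definitions, a model that always emits the same correct plan scores high precision but low recall, which matches the pattern one would expect from a deterministic baseline.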

Citation

If this project helps you in your research or project, please cite our paper:

@inproceedings{wang2023pdppprojected,
      title={PDPP:Projected Diffusion for Procedure Planning in Instructional Videos}, 
      author={Hanlin Wang and Yilu Wu and Sheng Guo and Limin Wang},
      booktitle={{CVPR}},
      year={2023}
}

Acknowledgements

We would like to thank He Zhao for his help in extracting the S3D features and for providing the evaluation code for probabilistic modeling from P3IV. The diffusion model implementation is based on diffuser and improved-diffusion. We also reference and use some code from PlaTe. Our sincere thanks go to the contributors of these excellent codebases.
