# Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation [R6D 2023]
<div style="text-align:center"> <img src="assets/Diff3DHPE_MixSTE.png" width="1500" alt="Overall framework of Diff3DHPE during the reverse diffusion process in seq2seq style. At iteration step t, a 2D keypoint sequence x is concatenated with its corresponding noisy 3D predicted sequence y_t along the channel dimension. The backbone model takes (x, y_t) and t to predict a final 3D sequence y_0 at step t. Then, y_{t-1} is obtained from a predefined reverse diffusion function and sent to the next iteration for refinement. The backbone model is MixSTE in this example."/> </div>

The PyTorch implementation of <a href="https://openaccess.thecvf.com/content/ICCV2023W/R6D/html/Zhou_Diff3DHPE_A_Diffusion_Model_for_3D_Human_Pose_Estimation_ICCVW_2023_paper.html">"Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation"</a>.

## Qualitative and quantitative results
<div style="text-align:center"> <img src="assets/viz.gif" width="1200" height="400" /> </div>

### Human3.6M
#### CPN, 81 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| PoseFormer | 44.3 |
| MixSTE | 42.4 |
| P-STMO-S | 44.1 |
| Diff3DHPE-MixSTE | 42.0 |
#### CPN, 243 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| MixSTE | 40.9 |
| P-STMO-S | 42.8 |
| Diff3DHPE-MixSTE | 40.0 |
#### GT, 81 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| PoseFormer | 31.3 |
| MixSTE | 25.9 |
| Diff3DHPE-MixSTE | 24.2 |
#### GT, 243 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| MixSTE | 21.6 |
| P-STMO-S | 29.3 |
| Diff3DHPE-MixSTE | 20.2 |
### MPI-INF-3DHP

#### GT

| Method | Frames | PCK (%) | AUC (%) | MPJPE (mm) |
|:------------------:|:------:|:-------:|:-------:|:----------:|
| PoseFormer | 9 | 88.6 | 56.4 | 77.1 |
| MixSTE | 27 | 94.4 | 66.5 | 54.9 |
| P-STMO-S | 81 | 97.9 | 75.8 | 32.2 |
| Diff3DHPE-MixSTE | 27 | 99.1 | 84.8 | 19.6 |
## Environment

Please create the environment with the following command:

```shell
conda env create -f Diff3DHPE.yml
```
## Dataset

Please refer to `data/README.MD`.
## Experiments

Please refer to `Experiments.sh`.
## Pretrained models

The pretrained models are available on OneDrive.
## Visualization
### Figure

```shell
python visualization_fig.py --gpu_id 0 -sviz S9 -a "Photo 1" -cam 1 -s 81 -f 81 -b 1 --sampling_timesteps 9 -c checkpoint/h36m/ConditionalDiffusionMixSTES2SGRANDLinLift/cpn/f81 --evaluate ConditionalDiffusionMixSTES2SGRANDLinLift_l2_lr4e-4_useTembed_T_h36m_cpn_81f.bin --config configs/h36m_cpn_s2s_ConditionalDiffusionMixSTES2SGRANDLinLift.json --viz-video data/Videos/S9/Videos/Photo\ 1.55011271.mp4
```
### Animation

```shell
python visualization_ani.py --gpu_id 0 -sviz S9 -a "Photo 1" -cam 1 -s 81 -f 81 -b 4 --sampling_timesteps 9 -c checkpoint/h36m/ConditionalDiffusionMixSTES2SGRANDLinLift/cpn/f81 --evaluate ConditionalDiffusionMixSTES2SGRANDLinLift_l2_lr4e-4_useTembed_T_h36m_cpn_81f.bin --config configs/h36m_cpn_s2s_ConditionalDiffusionMixSTES2SGRANDLinLift.json --viz-video data/Videos/S9/Videos/Photo\ 1.55011271.mp4 --viz-output viz.mp4
```
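The iterative refinement these scripts perform (`--sampling_timesteps` sets the number of reverse iterations) follows the seq2seq reverse process shown in the framework figure: at each step the 2D sequence is concatenated with the current noisy 3D estimate, the backbone predicts a clean sequence, and a predefined reverse function produces the next, less noisy estimate. A minimal sketch of that loop, using a hypothetical `ToyBackbone` stand-in for MixSTE and a generic DDIM-style update (not the repository's exact noise schedule):

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Hypothetical stand-in for the MixSTE backbone: maps the concatenated
    (2D keypoints, noisy 3D) channels to a clean 3D prediction."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(5, 3)  # 2 (x, y) + 3 (noisy X, Y, Z) -> 3D

    def forward(self, inp, t):
        return self.proj(inp)        # this toy model ignores the timestep t

@torch.no_grad()
def reverse_diffusion(backbone, x2d, num_steps=9, T=1000):
    """Sketch of the reverse process: start from Gaussian noise and refine
    the 3D sequence over `num_steps` iterations (assumed schedule)."""
    B, F, J, _ = x2d.shape
    betas = torch.linspace(1e-4, 2e-2, T)            # assumed linear schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    ts = torch.linspace(T - 1, 0, num_steps).long()  # strided timesteps
    y_t = torch.randn(B, F, J, 3)                    # start from pure noise
    for i, t in enumerate(ts):
        inp = torch.cat([x2d, y_t], dim=-1)          # concat on channel dim
        y0_hat = backbone(inp, t)                    # predicted clean sequence
        if i + 1 == len(ts):
            return y0_hat                            # final refined output
        a_t, a_prev = alphas_bar[t], alphas_bar[ts[i + 1]]
        eps = (y_t - a_t.sqrt() * y0_hat) / (1 - a_t).sqrt()  # implied noise
        y_t = a_prev.sqrt() * y0_hat + (1 - a_prev).sqrt() * eps

x2d = torch.randn(2, 81, 17, 2)                      # (batch, frames, joints, 2)
y3d = reverse_diffusion(ToyBackbone(), x2d)
print(y3d.shape)                                     # torch.Size([2, 81, 17, 3])
```

More sampling steps trade inference time for additional refinement; the commands above use 9 steps.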
## Citation
If you find this repo useful, please consider citing our paper:
```bibtex
@InProceedings{Zhou_2023_ICCV,
    author    = {Zhou, Jieming and Zhang, Tong and Hayder, Zeeshan and Petersson, Lars and Harandi, Mehrtash},
    title     = {Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2092-2102}
}
```
## Acknowledgement
Our code refers to the following repositories.
