# Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation [R6D 2023]
<div style="text-align:center"> <img src="assets/Diff3DHPE_MixSTE.png" width="1500" alt="Overall framework of Diff3DHPE during the reverse diffusion process in seq2seq style. At iteration step t, a 2D keypoint sequence x is concatenated with its corresponding noisy 3D predicted sequence y_t along the channel dimension. The backbone model takes (x, y_t) and t to predict a final 3D sequence y_0 at step t. Then, y_{t-1} is obtained from a predefined reverse diffusion function and sent to the next iteration for refinement. The backbone model is MixSTE in this example."/> </div>

The PyTorch implementation of <a href="https://openaccess.thecvf.com/content/ICCV2023W/R6D/html/Zhou_Diff3DHPE_A_Diffusion_Model_for_3D_Human_Pose_Estimation_ICCVW_2023_paper.html">"Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation"</a>.

## Qualitative and quantitative results
<div style="text-align:center"> <img src="assets/viz.gif" width="1200" height="400" /> </div>

### Human3.6M
#### CPN, 81 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| PoseFormer | 44.3 |
| MixSTE | 42.4 |
| P-STMO-S | 44.1 |
| Diff3DHPE-MixSTE | 42.0 |
#### CPN, 243 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| MixSTE | 40.9 |
| P-STMO-S | 42.8 |
| Diff3DHPE-MixSTE | 40.0 |
#### GT, 81 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| PoseFormer | 31.3 |
| MixSTE | 25.9 |
| Diff3DHPE-MixSTE | 24.2 |
#### GT, 243 frames

| Method | MPJPE (mm) |
|:------------------:|:----------:|
| MixSTE | 21.6 |
| P-STMO-S | 29.3 |
| Diff3DHPE-MixSTE | 20.2 |
### MPI-INF-3DHP

#### GT

| Method | Frames | PCK (%) | AUC (%) | MPJPE (mm) |
|:------------------:|:------:|:-------:|:-------:|:----------:|
| PoseFormer | 9 | 88.6 | 56.4 | 77.1 |
| MixSTE | 27 | 94.4 | 66.5 | 54.9 |
| P-STMO-S | 81 | 97.9 | 75.8 | 32.2 |
| Diff3DHPE-MixSTE | 27 | 99.1 | 84.8 | 19.6 |
## Environment

Please create the environment with the following command:

```shell
conda env create -f Diff3DHPE.yml
```
## Dataset

Please refer to `data/README.MD`.
## Experiments

Please refer to `Experiments.sh`.
## Pretrained models

The pretrained models are available on OneDrive.
## Visualization
### Figure

```shell
python visualization_fig.py --gpu_id 0 -sviz S9 -a "Photo 1" -cam 1 -s 81 -f 81 -b 1 --sampling_timesteps 9 -c checkpoint/h36m/ConditionalDiffusionMixSTES2SGRANDLinLift/cpn/f81 --evaluate ConditionalDiffusionMixSTES2SGRANDLinLift_l2_lr4e-4_useTembed_T_h36m_cpn_81f.bin --config configs/h36m_cpn_s2s_ConditionalDiffusionMixSTES2SGRANDLinLift.json --viz-video data/Videos/S9/Videos/Photo\ 1.55011271.mp4
```
### Animation

```shell
python visualization_ani.py --gpu_id 0 -sviz S9 -a "Photo 1" -cam 1 -s 81 -f 81 -b 4 --sampling_timesteps 9 -c checkpoint/h36m/ConditionalDiffusionMixSTES2SGRANDLinLift/cpn/f81 --evaluate ConditionalDiffusionMixSTES2SGRANDLinLift_l2_lr4e-4_useTembed_T_h36m_cpn_81f.bin --config configs/h36m_cpn_s2s_ConditionalDiffusionMixSTES2SGRANDLinLift.json --viz-video data/Videos/S9/Videos/Photo\ 1.55011271.mp4 --viz-output viz.mp4
```
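The iterative refinement these scripts perform (`--sampling_timesteps` sets the number of reverse iterations) follows the seq2seq reverse process shown in the framework figure: at each step the 2D sequence is concatenated with the current noisy 3D estimate, the backbone predicts a clean sequence, and a predefined reverse function produces the next, less noisy estimate. A minimal sketch of that loop, using a hypothetical `ToyBackbone` stand-in for MixSTE and a generic DDIM-style update (not the repository's exact noise schedule):

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Hypothetical stand-in for the MixSTE backbone: maps the concatenated
    (2D keypoints, noisy 3D) channels to a clean 3D prediction."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(5, 3)  # 2 (x, y) + 3 (noisy X, Y, Z) -> 3D

    def forward(self, inp, t):
        return self.proj(inp)        # this toy model ignores the timestep t

@torch.no_grad()
def reverse_diffusion(backbone, x2d, num_steps=9, T=1000):
    """Sketch of the reverse process: start from Gaussian noise and refine
    the 3D sequence over `num_steps` iterations (assumed schedule)."""
    B, F, J, _ = x2d.shape
    betas = torch.linspace(1e-4, 2e-2, T)            # assumed linear schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    ts = torch.linspace(T - 1, 0, num_steps).long()  # strided timesteps
    y_t = torch.randn(B, F, J, 3)                    # start from pure noise
    for i, t in enumerate(ts):
        inp = torch.cat([x2d, y_t], dim=-1)          # concat on channel dim
        y0_hat = backbone(inp, t)                    # predicted clean sequence
        if i + 1 == len(ts):
            return y0_hat                            # final refined output
        a_t, a_prev = alphas_bar[t], alphas_bar[ts[i + 1]]
        eps = (y_t - a_t.sqrt() * y0_hat) / (1 - a_t).sqrt()  # implied noise
        y_t = a_prev.sqrt() * y0_hat + (1 - a_prev).sqrt() * eps

x2d = torch.randn(2, 81, 17, 2)                      # (batch, frames, joints, 2)
y3d = reverse_diffusion(ToyBackbone(), x2d)
print(y3d.shape)                                     # torch.Size([2, 81, 17, 3])
```

More sampling steps trade inference time for additional refinement; the commands above use 9 steps.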
## Citation
If you find this repo useful, please consider citing our paper:
```bibtex
@InProceedings{Zhou_2023_ICCV,
    author    = {Zhou, Jieming and Zhang, Tong and Hayder, Zeeshan and Petersson, Lars and Harandi, Mehrtash},
    title     = {Diff3DHPE: A Diffusion Model for 3D Human Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2092-2102}
}
```
## Acknowledgement
Our code refers to the following repositories.
