DeciWatch

[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation (ECCV 2022)

This repo is the official implementation of "DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation". [Paper] [Project]

Update

  • [x] Add failure cases and more analyses to the project page

  • [x] Provide checkpoints for different sample intervals

  • [x] DeciWatch is supported in MMHuman3D release v0.7.0 as a 10× speed-up strategy!

  • [x] The clean version is released! It currently includes code, data, logs and models for the following tasks:

      • 2D human pose estimation

      • 3D human pose estimation

      • Body recovery via an SMPL model

TODO

  • [ ] Support DeciWatch in MMPose

Description

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10× efficiency improvement over existing works without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of the video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experiments on three video-based human pose estimation and body mesh recovery tasks, as well as on efficient labeling in videos, across four datasets validate the efficiency and effectiveness of DeciWatch.
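To make the sample-denoise-recover data flow concrete, here is a minimal sketch. In DeciWatch both the denoising and the recovery steps are Transformer networks; this sketch replaces them with trivial stand-ins (identity denoising and per-coordinate linear interpolation for recovery) purely to show why the expensive per-frame estimator only runs on about 1/N of the frames. All function names are hypothetical and not part of the repository, and keeping the last frame as a sample is a choice made for this sketch only.

```python
import numpy as np


def sample_indices(num_frames, interval):
    """Uniformly pick every `interval`-th frame. This sketch also keeps the
    last frame so the recovery step never extrapolates past the final sample."""
    idx = np.arange(0, num_frames, interval)
    if idx[-1] != num_frames - 1:
        idx = np.append(idx, num_frames - 1)
    return idx


def recover_linear(sparse_poses, idx, num_frames):
    """Stand-in for the recovery network: linearly interpolate every joint
    coordinate between the sampled frames.

    sparse_poses: (len(idx), J, C) poses at the sampled frames.
    Returns dense poses of shape (num_frames, J, C).
    """
    t = np.arange(num_frames)
    flat = sparse_poses.reshape(len(idx), -1)  # (S, J*C)
    dense = np.stack(
        [np.interp(t, idx, flat[:, k]) for k in range(flat.shape[1])], axis=1
    )
    return dense.reshape((num_frames,) + sparse_poses.shape[1:])


def deciwatch_like_pipeline(estimator, frames, interval=10, denoise=lambda p: p):
    """Sample -> denoise -> recover. `estimator` is any per-frame pose
    estimator; `denoise` defaults to identity (the real model uses a Transformer).
    Returns the dense poses and how many frames the estimator actually saw."""
    n = len(frames)
    idx = sample_indices(n, interval)
    sparse = np.stack([estimator(frames[i]) for i in idx])  # ~n/interval calls
    sparse = denoise(sparse)
    return recover_linear(sparse, idx, n), len(idx)
```

For a 101-frame clip with `interval=10`, the estimator is called on 11 frames (0, 10, ..., 100) instead of 101; the quality of the remaining 90 frames then depends entirely on the recovery model, which is where DeciWatch's learned networks replace the naive interpolation used here.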

Major Features

  • Model training and evaluation for 2D pose, 3D pose, and SMPL body representation
  • Support for four popular datasets (Human3.6M, 3DPW, AIST++, Sub-JHMDB), with cleaned data from five popular pose estimation backbones (FCN, SPIN, EFT, PARE, SimplePose)
  • Versatile visualization toolbox comparing the input (backbone estimator results) with the output (DeciWatch results)

Visualization of 2D poses on the Sub-JHMDB dataset (SimplePose backbone).

Visualization of 3D poses on the AIST++ dataset (SPIN backbone).

Visualization of SMPL bodies on the 3DPW dataset (PARE backbone).

Getting Started

Environment Requirement

DeciWatch has been implemented and tested with PyTorch 1.10.1 and Python >= 3.6. It supports both GPU and CPU inference.

Clone the repo:

git clone https://github.com/cure-lab/DeciWatch.git

We recommend you install the requirements using conda:

# conda
source scripts/install_conda.sh

Prepare Data

All the data used in our experiments can be downloaded from either of the following:

Google Drive

Baidu Netdisk

Available data includes:

| Dataset | Pose Estimator | 3D Pose | 2D Pose | SMPL |
| ---- | ---- | ---- | ---- | ---- |
| Sub-JHMDB | SimplePose | | ✔ | |
| 3DPW | EFT | ✔ | | ✔ |
| 3DPW | PARE | ✔ | | ✔ |
| 3DPW | SPIN | ✔ | | ✔ |
| Human3.6M | FCN | ✔ | | |
| AIST++ | SPIN | ✔ | | ✔ |

Please refer to doc/data.md for detailed data information and preparation instructions.

Training

Note that the training and testing datasets should be downloaded and prepared before training.

You may refer to doc/training.md for more training details.

Run the commands below to start training:

python train.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

For example, to train on the 3D position representation of the 3DPW dataset with the SPIN backbone estimator and a sample interval of N=10 (sampling ratio = 10%), run:

python train.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10
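If you want to train several sample intervals in a row, a small launcher can assemble the same command shown above. This is a sketch: the flags are exactly those documented above, while the helper names and the interval values in the loop are hypothetical (check doc/training.md for the intervals and config files you actually want to use). It avoids `shlex.join` so it also runs on Python 3.6/3.7.

```python
import shlex


def build_train_cmd(cfg, dataset_name, estimator, body_representation, sample_interval):
    """Return the train.py argument list for one run, using the documented flags."""
    if body_representation not in {"smpl", "3D", "2D"}:
        raise ValueError("unknown body representation: " + body_representation)
    return [
        "python", "train.py",
        "--cfg", cfg,
        "--dataset_name", dataset_name,
        "--estimator", estimator,
        "--body_representation", body_representation,
        "--sample_interval", str(sample_interval),
    ]


def format_cmd(cmd):
    """Shell-quote each argument so the command can be copied into a terminal."""
    return " ".join(shlex.quote(arg) for arg in cmd)


def sampling_ratio(sample_interval):
    """A sample interval of N means roughly one estimated frame in every N."""
    return 1.0 / sample_interval


if __name__ == "__main__":
    # Hypothetical sweep; adjust intervals (and configs, if needed) to your setup.
    for n in (5, 10):
        cmd = build_train_cmd("configs/config_pw3d_spin.yaml", "pw3d", "spin", "3D", n)
        print("N={} ({:.0%} of frames): {}".format(n, sampling_ratio(n), format_cmd(cmd)))
        # import subprocess; subprocess.run(cmd, check=True)  # uncomment to launch
```

Keeping the arguments as a list (rather than one string) lets you pass them straight to `subprocess.run` without shell parsing.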

Evaluation (using a 10% sampling ratio as an example)

Note that although our main contribution is the efficiency improvement, using DeciWatch as a post-processing step also improves smoothness (Accel drops sharply in every setting below) and, in most settings, accuracy.

You may refer to doc/evaluate.md for evaluation details on all sampling ratios.

Results on 2D Pose:

| Dataset | Estimator | PCK@0.05 (Input/Output) ↑ | PCK@0.1 (Input/Output) ↑ | PCK@0.2 (Input/Output) ↑ | Checkpoint |
| ------- | --------- | ------------- | ------------- | ------------- | ------------- |
| Sub-JHMDB | SimplePose | 57.30% / 79.44% | 81.61% / 94.05% | 93.94% / 98.75% | Baidu Netdisk / Google Drive |
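PCK@α counts a predicted keypoint as correct when its distance to the ground truth is at most α times a per-frame reference length. The sketch below shows the metric itself; the reference length is an assumption here (by default, the larger side of the ground-truth keypoints' bounding box), and the repository's own normalization may differ, so pass `scale` explicitly if you need to match it.

```python
import numpy as np


def pck(pred, gt, thresh, scale=None):
    """Percentage of Correct Keypoints, returned in percent.

    pred, gt: (T, J, 2) arrays of 2D keypoints.
    thresh:   alpha, e.g. 0.05, 0.1 or 0.2 as in the table above.
    scale:    per-frame reference length, shape (T,). Assumption: if omitted,
              use max(width, height) of the ground-truth keypoint bounding box.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if scale is None:
        extent = gt.max(axis=1) - gt.min(axis=1)  # (T, 2): bbox width, height
        scale = extent.max(axis=1)                # (T,)
    dist = np.linalg.norm(pred - gt, axis=-1)     # (T, J) pixel errors
    correct = dist <= thresh * np.asarray(scale, dtype=float)[:, None]
    return 100.0 * correct.mean()
```

With a 100-pixel reference length, errors of 3, 8 and 15 pixels give PCK@0.05 = 33.3%, PCK@0.1 = 66.7% and PCK@0.2 = 100%, which is why the table's numbers rise with α.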

Results on 3D Pose:

| Dataset | Estimator | MPJPE (mm, Input/Output) ↓ | Accel (mm/frame², Input/Output) ↓ | Checkpoint |
| ------- | --------- | ------------------ | ------------------ | -------- |
| 3DPW | SPIN | 96.92 / 93.34 | 34.68 / 7.06 | Baidu Netdisk / Google Drive |
| 3DPW | EFT | 90.34 / 89.02 | 32.83 / 6.84 | Baidu Netdisk / Google Drive |
| 3DPW | PARE | 78.98 / 77.16 | 25.75 / 6.90 | Baidu Netdisk / Google Drive |
| AIST++ | SPIN | 107.26 / 71.27 | 33.37 / 5.68 | Baidu Netdisk / Google Drive |
| Human3.6M | FCN | 54.56 / 52.83 | 19.18 / 1.47 | Baidu Netdisk / Google Drive |
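MPJPE measures per-frame accuracy, while Accel compares second-order finite differences of the prediction and the ground truth, so it mainly penalizes jitter; that is why Accel drops far more than MPJPE in the tables. The sketch below uses the standard definitions and assumes (T, J, 3) joint arrays in millimetres that are already aligned as the evaluation expects; the repository's exact protocol (root alignment, joint subset) may differ, see doc/evaluate.md.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance over all frames
    and joints. pred, gt: (T, J, 3) arrays in the same unit (e.g. mm)."""
    return np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1).mean()


def accel_error(pred, gt):
    """Acceleration error: mean distance between the second-order differences
    x[t+1] - 2*x[t] + x[t-1] of prediction and ground truth (unit per frame^2)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    acc_pred = pred[2:] - 2 * pred[1:-1] + pred[:-2]  # (T-2, J, 3)
    acc_gt = gt[2:] - 2 * gt[1:-1] + gt[:-2]
    return np.linalg.norm(acc_pred - acc_gt, axis=-1).mean()
```

A constant 10 mm offset gives MPJPE = 10 but Accel = 0, whereas a single-frame spike barely moves MPJPE yet raises Accel. MPVPE in the SMPL-based results is the same per-point distance computed over mesh vertices instead of joints, so `mpjpe` applies unchanged to (T, V, 3) vertex arrays.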

Results on SMPL-based Body Recovery:

| Dataset | Estimator | MPJPE (mm, Input/Output) ↓ | Accel (mm/frame², Input/Output) ↓ | MPVPE (mm, Input/Output) ↓ | Checkpoint |
| ------- | --------- | ------------------ | ------------------ | ------------------ | ------ |
| 3DPW | SPIN | 100.13 / 97.53 | 35.53 / 8.38 | 114.39 / 112.84 | Baidu Netdisk / Google Drive |
| 3DPW | EFT | 91.60 / 92.56 | 33.57 / 8.75 | 110.34 / 109.27 | Baidu Netdisk / Google Drive |
| 3DPW | PARE | 80.44 / 81.76 | 26.77 / 7.24 | 94.88 / 95.68 | Baidu Netdisk / Google Drive |
| AIST++ | SPIN | 108.25 / 82.10 | 33.83 / 7.27 | 137.51 / 106.08 | Baidu Netdisk / Google Drive |

Quick Demo

Here we only provide demo visualization based on offline-processed detected poses from specific datasets (e.g., AIST++, Human3.6M, 3DPW, and Sub-JHMDB). To visualize results on an arbitrary video, please refer to the inference/demo of MMHuman3D.

Run the commands below to visualize demo:

python demo.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

Place the corresponding images in the following directory structure:

|-- data
    |-- videos
        |-- pw3d 
            |-- downtown_enterShop_00
                |-- image_00000.jpg
   
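Before running the demo, it can help to check that a sequence folder matches this layout. The sketch below assumes frames are named `image_XXXXX.jpg` with a zero-padded 5-digit index starting at 0, inferred from the example tree above; the contiguity check and the helper names are hypothetical additions, not something `demo.py` is known to require.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the example tree (image_00000.jpg, ...).
FRAME_RE = re.compile(r"image_(\d{5})\.jpg")


def frame_indices(names):
    """Parse frame indices from file names, ignore non-matching files, and
    check that the indices form 0, 1, ..., K-1."""
    idx = sorted(int(m.group(1)) for m in map(FRAME_RE.fullmatch, names) if m)
    if idx != list(range(len(idx))):
        raise ValueError("frame indices are not contiguous from 0")
    return idx


def list_frames(root, dataset, sequence):
    """Return sorted frame paths under <root>/videos/<dataset>/<sequence>."""
    seq_dir = Path(root) / "videos" / dataset / sequence
    if not seq_dir.is_dir():
        raise FileNotFoundError("missing sequence directory: {}".format(seq_dir))
    idx = frame_indices(p.name for p in seq_dir.iterdir())
    return [seq_dir / "image_{:05d}.jpg".format(i) for i in idx]
```

For the example above, `list_frames("data", "pw3d", "downtown_enterShop_00")` would return the frame paths in temporal order, or raise if the folder is missing or frames are skipped.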