DEKR

This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300).

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

Introduction

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework, which has previously been inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately requires learning representations that focus on the keypoint regions.

We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through a pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. Each resulting disentangled representation attends to its own keypoint region, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose.

Main Results

Results on COCO val2017 without multi-scale test

| Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_hrnet_w32 | 512x512 | 29.6M | 45.4 | 0.680 | 0.867 | 0.745 | 0.621 | 0.777 | 0.730 | 0.898 | 0.784 | 0.662 | 0.827 |
| pose_hrnet_w48 | 640x640 | 65.7M | 141.5 | 0.710 | 0.883 | 0.774 | 0.667 | 0.785 | 0.760 | 0.914 | 0.815 | 0.706 | 0.840 |

Results on COCO val2017 with multi-scale test

| Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_hrnet_w32 | 512x512 | 29.6M | 45.4 | 0.707 | 0.877 | 0.771 | 0.662 | 0.778 | 0.759 | 0.913 | 0.813 | 0.705 | 0.836 |
| pose_hrnet_w48 | 640x640 | 65.7M | 141.5 | 0.723 | 0.883 | 0.786 | 0.686 | 0.786 | 0.777 | 0.924 | 0.832 | 0.728 | 0.849 |

Results on COCO test-dev2017 without multi-scale test

| Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_hrnet_w32 | 512x512 | 29.6M | 45.4 | 0.673 | 0.879 | 0.741 | 0.615 | 0.761 | 0.724 | 0.908 | 0.782 | 0.654 | 0.819 |
| pose_hrnet_w48 | 640x640 | 65.7M | 141.5 | 0.700 | 0.894 | 0.773 | 0.657 | 0.769 | 0.754 | 0.927 | 0.816 | 0.697 | 0.832 |

Results on COCO test-dev2017 with multi-scale test

| Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_hrnet_w32 | 512x512 | 29.6M | 45.4 | 0.698 | 0.890 | 0.766 | 0.652 | 0.765 | 0.751 | 0.924 | 0.811 | 0.695 | 0.828 |
| pose_hrnet_w48 | 640x640 | 65.7M | 141.5 | 0.710 | 0.892 | 0.780 | 0.671 | 0.769 | 0.767 | 0.932 | 0.830 | 0.715 | 0.839 |

Results on CrowdPose test without multi-scale test

| Method | AP | AP .5 | AP .75 | AP (E) | AP (M) | AP (H) |
|--------------------|-------|-------|--------|--------|--------|--------|
| pose_hrnet_w32 | 0.657 | 0.857 | 0.704 | 0.730 | 0.664 | 0.575 |
| pose_hrnet_w48 | 0.673 | 0.864 | 0.722 | 0.746 | 0.681 | 0.587 |

Results on CrowdPose test with multi-scale test

| Method | AP | AP .5 | AP .75 | AP (E) | AP (M) | AP (H) |
|--------------------|-------|-------|--------|--------|--------|--------|
| pose_hrnet_w32 | 0.670 | 0.854 | 0.724 | 0.755 | 0.680 | 0.569 |
| pose_hrnet_w48 | 0.680 | 0.855 | 0.734 | 0.766 | 0.688 | 0.584 |

Results with matching regression results to the closest keypoints detected from the keypoint heatmaps

| | DEKR-w32-SS | DEKR-w32-MS | DEKR-w48-SS | DEKR-w48-MS |
|--------------------|-------|-------|--------|--------|
| coco_val2017 | 0.680 | 0.710 | 0.710 | 0.728 |
| coco_test-dev2017 | 0.673 | 0.702 | 0.701 | 0.714 |
| crowdpose_test | 0.655 | 0.675 | 0.670 | 0.683 |

Note:

Flip test is used.
GFLOPs are counted for convolution and linear layers only.
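
To make the GFLOPs note concrete, here is a minimal sketch of how the cost of a single convolution or linear layer is commonly counted. It is an illustration only: the helper names and the example layer are hypothetical, and this README does not state whether the reported numbers count multiply-accumulates (MACs) or two operations per MAC, so the sketch reports MACs and leaves that convention open.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulates for one 2D convolution (bias ignored)."""
    k_h, k_w = kernel
    # Each output element needs k_h * k_w * (c_in / groups) multiply-accumulates.
    return k_h * k_w * (c_in // groups) * c_out * h_out * w_out


def linear_macs(in_features, out_features):
    """Multiply-accumulates for one fully connected layer (bias ignored)."""
    return in_features * out_features


# Hypothetical layer, not taken from the DEKR network:
# a 3x3 convolution mapping 32 -> 64 channels with a 128x128 output map.
macs = conv2d_macs(32, 64, (3, 3), 128, 128)
print(f"{macs / 1e9:.3f} GMACs")
```

Summing such per-layer counts over every convolution and linear layer gives a network-level figure; normalization and activation layers are left out, in line with the note above.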

Environment

The code is developed using Python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA V100 GPU cards for HRNet-w32 and 8 NVIDIA V100 GPU cards for HRNet-w48. Other platforms are not fully tested.

Quick start

Installation

Clone this repo. We refer to the cloned directory as ${POSE_ROOT}.
Install dependencies:
```
pip install -r requirements.txt
```

Install COCOAPI:

# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python3 setup.py install --user

Note that instructions like # COCOAPI=/path/to/clone/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

Install CrowdPoseAPI in exactly the same way as COCOAPI.

Create the output (training model output) and log (TensorBoard log) directories:

mkdir output 
mkdir log

Your directory tree should look like this:

${POSE_ROOT}
├── data
├── model
├── experiments
├── lib
├── tools 
├── log
├── output
├── README.md
├── requirements.txt
└── setup.py

Download the ImageNet-pretrained models and our well-trained models from the model zoo (OneDrive) and make the model directory look like this:

${POSE_ROOT}
|-- model
`-- |-- imagenet
    |   |-- hrnet_w32-36af842e.pth
    |   `-- hrnetv2_w48_imagenet_pretrained.pth
    |-- pose_coco
    |   |-- pose_dekr_hrnetw32_coco.pth
    |   `-- pose_dekr_hrnetw48_coco.pth
    |-- pose_crowdpose
    |   |-- pose_dekr_hrnetw32_crowdpose.pth
    |   `-- pose_dekr_hrnetw48_crowdpose.pth
    `-- rescore
        |-- final_rescore_coco_kpt.pth
        `-- final_rescore_crowd_pose_kpt.pth
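
As a quick sanity check after downloading, a short script along these lines can report which of the files in the tree above are still missing. This is a sketch rather than part of the repository: the function name `missing_model_files` is hypothetical, and the script assumes it is run from ${POSE_ROOT}.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_MODEL_FILES = [
    "model/imagenet/hrnet_w32-36af842e.pth",
    "model/imagenet/hrnetv2_w48_imagenet_pretrained.pth",
    "model/pose_coco/pose_dekr_hrnetw32_coco.pth",
    "model/pose_coco/pose_dekr_hrnetw48_coco.pth",
    "model/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth",
    "model/pose_crowdpose/pose_dekr_hrnetw48_crowdpose.pth",
    "model/rescore/final_rescore_coco_kpt.pth",
    "model/rescore/final_rescore_crowd_pose_kpt.pth",
]


def missing_model_files(pose_root="."):
    """Return the expected model files that do not exist under pose_root."""
    root = Path(pose_root)
    return [rel for rel in EXPECTED_MODEL_FILES if not (root / rel).is_file()]


# Assumes the current working directory is ${POSE_ROOT}.
for rel in missing_model_files("."):
    print(f"missing: {rel}")
```

If you only plan to evaluate one backbone or one dataset, the corresponding subset of files is enough; trim the list accordingly.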

Data preparation

For COCO data, please download from the COCO download page. 2017 Train/Val is needed for COCO keypoints training and validation. Place the downloaded files under ${POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017.zip
            `-- val2017.zip

For CrowdPose data, please download from the CrowdPose download page. Train/Val is needed for CrowdPose keypoints training. Place the downloaded files under ${POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- crowdpose
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by tools/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images.zip

After downloading the data, run python tools/crowdpose_concat_train_val.py under ${POSE_ROOT} to create the trainval set.
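
For readers curious what such a concatenation step involves, the sketch below merges two COCO-style annotation dictionaries. It is not the code of tools/crowdpose_concat_train_val.py: it assumes that the CrowdPose JSON files follow the COCO layout with "images" and "annotations" lists and that image and annotation ids do not collide between the two splits. The function name and the toy records are hypothetical.

```python
import json


def concat_coco_style(train, val):
    """Merge two COCO-style annotation dicts into one trainval dict."""
    merged = dict(train)  # shallow copy keeps shared keys such as "categories"
    merged["images"] = train["images"] + val["images"]
    merged["annotations"] = train["annotations"] + val["annotations"]
    return merged


# Toy stand-ins for the contents of crowdpose_train.json and crowdpose_val.json.
train = {
    "categories": [{"id": 1, "name": "person"}],
    "images": [{"id": 1}],
    "annotations": [{"id": 10, "image_id": 1}],
}
val = {
    "categories": [{"id": 1, "name": "person"}],
    "images": [{"id": 2}],
    "annotations": [{"id": 20, "image_id": 2}],
}

trainval = concat_coco_style(train, val)
print(json.dumps(trainval, indent=2))
```

For the actual dataset, use the provided script, which produces crowdpose_trainval.json in the location shown in the tree above.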

Training and testing

Testing on COCO val2017 dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth
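
The trailing KEY VALUE pairs in the command above override entries of the YAML file passed with --cfg. The sketch below illustrates that convention on a plain nested dictionary. It is an assumption-laden illustration, not the repository's config code: `apply_overrides` is a hypothetical helper, and the starting values in `cfg` are made up for the example.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply ["A.B", "value", ...] pairs to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node.setdefault(name, {})
        try:
            # Turns "0.15" into a float, "True" into a bool, "0.5,1,2" into a tuple.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # anything else (for example a file path) stays a string
        node[leaf] = value
    return cfg


# Made-up starting config; the real defaults live in the YAML file.
cfg = {"TEST": {"MODEL_FILE": "", "NMS_THRE": 0.05}}
apply_overrides(cfg, [
    "TEST.MODEL_FILE", "model/pose_coco/pose_dekr_hrnetw32_coco.pth",
    "TEST.NMS_THRE", "0.15",
])
print(cfg)
```

Because overrides are applied after the YAML file is loaded, the same config file can be reused across the evaluation settings in this section by changing only the trailing pairs.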

Testing on COCO test-dev2017 dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth \
    DATASET.TEST test-dev2017

Testing on COCO val2017 dataset with multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth \
    TEST.NMS_THRE 0.15 \
    TEST.SCALE_FACTOR 0.5,1,2

Testing on COCO val2017 dataset with matching regression results to the closest keypoints detected from the keypoint heatmaps

python tools/valid.py \
    --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml \
    TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth \
    TEST.MATCH_HMP True
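
The idea behind this option, snapping each regressed keypoint to the closest keypoint detected from the heatmap of the same type, can be sketched in a few lines. This is a simplified illustration and not the repository's implementation: the function name, the input layout, and the `max_dist` cutoff are all assumptions introduced for the example.

```python
import math


def match_to_heatmap_peaks(regressed, peaks, max_dist=float("inf")):
    """Snap each regressed keypoint to the nearest heatmap peak of its type.

    regressed: list of (x, y), one entry per keypoint type, for one person.
    peaks: list with one entry per keypoint type, each a list of (x, y)
           locations detected from that type's heatmap.
    max_dist: hypothetical cutoff; peaks at this distance or farther are ignored.
    """
    matched = []
    for (x, y), candidates in zip(regressed, peaks):
        best, best_dist = (x, y), max_dist
        for px, py in candidates:
            dist = math.hypot(px - x, py - y)
            if dist < best_dist:
                best, best_dist = (px, py), dist
        matched.append(best)  # falls back to the regressed position if no peak qualifies
    return matched


# Toy example with two keypoint types; the second type has no detected peak.
regressed = [(10.0, 10.0), (20.0, 20.0)]
peaks = [[(11.0, 10.0), (30.0, 30.0)], []]
print(match_to_heatmap_peaks(regressed, peaks, max_dist=5.0))
# -> [(11.0, 10.0), (20.0, 20.0)]
```

Regression decides which person a keypoint belongs to, while the heatmap peak refines where exactly it sits, which is consistent with the small AP differences reported in the matching table under Main Results.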

Testing on CrowdPose test dataset without multi-scale test using well-trained pose model

python tools/valid.py \
    --cfg experiments/crowdpose/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_crowdpose_x300.yaml \
    TEST.MODEL_FILE model/pose_crowdpose/pose_dekr_hrnetw32_crowdpose.pth

Testing on CrowdPose test dataset with multi-scale test using well-trained pose model

python tools/valid.py \
