PMFNet

Implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection" (ICCV 2019 Oral)

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

Official implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection" (ICCV 2019 Oral).

This code follows the implementation architecture of roytseng-tw/mask-rcnn.pytorch.

Getting Started

Requirements

Tested under Python 3.

Python packages:
- pytorch==0.4.1
- torchvision==0.2.2
- pyyaml==3.12
- cython
- matplotlib
- numpy
- scipy
- opencv
- packaging
- ipdb
- pycocotools — for COCO dataset, also available from pip.
- tensorboardX — for logging the losses in Tensorboard
An NVIDIA GPU and CUDA 8.0 or higher are required, since some operations only have a GPU implementation.
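As a rough sketch, the package list above could be captured in a requirements file. The pip distribution names `torch` (for pytorch) and `opencv-python` (for opencv) are assumptions about how the listed packages map to PyPI, and the unpinned entries are left unpinned because the list gives no versions for them. Whether `torch==0.4.1` installs from pip depends on your platform and Python version.

```shell
#!/bin/sh
# Sketch only: write the listed requirements to requirements.txt.
# "torch" and "opencv-python" are assumed PyPI names for pytorch and opencv.
cat > requirements.txt <<'EOF'
torch==0.4.1
torchvision==0.2.2
pyyaml==3.12
cython
matplotlib
numpy
scipy
opencv-python
packaging
ipdb
pycocotools
tensorboardX
EOF

echo "Wrote requirements.txt"
```

The file can then be installed with `pip install -r requirements.txt` inside a Python 3 environment.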

Assume the project is located at $ROOT.

Compilation

Compile the NMS code:

cd $ROOT/lib 
sh make.sh

Data and Pretrained Model Preparation

Create a data folder under the repo:

cd $ROOT
mkdir data

COCO: Download the COCO images and annotations from the COCO website.

Our data: Download our dataset annotations and detection/keypoint proposals from Google Drive and BaiduYun.

Pose estimation: We use the pytorch-cpn repo to train our pose estimator. Our keypoint predictions for the V-COCO dataset are released as part of our data.

Make sure to arrange the files in the following structure:

data
├───coco
│   ├─images
│   │  ├─train2014
│   │  ├─val2014 
│   │
│   ├─vcoco
│      ├─annotations
│      ├─annotations_with_keypoints
│      ├─vcoco
│
├───cache
│   ├─addPredPose
│
├───pretrained_model
    ├─e2e_faster_rcnn_R-50-FPN_1x_step119999.pth
    ├─vcoco_best_model_on_test.pth
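
Before training or testing, it may help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: `check_layout` is a hypothetical helper, and it only tests that each path exists, without assuming whether entries such as `annotations` are files or directories.

```shell
#!/bin/sh
# check_layout is a hypothetical helper (not part of the PMFNet repo).
# It reports which of the expected paths exist under <root>/data and
# returns non-zero if any are missing.
check_layout() {
    root="$1"
    missing=0
    for p in \
        data/coco/images/train2014 \
        data/coco/images/val2014 \
        data/coco/vcoco/annotations \
        data/coco/vcoco/annotations_with_keypoints \
        data/coco/vcoco/vcoco \
        data/cache/addPredPose \
        data/pretrained_model/e2e_faster_rcnn_R-50-FPN_1x_step119999.pth \
        data/pretrained_model/vcoco_best_model_on_test.pth
    do
        if [ -e "$root/$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Fall back to the current directory when ROOT is unset.
check_layout "${ROOT:-.}" || echo "Some expected paths are missing."
```

Run it from a shell where `ROOT` points at the project, or from the project root itself.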

Training

cd $ROOT
sh script/train_vcoco.sh

Test

cd $ROOT
sh script/test_vcoco.sh

Our pretrained model vcoco_best_model_on_test.pth achieves 52.05 AP on the V-COCO test set.
