PMFNet
Implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection" (ICCV 2019 Oral)
Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Official implementation of "Pose-aware Multi-level Feature Network for Human Object Interaction Detection" (ICCV 2019 Oral).
This code follows the implementation architecture of roytseng-tw/mask-rcnn.pytorch.
Getting Started
Requirements
Tested under Python 3.
- Python packages
  - pytorch==0.4.1
  - torchvision==0.2.2
  - pyyaml==3.12
  - cython
  - matplotlib
  - numpy
  - scipy
  - opencv
  - packaging
  - ipdb
  - pycocotools: for the COCO dataset; also available from pip.
  - tensorboardX: for logging the losses in TensorBoard.
- An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have a GPU implementation.
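Because three of the packages above are pinned to exact versions, it can help to confirm the environment before compiling. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes the pip distribution names `torch`, `torchvision`, and `PyYAML` for the pytorch, torchvision, and pyyaml entries. Treating a local suffix such as `0.2.2.post3` as matching the pin is also an assumption.

```python
"""Sketch: compare installed package versions with the pinned ones."""

# Assumed pip distribution names for the three pinned requirements.
PINNED = {"torch": "0.4.1", "torchvision": "0.2.2", "PyYAML": "3.12"}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older Python 3 with setuptools
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """List human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            problems.append("{}: not installed (want {})".format(name, wanted))
        elif found != wanted and not found.startswith(wanted + "."):
            problems.append("{}: found {} (want {})".format(name, found, wanted))
    return problems


if __name__ == "__main__":
    for line in find_mismatches(PINNED) or ["all pinned packages match"]:
        print(line)
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.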
Assume the project is located at $ROOT.
Compilation
Compile the NMS code:
cd $ROOT/lib
sh make.sh
Data and Pretrained Model Preparation
Create a data folder under the repo:
cd $ROOT
mkdir data
- COCO: Download the COCO images and annotations from the COCO website.
- Our data: Download our dataset annotations and detection/keypoint proposals from Google Drive and BaiduYun.
- Pose estimation: We use the pytorch-cpn repo to train our pose estimator. We have released our keypoint predictions for the vcoco dataset as part of our data.
Make sure the files are arranged in the following structure:
data
├───coco
│   ├─images
│   │  ├─train2014
│   │  ├─val2014
│   │
│   ├─vcoco
│      ├─annotations
│      ├─annotations_with_keypoints
│      ├─vcoco
│
├───cache
│   ├─addPredPose
│
├───pretrained_model
    ├─e2e_faster_rcnn_R-50-FPN_1x_step119999.pth
    ├─vcoco_best_model_on_test.pth
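A misplaced folder is easy to miss until training fails, so the layout above can be checked up front. This is a minimal sketch rather than a script shipped with the repository: `EXPECTED` and `missing_entries` are hypothetical names, and the path list assumes that annotations, annotations_with_keypoints, and the inner vcoco folder sit inside coco/vcoco. It also assumes `$ROOT` is exported as an environment variable, falling back to the current directory.

```python
"""Sketch: report which entries of the expected data layout are missing."""
import os

# Paths relative to the data folder, transcribed from the structure above.
# The nesting under coco/vcoco is an assumption about the intended layout.
EXPECTED = [
    "coco/images/train2014",
    "coco/images/val2014",
    "coco/vcoco/annotations",
    "coco/vcoco/annotations_with_keypoints",
    "coco/vcoco/vcoco",
    "cache/addPredPose",
    "pretrained_model/e2e_faster_rcnn_R-50-FPN_1x_step119999.pth",
    "pretrained_model/vcoco_best_model_on_test.pth",
]


def missing_entries(data_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_root."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(data_root, rel))
    ]


if __name__ == "__main__":
    data_root = os.path.join(os.environ.get("ROOT", "."), "data")
    missing = missing_entries(data_root)
    if missing:
        print("Missing under {}:".format(data_root))
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

The check only tests that each path exists; it does not validate the contents of the annotation files or the checkpoints.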
Training
cd $ROOT
sh script/train_vcoco.sh
Test
cd $ROOT
sh script/test_vcoco.sh
Our pretrained model vcoco_best_model_on_test.pth achieves 52.05 AP on the vcoco test set.
