DetectAndTrack

The implementation of an algorithm presented in the CVPR18 paper: "Detect-and-Track: Efficient Pose Estimation in Videos"


Detect And Track: Efficient Pose Estimation in Videos


<p><img src="https://rohitgirdhar.github.io/DetectAndTrack/assets/cup.png" width="50px" align="center" /> Ranked <b>first</b> in the keypoint tracking task of the <a href="https://posetrack.net/leaderboard.php">ICCV 2017 PoseTrack challenge</a>! (entry: ProTracker)</p>

[project page] [paper]

If this code helps with your work, please cite:

R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri and D. Tran. Detect-and-Track: Efficient Pose Estimation in Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

@inproceedings{girdhar2018detecttrack,
    title = {{Detect-and-Track: Efficient Pose Estimation in Videos}},
    author = {Girdhar, Rohit and Gkioxari, Georgia and Torresani, Lorenzo and Paluri, Manohar and Tran, Du},
    booktitle = {CVPR},
    year = 2018
}

Requirements

This code was developed and tested on NVIDIA P100 (16GB), M40 (12GB) and 1080Ti (11GB) GPUs. Training requires at least 4 GPUs for most configurations, and some were trained with 8 GPUs. It might be possible to train on a single GPU by scaling down the learning rate and scaling up the iteration schedule, but we have not tested all possible setups. Testing can be done on a single GPU. Unfortunately it is currently not possible to run this on a CPU as some ops do not have CPU implementations.
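As a rough illustration of the single-GPU suggestion above, the sketch below rescales a schedule under the commonly used linear scaling rule: with the per-GPU batch size unchanged, the learning rate is divided by the reduction in GPU count and the iteration counts are multiplied by it. This is an assumption, not a tested recipe for this code base; the function name and the schedule values are hypothetical.

```python
def scale_schedule(base_lr, max_iter, steps, trained_gpus, available_gpus):
    """Rescale a schedule for fewer GPUs, assuming linear scaling.

    The per-GPU batch size is assumed unchanged, so the total batch size
    shrinks by the same factor as the GPU count.
    """
    factor = float(trained_gpus) / available_gpus
    return {
        "base_lr": base_lr / factor,                        # smaller batch, smaller LR
        "max_iter": int(round(max_iter * factor)),          # more iterations overall
        "steps": [int(round(s * factor)) for s in steps],   # LR decay points move too
    }


# Hypothetical 8-GPU schedule moved to a single GPU.
single_gpu = scale_schedule(base_lr=0.008, max_iter=10000, steps=[0, 6000, 8000],
                            trained_gpus=8, available_gpus=1)
print(single_gpu)
```

The resulting values would then go into the corresponding fields of the experiment config; results with such a rescaled schedule may differ from the multi-GPU numbers.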

Installation

If you have used Detectron, you should have most of the prerequisites installed, except some required for PoseTrack evaluation. In any case, the following instructions should get you started. I would strongly recommend using anaconda, as it makes it really easy to install most libraries required to compile caffe2 and other ops. First start by cloning this code:

$ git clone https://github.com/facebookresearch/DetectAndTrack.git
$ cd DetectAndTrack

Pre-requisites and software setup

The code was tested with the following setup:

CentOS 6.5
Anaconda (python 2.7)
OpenCV 3.4.1
GCC 4.9
CUDA 9.0
cuDNN 7.1.2
numpy 1.14.2 (needs >=1.12.1, for the poseval evaluation scripts)
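The numpy constraint in the list above can be checked before running the evaluation scripts. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and in practice the installed version string would come from `numpy.__version__`.

```python
import re


def version_tuple(version):
    """Turn '1.14.2' into (1, 14, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, not as text."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("1.14.2", "1.12.1"))  # the tested version
print(meets_minimum("1.11.3", "1.12.1"))  # an older, hypothetical install
```

Comparing tuples of integers avoids the pitfall of string comparison, where "1.9" would sort after "1.12".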

The all_pkg_versions.txt file contains the exact versions of packages that should work with this code. To avoid conflicting packages, I would suggest creating a new environment in conda, and installing all the requirements in there. It can be done by:

$ export ENV_NAME="detect_and_track"  # or any other name you prefer
$ conda create --name $ENV_NAME --file all_pkg_versions.txt python=2.7 anaconda
$ source activate $ENV_NAME

If you are using an old OS (like CentOS 6.5), you might want to install versions of packages compatible with the GLIBC library on your system. On my system with GLIBC 2.12, using libraries from the conda-forge channel seemed to work fine. To use it, add -c conda-forge to the conda create command.

Install Caffe2

Follow the official caffe2 installation instructions. I describe what worked for me on CentOS 6.5 next. The code was tested with caffe2 (C2) at commit b4e158.

$ cd ..
$ git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2
$ git submodule update --init
$ mkdir build && cd build
$ export CONDA_PATH=/path/to/anaconda2  # Set this path as per your anaconda installation
$ export CONDA_ENV_PATH=$CONDA_PATH/envs/$ENV_NAME
$ cmake \
	-DCMAKE_PREFIX_PATH=$CONDA_ENV_PATH \
	-DCMAKE_INSTALL_PREFIX=$CONDA_ENV_PATH \
	-Dpybind11_INCLUDE_DIR=$CONDA_ENV_PATH/include \
	-DCMAKE_THREAD_LIBS_INIT=$CONDA_ENV_PATH/lib ..
$ make -j32
$ make install -j32  # This installs into the environment

This should install caffe2 into your anaconda environment. Please refer to the official caffe2 installation instructions for more information and help.
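A quick way to confirm the build is usable is to import caffe2 from inside the activated environment and ask how many CUDA devices it sees. The sketch below follows the import check suggested in the caffe2 installation instructions; the `caffe2_status` wrapper is a hypothetical helper, and `workspace.NumCudaDevices()` is assumed to be available in the tested caffe2 version.

```python
def caffe2_status():
    """Return a short string describing whether caffe2 can be used."""
    try:
        from caffe2.python import core, workspace  # noqa: F401
        num_gpus = workspace.NumCudaDevices()
    except Exception as err:  # missing package, broken build, missing libraries
        return "caffe2 import failed: {}".format(err)
    return "caffe2 OK, {} CUDA device(s) visible".format(num_gpus)


status = caffe2_status()
print(status)
```

If this reports zero devices, the build most likely did not pick up CUDA, and the GPU-only ops used by this code will not run.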

Compile some custom ops

We need one additional op for running the 3D models, which is provided as lib/ops/affine_channel_nd_op.*. It can be installed following instructions from here, or:

$ cd ../DetectAndTrack/lib
$ make && make ops
$ cd ..
$ python tests/test_zero_even_op.py  # test that compilation worked

In case this does not work, an alternative is to copy the lib/ops/affine_channel_nd_op.* files into the caffe2 detectron module folder (caffe2/modules/detectron/) and recompile caffe2. This would also make the additional op available to caffe2.

Install the COCO API

Since the datasets are represented using COCO conventions in the Detectron code base, we need the COCO API to be able to read the train/test files. It can be installed by:

$ # COCOAPI=/path/to/clone/cocoapi
$ git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
$ cd $COCOAPI/PythonAPI
$ # Install into global site-packages
$ make install
$ # Alternatively, if you do not have permissions or prefer
$ # not to install the COCO API into global site-packages
$ python2 setup.py install --user
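For readers unfamiliar with the COCO conventions mentioned above, the sketch below parses a tiny hand-written annotation file using only the standard library. The sample values are made up, and the exact fields in the PoseTrack JSON files shipped with this code may differ; the keypoint layout (a flat list of x, y, v triples, with v = 0 meaning unlabelled) is the standard COCO one.

```python
import json

# A tiny hand-written file in COCO keypoint style; all values are made up.
SAMPLE = """
{
  "images": [{"id": 1, "file_name": "video_a/00000001.jpg",
              "height": 720, "width": 1280}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1,
     "keypoints": [100, 200, 2, 0, 0, 0, 150, 260, 1], "num_keypoints": 2}
  ],
  "categories": [{"id": 1, "name": "person"}]
}
"""


def labelled_keypoints(annotation):
    """Split the flat [x, y, v, ...] list into (x, y) pairs with v > 0."""
    kps = annotation["keypoints"]
    triples = zip(kps[0::3], kps[1::3], kps[2::3])
    return [(x, y) for x, y, v in triples if v > 0]


data = json.loads(SAMPLE)
images_by_id = {img["id"]: img for img in data["images"]}
for ann in data["annotations"]:
    file_name = images_by_id[ann["image_id"]]["file_name"]
    print("%s %s" % (file_name, labelled_keypoints(ann)))
```

The COCO API wraps this kind of lookup (images by id, annotations by image) behind its own indexing class, which is why it is needed to read the train/test files.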

Dataset and Evaluation

We use a slightly modified version of the PoseTrack dataset where we rename the frames to follow the %08d format, with the first frame indexed as 1 (i.e. 00000001.jpg). Download and extract the data from the PoseTrack download page into lib/datasets/data/PoseTrack (or create a symlink to this location). Then, rename the frames for each video as described above, or use tools/gen_posetrack_json.py, which converts the data and generates labels in the JSON format compatible with Detectron. We already provide the corresponding training/validation/testing JSON files in lib/datasets/lists/PoseTrack/v1.0, which have already been converted to the COCO format. The paths to the data are hardcoded in the lib/datasets/json_dataset.py file.
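The frame naming scheme described above can be sketched as follows. This is a minimal illustration, not the logic of tools/gen_posetrack_json.py; it assumes the original frame names sort lexicographically in temporal order (i.e. they are zero-padded), and the input names are hypothetical.

```python
import os


def renaming_plan(frame_names):
    """Map each frame file to its %08d name, numbering from 1 in sorted order."""
    plan = []
    for index, name in enumerate(sorted(frame_names), start=1):
        ext = os.path.splitext(name)[1]
        plan.append((name, "%08d%s" % (index, ext)))
    return plan


# Hypothetical frames of one video, listed out of order.
plan = renaming_plan(["000002.jpg", "000000.jpg", "000001.jpg"])
for old, new in plan:
    print("%s -> %s" % (old, new))
```

Computing the plan separately from applying it makes it easy to inspect first. When applying it with `os.rename`, writing into a fresh directory (or renaming in reverse order) avoids overwriting a frame whose old name collides with another frame's new name.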

Evaluation is performed using the official PoseTrack evaluation code, poseval, which uses py-motmetrics internally. This code includes a modified version of poseval with multi-processing for faster results. We have verified that the numbers from this code match what we get from the evaluation server. Since evaluation is done using the provided code, we also need the provided MAT/JSON label files, which this code uses to compute the final numbers. You can download these files from here, and extract them as lib/datasets/data/PoseTrackV1.0_Annots_val_json.

NOTE: Extract the val files onto a fast local disk. For some reason, I am seeing slightly different performance if these files are stored on an NFS directory. This might be an issue with the evaluation code (the organizers also found slightly different numbers using their code locally and on the evaluation server), but since the difference is pretty marginal (~0.1% overall MOTA), I am ignoring it for now. When storing the val files on a fast local disk, I can exactly reproduce the performance reported in the paper. However, on any disk, the trends should remain the same, with only minor variations in the absolute numbers.

Running the code

We provide a nifty little script launch.py that can take care of running any train/test/tracking workflows. Similar to Detectron, each experiment is completely defined by a YAML config file. We provide the config files required to reproduce our reported performance in the paper. In general, the script can be used as follows:

$ export CUDA_VISIBLE_DEVICES="0,1,2,3"  # set the subset of GPUs on the current node to use. Count must be same as NUM_GPUS set in the config
$ python launch.py --cfg/-c /path/to/config.yaml --mode/-m [train (default)/test/track/eval] ...[other config opts]...

The /path/to/config.yaml is the path to a YAML file with the experiment configuration (see config directory for some examples). mode defines whether you want to run training/testing/tracking/evaluation, and other config opts refer to any other config options (see lib/core/config.py for the full list). Config options given on the command line have the highest precedence, so they override any defaults or specifications in the YAML file, making them a quick way to experiment with specific configs. We show examples in the following sections.
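The precedence order described above (defaults, then the YAML file, then the command line) can be sketched as a layered merge. This is an illustration of the idea only, not the code in lib/core/config.py: it assumes flat keys and Detectron-style KEY VALUE pairs on the command line, and `BASE_LR` is a hypothetical key name.

```python
import ast
import copy


def parse_value(text):
    """Interpret '4' as an int, '0.01' as a float, etc.; else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def merge_config(defaults, yaml_values, cli_opts):
    """Layer settings: defaults < YAML file < command-line KEY VALUE pairs."""
    if len(cli_opts) % 2 != 0:
        raise ValueError("command-line options must come in KEY VALUE pairs")
    cfg = copy.deepcopy(defaults)
    cfg.update(yaml_values)  # the YAML file overrides the defaults
    for key, value in zip(cli_opts[0::2], cli_opts[1::2]):
        if key not in cfg:
            raise KeyError("unknown config key: %s" % key)
        cfg[key] = parse_value(value)  # the command line overrides everything
    return cfg


cfg = merge_config(
    defaults={"NUM_GPUS": 8, "BASE_LR": 0.01},
    yaml_values={"NUM_GPUS": 4},
    cli_opts=["NUM_GPUS", "1"],
)
print(cfg)
```

Rejecting unknown keys catches typos early, which matters when an override is the only thing distinguishing two experiments.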

Before starting, create an empty outputs/ folder in the root directory. This can also be sym-linked to some large disk, as we will be storing all output models and files in this directory. The naming convention will be outputs/path/to/config/file.yaml/, and the directory will contain .pkl model files, detection files etc. For ease of use, the training code will automatically run testing, which automatically runs tracking, which in turn automatically runs evaluation and produces the final performance.
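The two conventions in the paragraph above, the output directory name and the chain of stages that each mode triggers, can be summarized in a few lines. This is a sketch of the described behaviour under the assumption that the config path is used as given; both helper names are hypothetical and the actual logic in launch.py may differ.

```python
import posixpath

# Order in which the stages run, as described in the text.
STAGES = ["train", "test", "track", "eval"]


def output_dir(cfg_path, root="outputs"):
    """outputs/<config path>/, keeping the .yaml suffix in the directory name."""
    return posixpath.join(root, cfg_path)


def stages_triggered(mode):
    """A mode runs its own stage and every stage after it."""
    return STAGES[STAGES.index(mode):]


print(output_dir("path/to/config/file.yaml"))
print(stages_triggered("test"))
```

So a run started in test mode would, under this reading, also produce tracking and evaluation results in the same output directory.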

Running tracking and evaluating pre-trained, pre-tested models

We provide pre-trained models and files in a directory here. You can optionally download the whole directory as pretrained_models/ in the root directory, or can download individual models you end up needing.

First, let's start by simply running tracking and evaluating the performance of our best models (that won the Pos
