# FairMOT
A simple baseline for one-shot multi-object tracking:

> **FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking**,
> Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu,
> *IJCV 2021* (arXiv 2004.01888)
## Abstract
Object detection and re-identification are the core components of multi-object tracking, and both have made remarkable progress in recent years. However, little attention has been paid to accomplishing the two tasks in a single network to improve inference speed. The initial attempts along this path ended up with degraded results, mainly because the re-identification branch was not appropriately learned. In this work, we study the essential reasons behind the failure and accordingly present a simple baseline that addresses the problems. It remarkably outperforms the state of the art on the MOT challenge datasets at 30 FPS. We hope this baseline can inspire and help evaluate new ideas in this field.
## News
- (2021.08.03) Our paper is accepted by IJCV!
- (2021.06.01) A nice re-implementation by Baidu PaddleDetection!
- (2021.05.24) A light version of FairMOT using yolov5s backbone is released!
- (2020.09.10) A new version of FairMOT is released! (73.7 MOTA on MOT17)
## Main updates
- We pretrain FairMOT on the CrowdHuman dataset using a weakly-supervised learning approach.
- To detect bounding boxes that extend outside the image, we replace the WH head (2 channels) with a head that regresses the left, top, right and bottom distances (4 channels).
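The difference between the two heads can be sketched in a few lines. This is an illustrative NumPy snippet (the function names are ours, not the repo's) showing how four independent offsets let a box extend past the image border on one side only, which a symmetric width/height pair around the same point cannot express:

```python
import numpy as np

def decode_wh(center, wh):
    """WH head (2 channels): the box is symmetric about the center point."""
    cx, cy = center
    w, h = wh
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

def decode_ltrb(center, ltrb):
    """LTRB head (4 channels): independent distances from the point to each
    side, so the box need not be symmetric about the point."""
    cx, cy = center
    l, t, r, b = ltrb
    return np.array([cx - l, cy - t, cx + r, cy + b])

# A pedestrian walking out of the left edge of a frame: the detected point
# is at x=40, but the true box extends past x=0 on the left side only.
box = decode_ltrb((40, 500), (120, 90, 30, 90))
# box[0] = 40 - 120 = -80: the left edge lies outside the image, while the
# right edge (40 + 30 = 70) stays tight; a symmetric WH box around the same
# point could only reach x=-80 by also inflating the right edge.
```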
## Tracking performance
### Results on MOT challenge test set
| Dataset | MOTA | IDF1 | IDS | MT | ML | FPS |
|---------|------|------|-----|-------|-------|------|
| 2DMOT15 | 60.6 | 64.7 | 591 | 47.6% | 11.0% | 30.5 |
| MOT16 | 74.9 | 72.8 | 1074 | 44.7% | 15.9% | 25.9 |
| MOT17 | 73.7 | 72.3 | 3303 | 43.2% | 17.3% | 25.9 |
| MOT20 | 61.8 | 67.3 | 5243 | 68.8% | 7.6% | 13.2 |
All of the results are obtained on the MOT challenge evaluation server under the “private detector” protocol. We rank first among all the trackers on 2DMOT15, MOT16, MOT17 and MOT20. The tracking speed of the entire system can reach up to 30 FPS.
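For readers new to the metrics, the MOTA column follows the standard CLEAR-MOT definition, which aggregates false negatives, false positives and identity switches over all frames. A minimal helper (ours, not part of this repo) makes the arithmetic concrete:

```python
def mota(num_fn: int, num_fp: int, num_ids: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDS) / total ground-truth boxes.
    Note it can go negative when errors outnumber ground-truth objects."""
    return 1.0 - (num_fn + num_fp + num_ids) / num_gt

# e.g. 10% missed boxes, 5% false alarms, 1% identity switches:
print(round(mota(num_fn=1000, num_fp=500, num_ids=100, num_gt=10000), 2))  # 0.84
```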
### Video demos on MOT challenge test set
<img src="assets/MOT15.gif" width="400"/> <img src="assets/MOT16.gif" width="400"/> <img src="assets/MOT17.gif" width="400"/> <img src="assets/MOT20.gif" width="400"/>
## Installation
- Clone this repo; we'll call the cloned directory ${FAIRMOT_ROOT}.
- Install dependencies. We use Python 3.8 and PyTorch >= 1.7.0:
```shell
conda create -n FairMOT python=3.8
conda activate FairMOT
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
cd ${FAIRMOT_ROOT}
pip install cython
pip install -r requirements.txt
```
- We use DCNv2_pytorch_1.7 in our backbone network (the pytorch_1.7 branch). Previous versions can be found in DCNv2.
```shell
git clone -b pytorch_1.7 https://github.com/ifzhang/DCNv2.git
cd DCNv2
./make.sh
```
- In order to run the code for demos, you also need to install ffmpeg.
## Data preparation
- **CrowdHuman** The CrowdHuman dataset can be downloaded from its official webpage. After downloading, you should prepare the data in the following structure:
```
crowdhuman
   |——————images
   |        └——————train
   |        └——————val
   |——————labels_with_ids
   |        └——————train(empty)
   |        └——————val(empty)
   |——————annotation_train.odgt
   └——————annotation_val.odgt
```
If you want to pretrain on CrowdHuman (we train Re-ID on CrowdHuman), change the paths in src/gen_labels_crowd_id.py and run:
```shell
cd src
python gen_labels_crowd_id.py
```
If you want to add CrowdHuman to the MIX dataset (we do not train Re-ID on CrowdHuman in this case), change the paths in src/gen_labels_crowd_det.py and run:
```shell
cd src
python gen_labels_crowd_det.py
```
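The gen_labels_crowd_* scripts populate labels_with_ids with one .txt file per image, in the JDE-style format: one line per box holding the class index, the identity (or -1 when Re-ID is not trained on that set), and the box center and size normalized by the image dimensions. A hedged sketch of that layout (field order as we understand it from JDE; verify against the files the scripts actually produce):

```python
def jde_label_line(cls_id, track_id, x1, y1, w, h, img_w, img_h):
    """Format one box as a JDE-style label line:
    class id x_center/img_w y_center/img_h w/img_w h/img_h"""
    cx = (x1 + w / 2) / img_w
    cy = (y1 + h / 2) / img_h
    return f"{cls_id} {track_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 100x200 pedestrian box with identity 7 at (50, 100) in a 1920x1080 image:
print(jde_label_line(0, 7, 50, 100, 100, 200, 1920, 1080))
```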
- **MIX** We use the same training data as JDE in this part and call it "MIX". Please refer to their DATA ZOO to download and prepare all the training data, including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
- **2DMOT15 and MOT20** 2DMOT15 and MOT20 can be downloaded from the official webpage of the MOT challenge. After downloading, you should prepare the data in the following structure:
```
MOT15
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
MOT20
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
```
Then change seq_root and label_root in src/gen_labels_15.py and src/gen_labels_20.py and run:
```shell
cd src
python gen_labels_15.py
python gen_labels_20.py
```
to generate the labels of 2DMOT15 and MOT20. The seqinfo.ini files of 2DMOT15 can be downloaded here: [Google], [Baidu, code: 8o0w].
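Each MOT sequence directory ships a seqinfo.ini that the label scripts read for image sizes, which is why the 2DMOT15 files above are needed. A small configparser sketch (the key names follow the usual MOTChallenge convention and should be checked against your downloaded files):

```python
import configparser

# A typical seqinfo.ini payload (values here are illustrative).
SAMPLE = """\
[Sequence]
name=ADL-Rundle-6
imDir=img1
frameRate=30
seqLength=525
imWidth=1920
imHeight=1080
imExt=.jpg
"""

def read_seqinfo(text):
    """Parse a seqinfo.ini payload into the fields label generation needs."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    seq = cfg["Sequence"]
    return {
        "name": seq["name"],
        "width": seq.getint("imWidth"),
        "height": seq.getint("imHeight"),
        "length": seq.getint("seqLength"),
    }

info = read_seqinfo(SAMPLE)
print(info["width"], info["height"])  # 1920 1080
```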
## Pretrained models and baseline model
- Pretrained models
DLA-34 COCO pretrained model: DLA-34 official. HRNetV2 ImageNet pretrained models: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, put the pretrained models in the following structure:
```
${FAIRMOT_ROOT}
   └——————models
            └——————ctdet_coco_dla_2x.pth
            └——————hrnetv2_w32_imagenet_pretrained.pth
            └——————hrnetv2_w18_imagenet_pretrained.pth
```
- Baseline model
Our baseline FairMOT model (DLA-34 backbone) is pretrained on CrowdHuman for 60 epochs with the self-supervised learning approach and then trained on the MIX dataset for 30 epochs. The models can be downloaded here: crowdhuman_dla34.pth [Google] [Baidu, code: ggzx] [Onedrive]; fairmot_dla34.pth [Google] [Baidu, code: uouv] [Onedrive]. (This is the model with which we get 73.7 MOTA on the MOT17 test set.) After downloading, put the baseline model in the following structure:
```
${FAIRMOT_ROOT}
   └——————models
            └——————fairmot_dla34.pth
            └——————...
```
## Training
- Download the training data.
- Change the dataset root directory 'root' in src/lib/cfg/data.json and 'data_dir' in src/lib/opts.py.
- Pretrain on CrowdHuman and train on MIX:
```shell
sh experiments/crowdhuman_dla34.sh
sh experiments/mix_ft_ch_dla34.sh
```
- Train only on MIX:
```shell
sh experiments/mix_dla34.sh
```
- Train only on MOT17:
```shell
sh experiments/mot17_dla34.sh
```
- Finetune on 2DMOT15 using the baseline model:
```shell
sh experiments/mot15_ft_mix_dla34.sh
```
- Train on MOT20: The data annotation of MOT20 is a little different from MOT17: the coordinates of the bounding boxes all lie inside the image, so we need to uncomment lines 313 to 316 in the dataset file src/lib/datasets/dataset/jde.py:
```python
#np.clip(xy[:, 0], 0, width, out=xy[:, 0])
#np.clip(xy[:, 2], 0, width, out=xy[:, 2])
#np.clip(xy[:, 1], 0, height, out=xy[:, 1])
#np.clip(xy[:, 3], 0, height, out=xy[:, 3])
```
Then we can train on the MIX dataset and finetune on MOT20:
```shell
sh experiments/crowdhuman_dla34.sh
sh experiments/mix_ft_ch_dla34.sh
sh experiments/mot20_ft_mix_dla34.sh
```
The MOT20 model 'mot20_fairmot.pth' can be downloaded here: [Google] [Baidu, code: jmce].
- For the ablation study, we use MIX and half of MOT17 as training data; you can use different backbones such as ResNet, ResNet-FPN, HRNet and DLA:
```shell
sh experiments/mix_mot17_half_dla34.sh
sh experiments/mix_mot17_half_hrnet18.sh
sh experiment
```
