# Single/Multiple Object Tracking and Segmentation

Code and comparisons of recent single/multiple object tracking and segmentation methods.
## News
:boom: VLT_SCAR/VLT_TT is accepted by NeurIPS2022.
:boom: CNNInMo/TransInMo is accepted by IJCAI2022.
:boom: CSTrack is accepted by IEEE TIP.
:boom: OMC is accepted by AAAI2022. The training and testing code has been released in this codebase.
:boom: AutoMatch is accepted by ICCV2021. The training and testing code has been released in this codebase.
:boom: CSTrack ranks 5/4000 at Tianchi Global AI Competition.
:boom: Ocean is accepted by ECCV2020. [OceanPlus] is accepted by IEEE TIP.
:boom: SiamDW is accepted by CVPR2019 and selected as oral presentation.
<!-- :boom: The improved version of [CSTrack_panda](https://github.com/JudasDie/SOTS/blob/master/lib/tutorial/CSTrack_panda/CSTrack_PANDA.md) has been released, containing the end-to-end training codes on PANDA. It is a strong baseline for [Gigavision](http://gigavision.cn/index.html) MOT tracking. Our tracker takes the **5th** place in **Tianchi Global AI Competition (天池—全球人工智能技术创新大赛[赛道二])**, with the score of **A-0.6712/B-0.6251 (AB榜)**, which surprisingly outperforms the baseline tracker JDE with score of A-0.32/B-0.34. More details about CSTrack_panda can be found [here](https://blog.csdn.net/qq_34919792/article/details/116792954?spm=1001.2014.3001.5501). -->
<!-- [](https://www.youtube.com/watch?v=zRCRgsrW71s "") -->

## Supported Trackers (SOT and MOT)
### Single-Object Tracking (SOT)
- [x] [NeurIPS2022] VLT_SCAR/VLT_TT
- [x] [IJCAI2022] CNNInMo/TransInMo
- [x] [ICCV2021] AutoMatch
- [x] [ECCV2020] Ocean and Ocean+
- [x] [CVPR2019 Oral] SiamDW
### Multi-Object Tracking (MOT)
- [x] [AAAI2022] OMC
- [x] [IEEE TIP] CSTrack
## Results Comparison
- [x] Comparison
## Branches
- SOT (or master): for our SOT trackers
- MOT: for our MOT trackers
- v0: old codebase supporting OceanPlus and TensorRT testing.
Please clone the branch that fits your needs.
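For example, a branch can be fetched on its own (`--single-branch` keeps the checkout small; the branch names are those listed above):

```shell
# Each tracker family lives on its own branch; clone only the one you need.
git clone -b SOT --single-branch https://github.com/JudasDie/SOTS.git
# git clone -b MOT --single-branch https://github.com/JudasDie/SOTS.git
# git clone -b v0  --single-branch https://github.com/JudasDie/SOTS.git
```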
## Structure
- experiments: training and testing settings
- demo: figures for readme
- dataset: testing dataset
- data: training dataset
- lib: core scripts for all trackers
- snapshot: pre-trained models
- pretrain: models trained on ImageNet (for training)
- tracking: training and testing interface
```
$SOTS
|—— experiments
|—— lib
|—— snapshot
      |—— xxx.model
|—— dataset
      |—— VOT2019.json
      |—— VOT2019
            |—— ants1...
      |—— VOT2020
            |—— ants1...
      |—— ...
```
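As a quick sanity check before training, a small script (ours, not part of the codebase) can verify that the top-level folders described above are in place:

```python
from pathlib import Path

# Top-level folders named in the Structure section above.
REQUIRED = ["experiments", "lib", "snapshot", "dataset"]

def check_layout(root="."):
    """Return the list of required folders missing under `root`."""
    root = Path(root)
    return [d for d in REQUIRED if not (root / d).is_dir()]
```

`check_layout("SOTS")` returns `[]` when the layout is complete.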
## Performance
| <sub>Model</sub> | <sub>OTB2015</sub> | <sub>GOT10K</sub> | <sub>LaSOT</sub> | <sub>TNL2K</sub> | <sub>TrackingNet</sub> | <sub>NFS30</sub> | <sub>TOTB</sub> | <sub>VOT2019</sub> | <sub>TC128</sub> | <sub>UAV123</sub> | <sub>LaSOT_Ext</sub> | <sub>OTB-99-LANG</sub> |
|:-----:|:-:|:----:|:------:|:--------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| <sub>SiamDW</sub> | <sub>0.670</sub> | <sub>0.429</sub> | <sub>0.386</sub> | <sub>0.348</sub> | <sub>61.1</sub> | <sub>0.521</sub> | <sub>0.500</sub> | <sub>0.241</sub> | <sub>0.583</sub> | <sub>0.536</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>Ocean</sub> | <sub>0.676</sub> | <sub>0.615</sub> | <sub>0.517</sub> | <sub>0.421</sub> | <sub>69.2</sub> | <sub>0.553</sub> | <sub>0.638</sub> | <sub>0.323</sub> | <sub>0.585</sub> | <sub>0.621</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>AutoMatch</sub> | <sub>0.714</sub> | <sub>0.652</sub> | <sub>0.583</sub> | <sub>0.472</sub> | <sub>76.0</sub> | <sub>0.606</sub> | <sub>0.668</sub> | <sub>0.322</sub> | <sub>0.634</sub> | <sub>0.644</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>CNNInMo</sub> | <sub>0.703</sub> | <sub>-</sub> | <sub>0.539</sub> | <sub>0.422</sub> | <sub>72.1</sub> | <sub>0.560</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.629</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>TransInMo</sub> | <sub>0.711</sub> | <sub>-</sub> | <sub>0.657</sub> | <sub>0.520</sub> | <sub>81.7</sub> | <sub>0.668</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.690</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>VLT_SCAR</sub> | <sub>-</sub> | <sub>0.610</sub> | <sub>0.639</sub> | <sub>0.498</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.447</sub> | <sub>0.739</sub> |
| <sub>VLT_TT</sub> | <sub>-</sub> | <sub>0.694</sub> | <sub>0.673</sub> | <sub>0.531</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.484</sub> | <sub>0.764</sub> |
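The table is easy to query programmatically; for instance, transcribing the LaSOT column (success scores; `-` entries omitted) and picking the best model:

```python
# LaSOT success scores transcribed from the table above.
lasot = {
    "SiamDW": 0.386, "Ocean": 0.517, "AutoMatch": 0.583,
    "CNNInMo": 0.539, "TransInMo": 0.657, "VLT_SCAR": 0.639, "VLT_TT": 0.673,
}
best = max(lasot, key=lasot.get)   # "VLT_TT"
```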
## Tracker Details
### VLT_SCAR/VLT_TT [NeurIPS2022]
[Paper] [Raw Results] [Training and Testing Tutorial] <br/> VLT explores a different path to SOTA tracking that does not rely on a complex Transformer, i.e., multimodal vision-language tracking. The essence is a unified-adaptive vision-language representation, learned by the proposed ModaMixer and asymmetrical networks. Experiments show that our approach surprisingly boosts a pure CNN-based Siamese tracker to performance competitive with, or even better than, recent SOTAs, and it also benefits Transformer-based trackers. We hope this work inspires more possibilities for future tracking beyond the Transformer.
<img src="https://github.com/JudasDie/SOTS/blob/SOT/demo/VLT.jpg" width="700" alt="VLT"/><br/>
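As a rough picture of the channel-wise vision-language mixing idea, here is a minimal NumPy sketch (our illustration; the function and variable names are not from the codebase, and the real ModaMixer is more involved):

```python
import numpy as np

def modamix(vision_feat, lang_emb):
    """Modulate a (C, H, W) vision feature map with a (C,)-dim language embedding."""
    gate = 1.0 / (1.0 + np.exp(-lang_emb))    # sigmoid -> per-channel selector in (0, 1)
    return vision_feat * gate[:, None, None]  # broadcast the selector over H and W
```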
### CNNInMo/TransInMo [IJCAI2022]
[Paper] [Raw Results] [Training and Testing Tutorial] <br/> CNNInMo/TransInMo introduces a novel mechanism that conducts branch-wise interactions inside the visual tracking backbone network (InBN) via the proposed general interaction modeler (GIM). We show that both CNN and Transformer backbones can benefit from InBN, with which more robust feature representation can be learned. Our method achieves compelling tracking performance by applying the backbones to Siamese tracking.
<img src="https://github.com/JudasDie/SOTS/blob/SOT/demo/TransInMo.jpg" width="700" alt="TransInMo"/><br/>
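Branch-wise interaction can be pictured as tokens of one branch attending to the other's; a hedged NumPy sketch (our simplification for illustration, not the repo's GIM implementation):

```python
import numpy as np

def cross_attend(q_feat, kv_feat):
    """q_feat: (Nq, C) tokens of one branch; kv_feat: (Nk, C) tokens of the other."""
    scale = q_feat.shape[-1] ** -0.5
    attn = q_feat @ kv_feat.T * scale                      # (Nq, Nk) similarities
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)               # softmax over the other branch
    return attn @ kv_feat                                  # (Nq, C) interacted features
```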
### OMC [AAAI2022]
[Paper] [Training and Testing Tutorial] <br/> OMC introduces a double-check mechanism so that targets misclassified as "fake background" can be tracked again. Specifically, we design a re-check network as an auxiliary to the initial detections. If the target is absent from the first-check predictions (i.e., the results of the object detector), it is treated as a potentially misclassified target and has a chance to be restored by the re-check network, which searches for targets by mining temporal cues. Note that the re-check network innovatively expands the role of ID embeddings from data association to motion forecasting by effectively propagating previous tracklets to the current frame with a small overhead. Even with multiple tracklets, our re-check network can still propagate with one forward pass via a simple matrix multiplication. Building on the strong baseline CSTrack, we construct a new one-shot tracker and achieve favorable gains.
<img src="https://github.com/JudasDie/SOTS/blob/MOT/demo/OMC.jpg" height="500" alt="OMC"/><br/>
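The single-matrix-multiplication propagation can be sketched as correlating stored tracklet ID embeddings with the current frame's embedding map (our illustration, with assumed shapes and names):

```python
import numpy as np

def propagate(tracklet_emb, frame_emb):
    """tracklet_emb: (M, C) ID embeddings; frame_emb: (C, H, W) current-frame embeddings.

    Returns one (H, W) response map per tracklet from a single matrix multiplication.
    """
    C, H, W = frame_emb.shape
    resp = tracklet_emb @ frame_emb.reshape(C, H * W)   # (M, H*W) similarities
    return resp.reshape(-1, H, W)
```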
### AutoMatch [ICCV2021]
[Paper] [Raw Results] [Training and Testing Tutorial] [Demo] <br/> AutoMatch replaces the essence of Siamese tracking, i.e., cross-correlation and its variants, with a learnable matching network. The underlying motivation is that heuristic matching-network design relies heavily on expert experience. Moreover, we experimentally find that a single matching operator struggles to guarantee stable tracking in all challenging environments. In this work, we introduce six novel matching operators from the perspective of feature fusion rather than explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more possibilities in matching operator selection. The analyses reveal these operators' selective adaptability to different environments.
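Two of the simplest operators named above reduce to elementary feature fusion; a sketch with assumed (C, H, W) feature shapes (function names ours, for illustration only):

```python
import numpy as np

def pointwise_addition(z, x):
    """Fuse template features z and search features x of identical shape."""
    return z + x

def concatenation(z, x):
    """Stack along the channel axis: two (C, H, W) maps -> one (2C, H, W) map."""
    return np.concatenate([z, x], axis=0)
```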