# Single/Multiple Object Tracking and Segmentation

Code and comparisons of recent single/multiple object tracking and segmentation methods.
## News
:boom: VLT_SCAR/VLT_TT is accepted by NeurIPS2022.
:boom: CNNInMo/TransInMo is accepted by IJCAI2022.
:boom: CSTrack is accepted by IEEE TIP.
:boom: OMC is accepted by AAAI2022. The training and testing code has been released in this codebase.
:boom: AutoMatch is accepted by ICCV2021. The training and testing code has been released in this codebase.
:boom: CSTrack ranks 5/4000 at Tianchi Global AI Competition.
:boom: Ocean is accepted by ECCV2020. [OceanPlus] is accepted by IEEE TIP.
:boom: SiamDW is accepted by CVPR2019 and selected as oral presentation.
<!-- :boom: The improved version of [CSTrack_panda](https://github.com/JudasDie/SOTS/blob/master/lib/tutorial/CSTrack_panda/CSTrack_PANDA.md) has been released, containing the end-to-end training codes on PANDA. It is a strong baseline for [Gigavision](http://gigavision.cn/index.html) MOT tracking. Our tracker takes the **5th** place in **Tianchi Global AI Competition (天池—全球人工智能技术创新大赛[赛道二])**, with the score of **A-0.6712/B-0.6251 (AB榜)**, which surprisingly outperforms the baseline tracker JDE with score of A-0.32/B-0.34. More details about CSTrack_panda can be found [here](https://blog.csdn.net/qq_34919792/article/details/116792954?spm=1001.2014.3001.5501). -->
<!-- [](https://www.youtube.com/watch?v=zRCRgsrW71s "") -->

## Supported Trackers (SOT and MOT)
### Single-Object Tracking (SOT)
- [x] [NeurIPS2022] VLT_SCAR/VLT_TT
- [x] [IJCAI2022] CNNInMo/TransInMo
- [x] [ICCV2021] AutoMatch
- [x] [ECCV2020] Ocean and Ocean+
- [x] [CVPR2019 Oral] SiamDW
### Multi-Object Tracking (MOT)
- [x] [AAAI2022] OMC
- [x] [IEEE TIP] CSTrack
## Results Comparison
- [x] Comparison
## Branches
- SOT (or master): for our SOT trackers
- MOT: for our MOT trackers
- v0: old codebase supporting OceanPlus and TensorRT testing.
Please clone the branch that fits your needs.
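For example, a branch can be fetched on its own (`--single-branch` keeps the checkout small; the branch names are those listed above):

```shell
# Each tracker family lives on its own branch; clone only the one you need.
git clone -b SOT --single-branch https://github.com/JudasDie/SOTS.git
# git clone -b MOT --single-branch https://github.com/JudasDie/SOTS.git
# git clone -b v0  --single-branch https://github.com/JudasDie/SOTS.git
```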
## Structure
- experiments: training and testing settings
- demo: figures for readme
- dataset: testing dataset
- data: training dataset
- lib: core scripts for all trackers
- snapshot: pre-trained models
- pretrain: models trained on ImageNet (for training)
- tracking: training and testing interface
```
$SOTS
|—— experiments
|—— lib
|—— snapshot
      |—— xxx.model
|—— dataset
      |—— VOT2019.json
      |—— VOT2019
            |—— ants1...
      |—— VOT2020
            |—— ants1...
      |—— ...
```
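As a quick sanity check before training, a small script (ours, not part of the codebase) can verify that the top-level folders described above are in place:

```python
from pathlib import Path

# Top-level folders named in the Structure section above.
REQUIRED = ["experiments", "lib", "snapshot", "dataset"]

def check_layout(root="."):
    """Return the list of required folders missing under `root`."""
    root = Path(root)
    return [d for d in REQUIRED if not (root / d).is_dir()]
```

`check_layout("SOTS")` returns `[]` when the layout is complete.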
## Performance
| <sub>Model</sub> | <sub>OTB2015</sub> | <sub>GOT10K</sub> | <sub>LaSOT</sub> | <sub>TNL2K</sub> | <sub>TrackingNet</sub> | <sub>NFS30</sub> | <sub>TOTB</sub> | <sub>VOT2019</sub> | <sub>TC128</sub> | <sub>UAV123</sub> | <sub>LaSOT_Ext</sub> | <sub>OTB-99-LANG</sub> |
|:-----:|:-:|:----:|:------:|:--------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| <sub>SiamDW</sub> | <sub>0.670</sub> | <sub>0.429</sub> | <sub>0.386</sub> | <sub>0.348</sub> | <sub>61.1</sub> | <sub>0.521</sub> | <sub>0.500</sub> | <sub>0.241</sub> | <sub>0.583</sub> | <sub>0.536</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>Ocean</sub> | <sub>0.676</sub> | <sub>0.615</sub> | <sub>0.517</sub> | <sub>0.421</sub> | <sub>69.2</sub> | <sub>0.553</sub> | <sub>0.638</sub> | <sub>0.323</sub> | <sub>0.585</sub> | <sub>0.621</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>AutoMatch</sub> | <sub>0.714</sub> | <sub>0.652</sub> | <sub>0.583</sub> | <sub>0.472</sub> | <sub>76.0</sub> | <sub>0.606</sub> | <sub>0.668</sub> | <sub>0.322</sub> | <sub>0.634</sub> | <sub>0.644</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>CNNInMo</sub> | <sub>0.703</sub> | <sub>-</sub> | <sub>0.539</sub> | <sub>0.422</sub> | <sub>72.1</sub> | <sub>0.560</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.629</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>TransInMo</sub> | <sub>0.711</sub> | <sub>-</sub> | <sub>0.657</sub> | <sub>0.520</sub> | <sub>81.7</sub> | <sub>0.668</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.690</sub> | <sub>-</sub> | <sub>-</sub> |
| <sub>VLT_SCAR</sub> | <sub>-</sub> | <sub>0.610</sub> | <sub>0.639</sub> | <sub>0.498</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.447</sub> | <sub>0.739</sub> |
| <sub>VLT_TT</sub> | <sub>-</sub> | <sub>0.694</sub> | <sub>0.673</sub> | <sub>0.531</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>-</sub> | <sub>0.484</sub> | <sub>0.764</sub> |
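The table is easy to query programmatically; for instance, transcribing the LaSOT column (success scores; `-` entries omitted) and picking the best model:

```python
# LaSOT success scores transcribed from the table above.
lasot = {
    "SiamDW": 0.386, "Ocean": 0.517, "AutoMatch": 0.583,
    "CNNInMo": 0.539, "TransInMo": 0.657, "VLT_SCAR": 0.639, "VLT_TT": 0.673,
}
best = max(lasot, key=lasot.get)   # "VLT_TT"
```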
## Tracker Details
### VLT_SCAR/VLT_TT [NeurIPS2022]
[Paper] [Raw Results] [Training and Testing Tutorial] <br/> VLT explores a different path to SOTA tracking that does not rely on a complex Transformer, i.e., multimodal vision-language tracking. The essence is a unified-adaptive vision-language representation, learned by the proposed ModaMixer and asymmetrical networks. Experiments show that our approach surprisingly boosts a pure CNN-based Siamese tracker to performance competitive with, or even better than, recent SOTAs, and it also benefits Transformer-based trackers. We hope this work inspires more possibilities for future tracking beyond the Transformer.
<img src="https://github.com/JudasDie/SOTS/blob/SOT/demo/VLT.jpg" width="700" alt="VLT"/><br/>
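As a rough picture of the channel-wise vision-language mixing idea, here is a minimal NumPy sketch (our illustration; the function and variable names are not from the codebase, and the real ModaMixer is more involved):

```python
import numpy as np

def modamix(vision_feat, lang_emb):
    """Modulate a (C, H, W) vision feature map with a (C,)-dim language embedding."""
    gate = 1.0 / (1.0 + np.exp(-lang_emb))    # sigmoid -> per-channel selector in (0, 1)
    return vision_feat * gate[:, None, None]  # broadcast the selector over H and W
```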
### CNNInMo/TransInMo [IJCAI2022]
[Paper] [Raw Results] [Training and Testing Tutorial] <br/> CNNInMo/TransInMo introduces a novel mechanism that conducts branch-wise interactions inside the visual tracking backbone network (InBN) via the proposed general interaction modeler (GIM). We show that both CNN and Transformer backbones can benefit from InBN, with which more robust feature representation can be learned. Our method achieves compelling tracking performance by applying the backbones to Siamese tracking.
<img src="https://github.com/JudasDie/SOTS/blob/SOT/demo/TransInMo.jpg" width="700" alt="TransInMo"/><br/>
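Branch-wise interaction can be pictured as tokens of one branch attending to the other's; a hedged NumPy sketch (our simplification for illustration, not the repo's GIM implementation):

```python
import numpy as np

def cross_attend(q_feat, kv_feat):
    """q_feat: (Nq, C) tokens of one branch; kv_feat: (Nk, C) tokens of the other."""
    scale = q_feat.shape[-1] ** -0.5
    attn = q_feat @ kv_feat.T * scale                      # (Nq, Nk) similarities
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)               # softmax over the other branch
    return attn @ kv_feat                                  # (Nq, C) interacted features
```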
### OMC [AAAI2022]
[Paper] [Training and Testing Tutorial] <br/> OMC introduces a double-check mechanism so that targets misclassified as "fake background" can be tracked again. Specifically, we design a re-check network as an auxiliary to the initial detections. If the target is absent from the first-check predictions (i.e., the results of the object detector), it is treated as a potentially misclassified target and has a chance to be restored by the re-check network, which searches for targets by mining temporal cues. Note that the re-check network innovatively expands the role of ID embeddings from data association to motion forecasting by effectively propagating previous tracklets to the current frame with a small overhead. Even with multiple tracklets, our re-check network can still propagate with one forward pass via a simple matrix multiplication. Building on the strong baseline CSTrack, we construct a new one-shot tracker and achieve favorable gains.
<img src="https://github.com/JudasDie/SOTS/blob/MOT/demo/OMC.jpg" height="500" alt="OMC"/><br/>
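The single-matrix-multiplication propagation can be sketched as correlating stored tracklet ID embeddings with the current frame's embedding map (our illustration, with assumed shapes and names):

```python
import numpy as np

def propagate(tracklet_emb, frame_emb):
    """tracklet_emb: (M, C) ID embeddings; frame_emb: (C, H, W) current-frame embeddings.

    Returns one (H, W) response map per tracklet from a single matrix multiplication.
    """
    C, H, W = frame_emb.shape
    resp = tracklet_emb @ frame_emb.reshape(C, H * W)   # (M, H*W) similarities
    return resp.reshape(-1, H, W)
```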
### AutoMatch [ICCV2021]
[Paper] [Raw Results] [Training and Testing Tutorial] [Demo] <br/> AutoMatch replaces the essence of Siamese tracking, i.e., cross-correlation and its variants, with a learnable matching network. The underlying motivation is that heuristic matching-network design relies heavily on expert experience. Moreover, we experimentally find that a single matching operator struggles to guarantee stable tracking in all challenging environments. In this work, we introduce six novel matching operators from the perspective of feature fusion rather than explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more possibilities in matching operator selection. The analyses reveal these operators' selective adaptability to different environments.
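Two of the simplest operators named above reduce to elementary feature fusion; a sketch with assumed (C, H, W) feature shapes (function names ours, for illustration only):

```python
import numpy as np

def pointwise_addition(z, x):
    """Fuse template features z and search features x of identical shape."""
    return z + x

def concatenation(z, x):
    """Stack along the channel axis: two (C, H, W) maps -> one (2C, H, W) map."""
    return np.concatenate([z, x], axis=0)
```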