Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking [AAAI2025]

Official implementation of STTrack, including models and training and testing code.

Models & Raw Results (Google Drive) | Models & Raw Results (Baidu Drive, code: 9527)

News

[Dec 30, 2024]

We have released the code, models, and raw results. Thanks for your star!

Introduction

STTrack is a new unified multimodal spatial-temporal tracking framework for RGB-D, RGB-T, and RGB-E tracking.
It performs strongly across multiple multimodal tracking benchmarks, and we hope it draws more attention to multimodal tracking.

Strong Performance

| Tracker | LasHeR | RGBT234 | VisEvent | DepthTrack | VOT22RGBD |
|:-------:|:------:|:-------:|:--------:|:----------:|:---------:|
| STTrack | 60.3 | 66.7 | 61.9 | 77.6 | 63.3 |

Usage

Installation

Create and activate a conda environment:

conda create -n STTrack python=3.8
conda activate STTrack

Install the required packages:

bash install_sttrack.sh
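After installation, a quick sanity check can confirm that the interpreter and key packages are usable. The sketch below is a minimal example; the package list is an assumption (the exact dependencies come from install_sttrack.sh, which is not reproduced here), so adjust REQUIRED_PACKAGES to match it.

```python
import importlib.util
import sys

# Hypothetical list: the packages install_sttrack.sh actually installs are not
# listed in this README; adjust to match the script.
REQUIRED_PACKAGES = ["torch", "torchvision", "cv2", "numpy"]


def check_environment(packages=REQUIRED_PACKAGES, min_python=(3, 8)):
    """Return (python_ok, missing_packages) for the current interpreter."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_environment()
    print(f"Python >= 3.8: {ok}")
    print("Missing packages:", ", ".join(missing) if missing else "none")
```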

Data Preparation

Put the training datasets in ./data/. It should look like:

$<PATH_of_STTrack>
-- data
    -- DepthTrackTraining
        |-- adapter02_indoor
        |-- bag03_indoor
        |-- bag04_indoor
        ...
    -- LasHeR/train/trainingset
        |-- 1boygo
        |-- 1handsth
        ...
    -- VisEvent/train
        |-- 00142_tank_outdoor2
        |-- 00143_tank_outdoor2
        ...
        |-- trainlist.txt
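To catch path mistakes before training starts, the layout above can be checked programmatically. A minimal sketch: the directory and file names are taken from the tree above, while check_data_layout is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Expected entries under ./data, taken from the layout shown above.
EXPECTED_DIRS = [
    "DepthTrackTraining",
    "LasHeR/train/trainingset",
    "VisEvent/train",
]
EXPECTED_FILES = ["VisEvent/train/trainlist.txt"]


def check_data_layout(data_root):
    """Return the expected paths under data_root that are missing."""
    root = Path(data_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = check_data_layout("./data")
    print("Data layout OK" if not problems else f"Missing: {problems}")
```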

Path Setting

Run the following command to set paths:

cd <PATH_of_STTrack>
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

You can also edit the paths directly in these two files:

./lib/train/admin/local.py  # paths for training
./lib/test/evaluation/local.py  # paths for testing
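The values passed to create_default_local_file.py (., ./data, ./output) are relative paths, and relative paths are interpreted against the current working directory, which is presumably why the command is run after cd <PATH_of_STTrack>. The hypothetical helper below illustrates that resolution step with os.path.realpath; it is a sketch of the idea, not the script's actual code, and it says nothing about the attribute names written into the two local.py files.

```python
import os


# Hypothetical helper: illustrates how relative path arguments resolve.
# It is not the code of tracking/create_default_local_file.py.
def resolve_paths(workspace_dir=".", data_dir="./data", save_dir="./output"):
    """Resolve (possibly relative) path arguments against the current directory."""
    return {
        "workspace_dir": os.path.realpath(workspace_dir),
        "data_dir": os.path.realpath(data_dir),
        "save_dir": os.path.realpath(save_dir),
    }


if __name__ == "__main__":
    for key, value in resolve_paths().items():
        print(f"{key:>13}: {value}")
```

Running the same command from a different directory yields different absolute paths, which is the usual cause of "dataset not found" errors at training time.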

Training

Download the pretrained foundation model (OSTrack) and put it under ./pretrained/.
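Before running train.sh, it can help to confirm that a checkpoint is actually in place. A minimal sketch; find_pretrained is a hypothetical helper, and the accepted suffixes are an assumption, since the exact OSTrack file name that the training configs expect is not given in this README.

```python
from pathlib import Path

# Assumption: checkpoints use common PyTorch suffixes; the exact OSTrack
# file name expected by the training configs is not stated in this README.
CHECKPOINT_SUFFIXES = (".pth", ".pth.tar", ".pt")


def find_pretrained(pretrained_dir="./pretrained"):
    """List candidate checkpoint file names under pretrained_dir."""
    root = Path(pretrained_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(CHECKPOINT_SUFFIXES)
    )


if __name__ == "__main__":
    found = find_pretrained()
    print("Found:", found if found else "no checkpoint in ./pretrained/")
```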

bash train.sh

To train models for other modalities or variants, modify train.sh.

Testing

For RGB-D benchmarks

[DepthTrack test set & VOT22_RGBD]
These two benchmarks are evaluated with the VOT toolkit.
Put the DepthTrack test set in ./Depthtrack_workspace/ as a folder named 'sequences'.
Download the corresponding test sequences into ./vot22_RGBD_workspace/.

bash test_rgbd.sh
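Since both RGB-D benchmarks are run through the VOT toolkit, a missing or misnamed sequence folder is a common failure point. The sketch below assumes the usual VOT-toolkit workspace layout, in which sequences/list.txt names one sequence directory per line; check_vot_sequences is a hypothetical helper, not part of this repo or the toolkit.

```python
from pathlib import Path


def check_vot_sequences(workspace):
    """Return sequence names listed in sequences/list.txt that have no folder.

    Assumption: the workspace follows the usual VOT-toolkit layout, where
    sequences/list.txt names one sequence directory per line.
    """
    seq_root = Path(workspace) / "sequences"
    list_file = seq_root / "list.txt"
    if not list_file.is_file():
        raise FileNotFoundError(f"{list_file} not found")
    names = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    return [name for name in names if not (seq_root / name).is_dir()]


if __name__ == "__main__":
    for ws in ("./Depthtrack_workspace", "./vot22_RGBD_workspace"):
        try:
            missing = check_vot_sequences(ws)
            print(ws, "OK" if not missing else f"missing {missing}")
        except FileNotFoundError as err:
            print(ws, err)
```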

For RGB-T benchmarks

[LasHeR & RGBT234]
Modify <DATASET_PATH> and <SAVE_PATH> in ./RGBT_workspace/test_rgbt_mgpus.py, then run:

bash test_rgbt.sh
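The test scripts are named *_mgpus.py, which suggests that test sequences are spread across several GPUs. This README does not describe how the repo does that, so the following is only a sketch of one common approach: round-robin assignment of sequences, then one process per GPU with CUDA_VISIBLE_DEVICES set. split_sequences and track_sequences are hypothetical names, and track_sequences only prints where the real tracker would run.

```python
import os
from multiprocessing import Process


def split_sequences(sequences, num_gpus):
    """Assign sequence names to GPUs round-robin; returns one list per GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    buckets = [[] for _ in range(num_gpus)]
    # Sorting makes the assignment deterministic across runs.
    for i, name in enumerate(sorted(sequences)):
        buckets[i % num_gpus].append(name)
    return buckets


def track_sequences(gpu_id, sequences):
    """Hypothetical worker: pin the process to one GPU, then track each sequence."""
    # Must be set before any CUDA library is initialized in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for name in sequences:
        # Placeholder for the real per-sequence tracking and result saving.
        print(f"[GPU {gpu_id}] tracking {name}")


if __name__ == "__main__":
    demo = ["seq_a", "seq_b", "seq_c", "seq_d", "seq_e"]  # hypothetical names
    workers = [
        Process(target=track_sequences, args=(gpu, seqs))
        for gpu, seqs in enumerate(split_sequences(demo, num_gpus=2))
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Round-robin over sorted names keeps the split reproducible; it does not balance by sequence length, so very long sequences can still leave one GPU finishing last.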

For evaluation, please use the LasHeR Toolkit for LasHeR and MPR_MSR_Evaluation for RGBT234.
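For a quick sanity check before running the official toolkits, the two headline metrics can be approximated as follows: success rate as the area under the curve of P(IoU > t) over thresholds t in [0, 1], and precision rate as the fraction of frames whose center error is within 20 pixels. This is a minimal OTB-style sketch assuming [x, y, w, h] boxes; it is not the official LasHeR Toolkit or MPR_MSR_Evaluation code, which handle benchmark-specific details (for RGBT234, for example, both the visible and the thermal ground truth are used).

```python
import numpy as np


def iou_xywh(pred, gt):
    """Per-frame IoU between (N, 4) arrays of [x, y, w, h] boxes."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    # Guard against empty boxes: frames with zero union get IoU 0.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def success_rate(pred, gt, thresholds=None):
    """Area under the success curve: mean over thresholds of P(IoU > t)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1
    ious = iou_xywh(pred, gt)
    return float(np.mean([(ious > t).mean() for t in thresholds]))


def precision_rate(pred, gt, max_error=20.0):
    """Fraction of frames whose center location error is at most max_error pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    centers_pred = pred[:, :2] + pred[:, 2:] / 2.0
    centers_gt = gt[:, :2] + gt[:, 2:] / 2.0
    errors = np.linalg.norm(centers_pred - centers_gt, axis=1)
    return float((errors <= max_error).mean())


if __name__ == "__main__":
    gt = np.array([[10, 20, 30, 40], [12, 22, 30, 40]])
    pred = gt + np.array([[2, 0, 0, 0], [40, 0, 0, 0]])  # second frame drifts
    print(success_rate(pred, gt), precision_rate(pred, gt))
```

Note that with thresholds including exactly 1.0, even a perfect tracker scores 20/21 under this convention, because IoU > 1.0 never holds.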

For RGB-E benchmark

[VisEvent]
Modify <DATASET_PATH> and <SAVE_PATH> in ./RGBE_workspace/test_rgbe_mgpus.py, then run:

bash test_rgbe.sh

For evaluation, please use VisEvent_SOT_Benchmark.

BibTeX

If you find STTrack helpful in your research, please consider citing:

@inproceedings{sttrack,
  title={Exploiting multimodal spatial-temporal patterns for video object tracking},
  author={Hu, Xiantao and Tai, Ying and Zhao, Xu and Zhao, Chen and Zhang, Zhenyu and Li, Jun and Zhong, Bineng and Yang, Jian},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={4},
  pages={3581--3589},
  year={2025}
}

Acknowledgment

This repo is based on OSTrack and ViPT, which are excellent works.
We thank the PyTracking library, which helped us implement our ideas quickly.
