MVT
[BMVC 2023] Mobile Vision Transformer-based Visual Object Tracking (official implementation)

News
11-03-2024: The C++ implementation of our tracker is now available
10-11-2023: ONNX-Runtime and TensorRT-based inference code is released. Our MVT now runs at ~70 fps on CPU and ~300 fps on GPU :zap::zap:. Check the page for details.
14-09-2023: The pretrained tracker model is released
13-09-2023: The paper is available on arXiv now
22-08-2023: The MVT tracker training and inference code is released
21-08-2023: The paper is accepted at BMVC2023
Installation
Install the dependency packages using the environment file mvt_pyenv.yml.
Generate the relevant files:
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
After running this command, modify the dataset paths by editing these files:
lib/train/admin/local.py # paths for training
lib/test/evaluation/local.py # paths for testing
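The generated local.py files are plain Python settings modules holding those paths. A minimal sketch of what the training-side file might look like (the attribute names here are illustrative placeholders, not necessarily the exact fields the script generates):

```python
# Illustrative sketch of a generated lib/train/admin/local.py.
# Attribute names are hypothetical; check the file produced by
# create_default_local_file.py for the exact fields in your setup.

class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = './output'   # where checkpoints and logs go
        self.data_dir = './data'          # root of the training datasets
        # Per-dataset paths hang off the data root, e.g. for GOT-10k:
        self.got10k_dir = self.data_dir + '/got10k/train'

settings = EnvironmentSettings()
print(settings.got10k_dir)
```

Editing these files only changes path strings; no other code needs to be touched to point the tracker at your data.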
Training
- Set the path of the training datasets in lib/train/admin/local.py
- Place the pretrained backbone model under the pretrained_models/ folder
- For data preparation, please refer to this
- Uncomment lines 63, 67, and 71 in the base_backbone.py file. Replace these lines with self.z_dict1 = template.tensors.
- Run python tracking/train.py --script mobilevit_track --config mobilevit_256_128x1_got10k_ep100_cosine_annealing --save_dir ./output --mode single
- The training logs will be saved under the output/logs/ folder
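The config name mobilevit_256_128x1_got10k_ep100_cosine_annealing implies a cosine-annealed learning rate over 100 epochs. A minimal sketch of that schedule shape (the base and minimum rates below are made-up placeholders, not the repo's actual hyperparameters):

```python
import math

def cosine_annealing_lr(epoch, total_epochs=100, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing: decay smoothly from lr_max to lr_min over total_epochs.
    lr_max/lr_min are illustrative values, not MVT's actual settings."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

# The schedule starts at lr_max and decays to lr_min by the final epoch.
print(cosine_annealing_lr(0), cosine_annealing_lr(100))
```

Cosine annealing avoids the abrupt drops of step schedules, which tends to stabilize the final epochs of training.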
Pretrained tracker model
The pretrained tracker model can be found here
Tracker Evaluation
- Update the test dataset paths in lib/test/evaluation/local.py
- Place the pretrained tracker model under the output/checkpoints/ folder
- Run python tracking/test.py --tracker_name mobilevit_track --tracker_param mobilevit_256_128x1_got10k_ep100_cosine_annealing --dataset got10k_test/trackingnet/lasot
- Change the DEVICE variable between cuda and cpu in the --tracker_param file for GPU- and CPU-based inference, respectively
- The raw results will be stored under the output/test/ folder
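Benchmarks such as GOT-10k, TrackingNet, and LaSOT score the raw results by bounding-box overlap (IoU) between predictions and ground truth. A self-contained sketch of that metric for [x, y, w, h] boxes (a standard formulation, not the evaluation toolkits' actual code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [0, 0, 10, 10]))  # identical boxes -> 1.0
print(iou([0, 0, 10, 10], [20, 20, 5, 5]))  # disjoint boxes  -> 0.0
```

Success plots in these benchmarks are built by thresholding this per-frame overlap across the whole sequence.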
Profile tracker model
- To count the model parameters, run python tracking/profile_model.py
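The parameter count reported by such a profiling script can be sanity-checked by hand: it is the sum, over all weight tensors, of the product of each tensor's dimensions. A framework-free sketch of that arithmetic (the layer shapes below are made up for illustration, not MVT's actual ones):

```python
from math import prod

def count_params(shapes):
    """Total parameter count: sum over tensors of the product of their dims."""
    return sum(prod(shape) for shape in shapes)

# Hypothetical shapes: a 3x3 conv (16 out, 3 in channels) with bias,
# plus a 256x128 linear layer with bias.
layers = [(16, 3, 3, 3), (16,), (256, 128), (256,)]
print(count_params(layers))  # 16*3*3*3 + 16 + 256*128 + 256 = 33472
```

In PyTorch the same total would come from summing the element counts of a model's parameter tensors; this sketch just makes the arithmetic explicit.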
Acknowledgements
- We use the Separable Self-Attention Transformer implementation and the pretrained MobileViT backbone from ml-cvnets. Thank you!
- Our training code is built upon OSTrack and PyTracking
Citation
If our work is useful for your research, please consider citing:
@inproceedings{Gopal_2023_BMVC,
author = {Goutam Yelluru Gopal and Maria Amer},
title = {Mobile Vision Transformer-based Visual Object Tracking},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year = {2023},
url = {https://papers.bmvc2023.org/0800.pdf}
}