FDTA

The official repository of the paper "From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking"


[CVPR 2026] From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking


TL;DR. We reveal that DETR-based end-to-end MOT suffers from overly similar object embeddings, which makes cross-frame association ambiguous. FDTA explicitly enhances embedding discriminativeness within this paradigm.
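The failure mode can be illustrated with a toy cosine-similarity check (illustrative sketch only; the vectors below are made up for demonstration and do not come from the model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two distinct objects with nearly identical embeddings are hard to
# associate across frames: their similarity to any track is almost the same.
obj_a = [0.9, 0.1, 0.0]
obj_b = [0.88, 0.12, 0.0]
print(f"similar embeddings:  {cosine(obj_a, obj_b):.4f}")  # close to 1.0

# More discriminative embeddings separate the two objects cleanly.
obj_a2 = [1.0, 0.0, 0.0]
obj_b2 = [0.0, 1.0, 0.0]
print(f"discriminative pair: {cosine(obj_a2, obj_b2):.4f}")
```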


📢 News

  • [2026] Our paper has been accepted by CVPR 2026! The code is now released. Please star this repo for updates!

🚀 Getting Started

1. Environment Setup

conda create -n FDTA python=3.12
conda activate FDTA
pip install -r requirements.txt
# Compile the Deformable Attention operator:
cd models/ops/
sh make.sh
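After make.sh finishes, you can sanity-check that the extension imports from Python (a sketch; the module name MultiScaleDeformableAttention follows upstream Deformable DETR and is an assumption here):

```python
def deformable_attn_built():
    """Return True if the compiled Deformable Attention extension imports."""
    try:
        import MultiScaleDeformableAttention  # noqa: F401  (name assumed from Deformable DETR)
        return True
    except ImportError:
        return False

if __name__ == "__main__":
    status = "built" if deformable_attn_built() else "NOT built -- re-run models/ops/make.sh"
    print(f"Deformable Attention op: {status}")
```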

2. Data Preparation

Please organize your datasets under ./datasets/ in the following structure. Note that depth maps are required alongside the RGB images:

datasets/
├── DanceTrack/
│   ├── train/
│   │   └── <sequence>/
│   │       ├── img1/
│   │       ├── gt/
│   │       └── depth/
│   ├── val/
│   └── test/
├── SportsMOT/
│   ├── train/
│   │   └── <sequence>/
│   │       ├── img1/
│   │       ├── gt/
│   │       └── depth/
│   ├── val/
│   └── test/
└── BFT/
    ├── train/
    │   └── <sequence>/
    │       ├── img1/
    │       ├── gt/
    │       └── depth/
    └── test/

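Before training, a quick script can confirm that every sequence contains the three required subfolders (a sketch based on the tree above; note that gt/ may legitimately be absent for test splits):

```python
from pathlib import Path

REQUIRED = ("img1", "gt", "depth")

def check_split(split_dir, required=REQUIRED):
    """Return (sequence_name, missing_subdir) pairs for incomplete sequences."""
    problems = []
    for seq in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        for sub in required:
            if not (seq / sub).is_dir():
                problems.append((seq.name, sub))
    return problems

if __name__ == "__main__":
    split = Path("datasets/DanceTrack/train")
    if split.is_dir():
        for seq, sub in check_split(split):
            print(f"{seq}: missing {sub}/")
```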
Dataset sources:

3. Pre-trained Weights

We use COCO pre-trained Deformable DETR weights for initialization, sourced from MOTIP. Download the weights and place them under ./pretrains/:

| File | Description | Download |
|------|-------------|----------|
| r50_deformable_detr_coco.pth | COCO pre-trained (base) | link |
| r50_deformable_detr_coco_dancetrack.pth | Fine-tuned on DanceTrack | link |
| r50_deformable_detr_coco_sportsmot.pth | Fine-tuned on SportsMOT | link |
| r50_deformable_detr_coco_bft.pth | Fine-tuned on BFT | link |

Our trained FDTA model weights are available on HuggingFace. Download and place them under ./checkpoints/ for inference.

4. Training

All training configurations are stored in the ./configs/ folder. For example, to train on DanceTrack:

accelerate launch --num_processes=4 train.py \
  --data-root /path/to/your/datasets/ \
  --exp-name fdta_dancetrack \
  --config-path ./configs/dancetrack.yaml \
  --detr-pretrain ./pretrains/r50_deformable_detr_coco_dancetrack.pth

Replace dancetrack with sportsmot or bft to train on other datasets.

Note: If your GPU has less than 24 GB of memory, you can set --detr-num-checkpoint-frames 2 (< 16 GB) or --detr-num-checkpoint-frames 1 (< 12 GB) to reduce memory usage.
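Since the per-dataset commands differ only in naming, a small helper can assemble them (a sketch; the flag names and file layout are taken from the commands above, while the helper itself is hypothetical):

```python
def train_command(dataset, data_root, num_gpus=4, checkpoint_frames=None):
    """Assemble the accelerate launch command for dancetrack, sportsmot, or bft."""
    cmd = (
        f"accelerate launch --num_processes={num_gpus} train.py"
        f" --data-root {data_root}"
        f" --exp-name fdta_{dataset}"
        f" --config-path ./configs/{dataset}.yaml"
        f" --detr-pretrain ./pretrains/r50_deformable_detr_coco_{dataset}.pth"
    )
    if checkpoint_frames is not None:
        # Trade extra recomputation for lower memory on smaller GPUs.
        cmd += f" --detr-num-checkpoint-frames {checkpoint_frames}"
    return cmd

print(train_command("sportsmot", "/path/to/your/datasets/", checkpoint_frames=2))
```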

5. Inference

We support two inference modes:

  • submit: Generate tracker files for submission.
  • evaluate: Generate tracker files and compute evaluation metrics.

For example, to evaluate on the DanceTrack validation set:
accelerate launch --num_processes=4 submit_and_evaluate.py \
  --data-root /path/to/your/datasets/ \
  --inference-mode evaluate \
  --config-path ./configs/dancetrack.yaml \
  --inference-model ./checkpoints/dancetrack.pth \
  --outputs-dir ./outputs/ \
  --inference-dataset DanceTrack \
  --inference-split val

To generate submission files for the DanceTrack test set:
accelerate launch --num_processes=4 submit_and_evaluate.py \
  --data-root /path/to/your/datasets/ \
  --inference-mode submit \
  --config-path ./configs/dancetrack.yaml \
  --inference-model ./checkpoints/dancetrack.pth \
  --outputs-dir ./outputs/ \
  --inference-dataset DanceTrack \
  --inference-split test

You can add --inference-dtype FP16 for faster inference with minimal performance loss.
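The tracker files written under --outputs-dir can be inspected with a few lines of Python (a sketch assuming the standard MOTChallenge text format, frame,id,x,y,w,h,conf,...; this repo's exact output layout is not confirmed here):

```python
import csv

def load_tracks(path):
    """Parse a MOTChallenge-style tracker file into {frame: [(track_id, x, y, w, h)]}."""
    tracks = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = (float(v) for v in row[2:6])
            tracks.setdefault(frame, []).append((track_id, x, y, w, h))
    return tracks
```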


Main Results

DanceTrack

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 71.7 | 77.2 | 63.5 | 91.3 | 81.0 |
| train+val | 74.4 | 80.0 | 67.0 | 92.2 | 82.7 |

SportsMOT

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 74.2 | 78.5 | 65.5 | 93.0 | 84.1 |

BFT

| Training Data | HOTA | IDF1 | AssA | MOTA | DetA |
|---------------|------|------|------|------|------|
| train | 72.2 | 84.2 | 74.5 | 78.2 | 70.1 |

Acknowledgements

The code is built on top of several awesome repositories; we thank their authors for open-sourcing their code.

Citation

If you find our work useful for your research, please consider citing:

@article{shao2025fdta,
  title={From Detection to Association: Learning Discriminative Object Embeddings for Multi-Object Tracking},
  author={Shao, Yuqing and Yang, Yuchen and Yu, Rui and Li, Weilong and Guo, Xu and Yan, Huaicheng and Wang, Wei and Sun, Xiao},
  journal={arXiv preprint arXiv:2512.02392},
  year={2025}
}
