SkillAgentSearch skills...

FindTrack

[ICCVW 2025] Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation

Install / Use

/learn @suhwan-cho/FindTrack
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

FindTrack

This is the official PyTorch implementation of our paper:

Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation, ICCVW 2025
Suhwan Cho*, Seunghoon Lee*, Minhyeok Lee, Jungho Lee, Sangyoun Lee
Link: [ICCVW] [arXiv]

<img src="https://github.com/user-attachments/assets/a57cd78a-6f34-4fa7-bddc-762e5e90a71b" width=800>

You can also explore other related works at awesome-video-object segmentation.

Demo Video

https://github.com/user-attachments/assets/e5475442-f2fe-4899-84dd-8ae7ef22a7f2

Abstract

Existing referring VOS methods typically fuse visual and textual features in a highly entangled manner, processing multi-modal information jointly. However, this entanglement often leads to challenges in resolving ambiguous target identification and maintaining consistent mask propagation across frames. To address these issues, we propose a decoupled framework that explicitly separates object identification from mask propagation. The key frame is adaptively selected based on segmentation confidence and vision-text alignment, establishing a reliable anchor for propagation.

Setup

1. Download the datasets: Ref-YouTube-VOS, Ref-DAVIS17, MeViS.

2. Download Alpha-CLIP weights and place it in the weights/ directory.

Running

Training (optional)

FindTrack works well in a training-free manner, but fine-tuning on specific datasets can improve performance further.

For Ref-YouTube-VOS dataset:

deepspeed --num_gpus 4 train_ytvos.py 

For MeViS dataset:

deepspeed --num_gpus 4 train_mevis.py 

Testing

For Ref-YouTube-VOS dataset:

python run_ytvos.py

For MeViS dataset:

python run_mevis.py

Verify the following before running:
✅ Testing dataset selection
✅ GPU availability and configuration
✅ Pre-trained model path

Gradio Demo

You can use the web demo with your own video!

<img src="https://github.com/user-attachments/assets/74eb0778-84dd-4f84-b081-3bfae8de91d7" width=600>

Run the Gradio demo with:

python demo.py

Attachments

Pre-computed results

Contact

Code and models are only available for non-commercial research purposes.
For questions or inquiries, feel free to contact:

E-mail: suhwanx@gmail.com
View on GitHub
GitHub Stars81
CategoryContent
Updated20d ago
Forks11

Languages

Python

Security Score

95/100

Audited on Mar 3, 2026

No findings