Lang2SegTrack
This is an open source project that can track and segment specific objects in video streams by manual clicks, box selections, or text prompts.
Install / Use
/learn @wngkj/Lang2SegTrackREADME
Language-to-Segment&Track
Language-driven visual segmentation and object tracking system based on Grounding-DINO and SAMURAI.
🔥 News
2025/07/31: Support continuous detection, tracking, and segmentation of individual category objects through text prompts. Optimize the old data release logic to reduce memory footprint and improve the running speed of interactive inference.2025/07/19: Supports backward tracking of each object.2025/05/08: Optimize memory usage and code structure.2025/04/29: Initial version submission.
🔍 What Can It Do?
For Video Files or Real-Time Video (Camera):
Default camera supported is Intel RealSense. For others, please modify the video capture method manually.
- Allows initializing multiple object tracks via 'bbox prompts', 'point prompts', and 'mask prompts' in the first frame.
- Enables interactive tracking and segmentation on any frame using:
- Mouse click
- Manual box drawing
- Text prompt
- Tracked objects added at any frame can be tracked backwards to the untracked frames after the forward tracking is completed, and finally the complete 'mask trace' of each object is obtained.
- Support continuous detection, tracking, and segmentation of individual category objects through text prompts.
https://github.com/user-attachments/assets/40a340c7-d818-493f-b86a-bb8ed5ca517c
https://github.com/user-attachments/assets/c709131f-be66-4952-b8e2-b95fbab6529d
For Images:
- Accepts a text prompt and returns (Supports batch processing of images):
- Label
- Score
- Bounding Box
- Segmentation Mask

🚀 Get Started
We recommend using Anaconda for environment management.
✅ Prerequisites
- OS: Ubuntu >= 18.04
- NVIDIA Driver (
nvidia-smi) >= 550
🛠 Installation
git clone https://github.com/wngkj/Lang2SegTrack.git
cd Lang2SegTrack
conda create -n lang2segtrack python=3.10
conda activate lang2segtrack
pip install torch==2.4.1 torchvision==0.19.1 --extra-index-url https://download.pytorch.org/whl/cu124
cd models/sam2
pip install -e .
pip install -e ".[notebooks]"
cd ../gdino
pip install -e .
📦 Download SAM2.1 Checkpoints
Place them into sam2/checkpoints/:
Usage
python scripts/lang2segtrack.py
-
track: Track objects in video files or real-time video streams by using first-frame prompts, text-prompts, mouse click(Ctrl + left) or manual boxes drawing. If you need a higher frame rate and don't need the text prompt feature, set ituse_txt_prompt=False. You can also increase the tracking frame rate by setting theimage_sizeparameter in the config file(sam2/configs/samurai/) to which the model belongs, for example, to 512. -
predict_img: Predict objects in images by using text-prompts.python scripts/lang2segtrack_with_backward.py -
track: Supports backward tracking after the forward tracking is completed, and finally returns the tracking data 'mask' of each tracked object in the entire video stream.python scripts/lang2segtrack_keep_detection.py -
track: Support continuous detection, tracking, and segmentation of individual category objects in video streams through text prompts. (The 'trial version' only uses the IOU to manage new and old objects, and subsequent versions will use more efficient and accurate method. This script contains all the functions of the aforementioned script, but this script is demanding on device performance, and if there is no need for continuous tracking, it is recommended to uselang2segtrack.pyorlang2segtrack_with_backward.py.)
🙏 Acknowledgments
This project builds upon outstanding prior work:
Citation
If you find this project useful, please consider citing:
@misc{lang2segtrack,
title = {Language-to-Segment&Track},
howpublished = {\url{https://github.com/wngkj/Lang2SegTrack}},
author = {Kejie Wang},
year = {2025}
}
