SegTrackDetect

SegTrackDetect is a modular framework designed for accurate small object detection using a combination of segmentation and tracking techniques. It performs detection within selected Regions of Interest (ROIs), providing a highly efficient solution for scenarios where detecting tiny objects with precision is critical. The framework's modularity empowers users to easily customize key components, including the ROI Estimation Module, ROI Prediction Module, and Object Detector. It also features our Overlapping Box Suppression Algorithm that efficiently combines detected objects from multiple sub-windows, filtering them to overcome the limitations of window-based detection methods.


The following sections describe the framework, its components, and customization options in more detail.

To get started with the framework right away, head to the Getting Started section.

Architecture

SegTrackDetect is an object detection framework that selects ROIs for detailed detection through two main modules: the ROI Prediction Module and the ROI Estimation Module.

  • The ROI Prediction Module leverages object tracking to predict object locations from previous detections; it is used only in video mode. Users can switch between video and image mode depending on their use case.
  • The ROI Estimation Module uses binary semantic segmentation to identify promising regions within the input image.

Both branches feed into the ROI Fusion Module, where their outputs are merged. The framework then determines a set of detection window coordinates. Detection is performed independently on each window, and the results are aggregated. To prevent redundancy, the Overlapping Box Suppression Algorithm filters overlapping detections. In video mode, detections are further utilized to update the tracker’s state.
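The control flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the framework's actual API: each stub function stands in for one module (estimation, prediction, fusion), and the names here (`estimate_mask`, `prediction_mask`, `fuse`) are invented for the example.

```python
import numpy as np

# Hypothetical sketch of the SegTrackDetect control flow; the real
# framework's classes and signatures differ.

def estimate_mask(frame):
    # ROI Estimation stand-in: flag high-intensity pixels as promising.
    return frame > 0.5

def prediction_mask(shape, predicted_boxes):
    # ROI Prediction stand-in: paint tracker-predicted boxes into a mask.
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in predicted_boxes:
        mask[y0:y1, x0:x1] = True
    return mask

def fuse(frame, predicted_boxes=None):
    # ROI Fusion: merge the two branch masks. Detection windows would then
    # be derived from the fused mask, the detector run per window, and the
    # results filtered with Overlapping Box Suppression.
    fused = estimate_mask(frame)
    if predicted_boxes is not None:  # video mode only
        fused |= prediction_mask(frame.shape, predicted_boxes)
    return fused
```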

(architecture diagram)

ROI Fusion Module

The ROI Fusion Module merges the output masks from both branches and uses the fused mask to determine detection window coordinates. The framework offers two strategies for handling large ROIs: resizing or sliding-window detection. If the --allow_resize flag is enabled, large ROIs (those exceeding the detection window size) are cropped and scaled to fit the detector's input; otherwise, a sliding-window approach is applied within the larger ROI regions. Detection windows are positioned at the center of each ROI region and, prior to detection, filtered to eliminate unnecessary detector calls on redundant windows.

Both the Prediction and Estimation branches are highly customizable, allowing users to fine-tune the framework for a wide range of scenarios. The included datasets are examples; configuration choices should be driven by the specific data at hand to ensure the best performance in real-world applications.
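The two large-ROI strategies can be sketched as follows. This is an illustrative simplification (the helper name `windows_for_roi` and the exact tiling are assumptions, not the framework's implementation), but it shows how --allow_resize changes the handling of ROIs larger than the detection window:

```python
def windows_for_roi(roi, win_w, win_h, allow_resize):
    """Return detection-window boxes (x0, y0, x1, y1) covering one ROI.

    Hypothetical sketch of the two strategies: with allow_resize, a large
    ROI is returned whole (to be cropped and scaled to the detector input);
    otherwise a sliding window tiles it.
    """
    x0, y0, x1, y1 = roi
    w, h = x1 - x0, y1 - y0
    if w <= win_w and h <= win_h:
        # Small ROI: center a single fixed-size window on it.
        cx, cy = x0 + w // 2, y0 + h // 2
        wx, wy = cx - win_w // 2, cy - win_h // 2
        return [(wx, wy, wx + win_w, wy + win_h)]
    if allow_resize:
        # Large ROI: crop it whole; the detector rescales it on input.
        return [roi]
    # Large ROI without resizing: tile it with a sliding window.
    windows = []
    for wy in range(y0, y1, win_h):
        for wx in range(x0, x1, win_w):
            windows.append((wx, wy, wx + win_w, wy + win_h))
    return windows
```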

ROI Prediction with Object Trackers

The framework currently integrates the SORT tracker from a forked repository, allowing for efficient ROI prediction in video mode. However, the framework is designed to be adaptable, enabling users to integrate any other object tracker, provided that the tracker's prediction and update functions are modular and separate. For guidance, users can refer to our implementation of the SORT tracker to see how it has been adapted to fit seamlessly within the framework's workflow.
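The key requirement stated above is that prediction and update are separate calls. A minimal toy tracker illustrating that split might look like this (the class and its constant-velocity model are invented for the example; SORT's actual implementation uses a Kalman filter per object):

```python
class ConstantVelocityTracker:
    """Toy tracker showing the predict/update split the framework expects:
    predict() (used to build the ROI Prediction mask) is independent of
    update() (called later with the confirmed detections)."""

    def __init__(self):
        self.box = None          # last known box (x0, y0, x1, y1)
        self.velocity = (0, 0)   # per-frame center shift (dx, dy)

    def predict(self):
        # Forecast the next box location from the stored velocity.
        if self.box is None:
            return None
        dx, dy = self.velocity
        x0, y0, x1, y1 = self.box
        return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

    def update(self, box):
        # Refresh the tracker state from a confirmed detection.
        if self.box is not None:
            dx = ((box[0] + box[2]) - (self.box[0] + self.box[2])) // 2
            dy = ((box[1] + box[3]) - (self.box[1] + self.box[3])) // 2
            self.velocity = (dx, dy)
        self.box = box
```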

ROI Estimation with Segmentation

The ROI Estimation Module processes input images to generate probability masks used for selecting Regions of Interest (ROIs). All models utilized in this module are in TorchScript format, ensuring seamless integration into the framework.

A comprehensive list of currently supported models, along with their names, can be found in the model ZOO. The behavior of the ROI Estimation Module can be easily customized for the existing models, and you can also add your own models. To do this, navigate to the estimator configs directory and create your own configuration dictionaries. Remember to register any new configurations in the ESTIMATOR MODELS to enable their usage by name in the main scripts.

For existing models, you can implement new postprocessing functions or modify postprocessing parameters (e.g., thresholding or dilation). Please ensure that the postprocess function returns the mask in a [H, W] format.
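A postprocessing function of the kind described above could be sketched like this. The function name, parameters, and the shift-based dilation are assumptions for illustration; only the contract from the text (binary mask returned in [H, W] format) is taken from the framework:

```python
import numpy as np

def postprocess(prob_mask, threshold=0.5, dilation=1):
    """Hypothetical ROI-estimator postprocessing: threshold the probability
    map, then dilate with a 3x3 kernel. Returns a binary [H, W] mask."""
    mask = prob_mask > threshold
    for _ in range(dilation):
        # 3x3 dilation via shifted logical ORs (no SciPy dependency).
        padded = np.pad(mask, 1)
        mask = np.zeros_like(mask)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                mask |= padded[1 + dy : 1 + dy + mask.shape[0],
                               1 + dx : 1 + dx + mask.shape[1]]
    return mask
```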

Object Detection

In SegTrackDetect, the object detector is executed multiple times for each sub-window to effectively capture the features of tiny objects. A comprehensive list of all available models can be found in the model ZOO. You can customize the behavior of each model (e.g., the NMS parameters) by modifying the configuration dictionaries located in the detectors config directory.

New models can be registered similarly to the ROI Estimation Models: create a new configuration dictionary and register it in the DETECTION_MODELS.
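The registration pattern amounts to a named entry in a dictionary. The config keys below (weights path, input size, thresholds) are hypothetical placeholders, not the framework's actual schema; consult the configs in the repository for the real fields:

```python
# Illustrative registry pattern; keys and values are assumptions, not the
# framework's actual config schema.

my_detector = {
    "weights": "weights/my_detector.torchscript.pt",  # hypothetical path
    "input_size": (512, 512),
    "conf_th": 0.25,      # confidence threshold
    "nms_iou_th": 0.45,   # NMS IoU threshold
}

DETECTION_MODELS = {
    # ...existing entries...
    "my_detector": my_detector,  # now usable by name via --det_model
}
```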

Detection Aggregation and Filtering

Finally, detections from all sub-windows are aggregated and filtered using the Overlapping Box Suppression (OBS) Algorithm. OBS leverages the sub-window coordinates to eliminate partial detections that arise from overlapping detection sub-windows. You can customize the IoU threshold for OBS using the --obs_iou_th argument in the main scripts. For more detailed information on OBS, please refer to the documentation.
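A simplified version of the idea behind OBS can be sketched as follows. This is not the framework's exact algorithm: here, when two detections from different windows overlap above the IoU threshold, the larger box is kept and the smaller dropped, on the assumption that the smaller one is a partial detection clipped at a window border:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def obs(detections, obs_iou_th=0.1):
    """Simplified Overlapping Box Suppression sketch: process boxes
    largest-first; drop any box that overlaps an already-kept box from a
    different window above the IoU threshold."""
    by_area = sorted(detections,
                     key=lambda d: -(d["box"][2] - d["box"][0])
                                   * (d["box"][3] - d["box"][1]))
    kept = []
    for det in by_area:
        redundant = any(det["window"] != k["window"]
                        and iou(det["box"], k["box"]) > obs_iou_th
                        for k in kept)
        if not redundant:
            kept.append(det)
    return kept
```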

Getting Started

Dependencies

To simplify the setup process, we provide a Dockerfile that manages all necessary dependencies for you. Follow these steps to get started:

  1. Install Docker: begin by installing the Docker Engine.
  2. Install the NVIDIA Container Toolkit: required if you plan to run detection on a GPU.

Once you have Docker set up, you can download all the trained models listed in the Model ZOO and build the Docker image by running the following command:

./build_and_run.sh

We currently support four datasets, and we provide scripts to download and convert them into a compatible format. To download and convert all datasets at once, execute:

./scripts/download_and_convert.sh

If you prefer to download specific datasets, you can run the corresponding scripts located in the scripts directory.

⚠️ For the MTSD dataset, please visit the official dataset page to download the data manually. For details on the required directory structure, refer to this script. After downloading the dataset and the older annotation version, you will need to convert it to the framework format using:

python /SegTrackDetect/scripts/converters/MTSD.py

Examples

The SegTrackDetect framework enables robust tiny object detection both across consecutive video frames (video mode) and within independent detection windows. The dataset type is automatically inferred from the dataset directory structure. For more information, see datasets.

The main features of the SegTrackDetect framework can be explored by executing the inference.py script with various configurations. To perform detection on video data using a supported dataset like SeaDronesSee, run the following command:

python inference.py \
--roi_model 'SDS_large' --det_model 'SDS' --tracker 'sort' \
--data_root '/SegTrackDetect/data/SeaDronesSee' --split 'val' \
--bbox_type 'sorted' --allow_resize --obs_iou_th 0.1 \
--out_dir 'results/SDS/val' --debug

To detect objects in independent windows, for instance, using the MTSD dataset, you can use the same script with slight modifications:

python inference.py \
--roi_model 'MTSD' --det_model 'MTSD' \
--data_root '/SegTrackDetect/data/MTSD' --split 'val'  \
--bbox_type 'sorted' --allow_resize --obs_iou_th 0.7 \
--out_dir 'results/MTSD/val' --debug

The following table outlines the command-line arguments that can be used when running the inference script. These arguments allow you to customize the behavior of the detection process by specifying models, datasets, and various configurations.

| Argument | Type | Description |
|:-----------:|:----:|-------------|
| --roi_model | str | Specifies the ROI model to use (e.g., SDS_large). All available ROI models are defined here |
| --det_model | str | Specifies the detection model to use (e.g., SDS). All available detectors are defined here |
| --tracker | str | Specifies the tracker to use in video mode (e.g., sort) |
