
STMask

The code is implemented for our CVPR 2021 paper: "STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation".

News

  • [27/06/2021] Important issue: in previous results on the YTVIS2021 and OVIS datasets, we mistakenly normalized the bounding boxes in the function bbox_feat_extractor() of track_to_segment_head.py. The bounding boxes passed to bbox_feat_extractor() should not be normalized (see the sketch after this list). We have updated the results and trained models for YTVIS2021 and OVIS. We apologize for our negligence.
  • [12/06/2021] Updated the solution for the error in deform_conv_cuda.cu.
  • [22/04/2021] Added experimental results on the YTVIS2021 and OVIS datasets.
  • [14/04/2021] Released the code on GitHub and the paper on arXiv.
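
As a hedged illustration of the first item above (the coordinate order and the call are assumptions, not the repo's exact code), "without normalization" means passing pixel-coordinate boxes:

    # Sketch only: bbox_feat_extractor() expects pixel-coordinate boxes,
    # not boxes normalized to [0, 1]; (x1, y1, x2, y2) order is assumed.
    import torch

    feat_h, feat_w = 90, 160                               # illustrative feature-map size
    boxes_norm = torch.tensor([[0.10, 0.20, 0.50, 0.80]])  # normalized boxes (the old, wrong input)
    scale = torch.tensor([feat_w, feat_h, feat_w, feat_h], dtype=torch.float32)
    boxes_pix = boxes_norm * scale                         # un-normalized boxes (the corrected input)
    # feats = bbox_feat_extractor(fpn_feat, boxes_pix, ...)  # repo function, signature not shown here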

Installation

  • Clone this repository and enter it:

    git clone https://github.com/MinghanLi/STMask.git
    cd STMask
    
  • Set up the environment using one of the following methods:

    • Using Anaconda
      • Run conda env create -f environment.yml
      • conda activate STMask-env
    • Manually with pip
      • Set up a Python3 environment.
      • Install PyTorch 1.0.1 (or higher) and TorchVision.
      • Install some other packages:
        # Cython needs to be installed before pycocotools
        pip install cython
        pip install opencv-python pillow pycocotools matplotlib 
        
  • Install mmcv and mmdet

    • Install mmcv or mmcv-full according to your CUDA and PyTorch versions from here; my CUDA and PyTorch versions are 10.1 and 1.5.0 respectively. (A combined import smoke test for these dependencies follows this list.)
      pip install mmcv-full==1.1.2 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html
      
    • Install cocoapi and a customized COCO API for the YouTubeVIS dataset from here
      pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
      git clone https://github.com/youtubevos/cocoapi
      cd cocoapi/PythonAPI
      # To compile and install locally 
      python setup.py build_ext --inplace
      # To install library to Python site-packages 
      python setup.py build_ext install
      
  • Install spatial-correlation-sampler

    pip install spatial-correlation-sampler
    
  • Compile the DCNv2 code (see Installation)

    • Download code for deformable convolutional layers from here
      git clone https://github.com/CharlesShang/DCNv2.git
      cd DCNv2
      python setup.py build develop
      
  • Modify mmcv/ops/deform_conv.py to handle deformable convolutions whose kernel height and width differ (such as 3×5), as used in FCB(ali) and FCB(ada); a sanity check for this patch follows the list

    • Open the file deform_conv.py
      vim /your_conda_env_path/mmcv/ops/deform_conv.py
      
    • Replace padW=ctx.padding[1], padH=ctx.padding[0] with padW=ctx.padding[0], padH=ctx.padding[1], taking Lines 81-89 of the file as an example:
      ext_module.deform_conv_forward(
              input,
              weight,
              offset,
              output,
              ctx.bufs_[0],
              ctx.bufs_[1],
              kW=weight.size(3),
              kH=weight.size(2),
              dW=ctx.stride[1],
              dH=ctx.stride[0],
              padW=ctx.padding[0],   # swapped: was ctx.padding[1]
              padH=ctx.padding[1],   # swapped: was ctx.padding[0]
              dilationW=ctx.dilation[1],
              dilationH=ctx.dilation[0],
              group=ctx.groups,
              deformable_group=ctx.deform_groups,
              im2col_step=cur_im2col_step)
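
After completing the steps above, this non-authoritative smoke test can confirm the main dependencies import and behave sensibly; the module and class names are the ones each project documents, while the versions and shapes are illustrative:

    # Smoke test for the installation steps above (a sketch, not part of STMask).
    import torch, mmcv
    print(torch.__version__, torch.version.cuda)  # should match the mmcv-full build, e.g. 1.5.0 / 10.1
    print(mmcv.__version__)                       # e.g. 1.1.2

    from pycocotools.coco import COCO             # standard COCO API
    from pycocotools.ytvos import YTVOS           # added by the youtubevos/cocoapi fork

    from dcn_v2 import DCN                        # built by `python setup.py build develop` in DCNv2

    from spatial_correlation_sampler import SpatialCorrelationSampler
    corr = SpatialCorrelationSampler(kernel_size=1, patch_size=11, stride=1, padding=0)
    a, b = torch.randn(1, 64, 24, 24), torch.randn(1, 64, 24, 24)
    print(corr(a, b).shape)                       # torch.Size([1, 11, 11, 24, 24])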
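And a minimal check that the deform_conv.py patch took effect: a deformable convolution with a non-square kernel (the 3×5 case used by FCB) should now run through mmcv. DeformConv2d is mmcv's documented op; the channel and shape numbers are illustrative, and builds without CPU support may need the module and tensors moved to .cuda() first:

    # Sanity check for the padW/padH patch: a 3x5 deformable conv should run cleanly.
    import torch
    from mmcv.ops import DeformConv2d

    conv = DeformConv2d(16, 16, kernel_size=(3, 5), padding=(1, 2))
    x = torch.randn(1, 16, 32, 32)
    offset = torch.randn(1, 2 * 3 * 5, 32, 32)    # 2 * kH * kW offset channels
    print(conv(x, offset).shape)                  # torch.Size([1, 16, 32, 32])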
      

Dataset

  • If you'd like to train STMask, please download the datasets from the official websites: YTVIS2019, YTVIS2021 and OVIS.

Evaluation

The input size on all VIS benchmarks is 360×640 here.

Quantitative Results on YTVIS2019 (trained with 12 epochs)

Here are our STMask models (released in April 2021) along with their FPS on a 2080Ti and mAP on the validation set, where mAP and mAP* are obtained under cross-class fast NMS and fast NMS respectively. Note that FCB(ali) and FCB(ada) are only applied on the classification branch.
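For context, fast NMS suppresses each detection by looking only at its IoU with higher-scored detections, computed in a single matrix operation; the cross-class variant runs the same test over all classes jointly. The sketch below is a generic YOLACT-style reimplementation, not STMask's own code:

    # Minimal fast NMS sketch (per class); boxes must be sorted by descending score.
    import torch
    from torchvision.ops import box_iou

    def fast_nms(boxes: torch.Tensor, iou_thresh: float = 0.5) -> torch.Tensor:
        iou = box_iou(boxes, boxes).triu(diagonal=1)  # IoU with higher-scored boxes only
        max_iou, _ = iou.max(dim=0)                   # worst overlap with any better box
        return max_iou <= iou_thresh                  # boolean keep-mask over detections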

| Backbone | FCA | FCB | TF | FPS | mAP | mAP* | Weights |
|:------------:|:---:|:--------:|:--:|:----:|:----:|:----:|:------------------------------|
| R50-DCN-FPN | FCA | - | TF | 29.3 | 32.6 | 33.4 | STMask_plus_resnet50.pth |
| R50-DCN-FPN | FCA | FCB(ali) | TF | 27.8 | - | 32.1 | STMask_plus_resnet50_ali.pth |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 28.6 | 32.8 | 33.0 | STMask_plus_resnet50_ada.pth |
| R101-DCN-FPN | FCA | - | TF | 24.5 | 36.0 | 36.3 | STMask_plus_base.pth |
| R101-DCN-FPN | FCA | FCB(ali) | TF | 22.1 | 36.3 | 37.1 | STMask_plus_base_ali.pth |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 23.4 | 36.8 | 37.9 | STMask_plus_base_ada.pth |


Quantitative Results on YTVIS2021 (trained with 12 epochs)

| Backbone | FCA | FCB | TF | mAP* | Weights | Results |
|:------------:|:---:|:--------:|:--:|:----:|:---------------------------------------|:----------|
| R50-DCN-FPN | FCA | - | TF | 30.6 | STMask_plus_resnet50_YTVIS2021.pth | - |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 31.1 | STMask_plus_resnet50_ada_YTVIS2021.pth | stdout.txt |
| R101-DCN-FPN | FCA | - | TF | 33.7 | STMask_plus_base_YTVIS2021.pth | - |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 34.6 | STMask_plus_base_ada_YTVIS2021.pth | stdout.txt |

Quantitative Results on OVIS (trained with 20 epochs)

| Backbone | FCA | FCB | TF | mAP* | Weights | Results |
|:------------:|:---:|:--------:|:--:|:----:|:-----------------------------------|:----------|
| R50-DCN-FPN | FCA | - | TF | 15.4 | STMask_plus_resnet50_OVIS.pth | - |
| R50-DCN-FPN | FCA | FCB(ada) | TF | 15.4 | STMask_plus_resnet50_ada_OVIS.pth | stdout.txt |
| R101-DCN-FPN | FCA | - | TF | 17.3 | STMask_plus_base_OVIS.pth | stdout.txt |
| R101-DCN-FPN | FCA | FCB(ada) | TF | 15.8 | STMask_plus_base_ada_OVIS.pth | - |

To evaluate a model, put the corresponding weights file in the ./weights directory and run the evaluation script with the matching config; a sketch of an example command follows. The name of each config is everything before the numbers in the file name (e.g., STMask_plus_base for STMask_plus_base.pth). Note that all STMask models are trained from yolact_plus_base_54_80000.pth or yolact_plus_resnet_54_80000.pth of YOLACT++ here.
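
As a hedged sketch, an evaluation run might look like the following, assuming eval.py follows YOLACT++'s command-line conventions; the config and flag names here are assumptions, so check the repo's eval.py for the exact interface:

    # Hypothetical, YOLACT-style invocation; verify the flags against eval.py.
    python eval.py --config=STMask_plus_base_config --trained_model=weights/STMask_plus_base.pth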

Quantitative Results on COCO

We also provide quantitative results of YOLACT++ with our proposed feature calibration for anchors and boxes on COCO (without the temporal fusion module). Here are the results on the COCO validation set.

| Image Size | Backbone | FCA | FCB | B_AP | M_AP | Weights |
|:----------:|:------------:|:---:|:--------:|:----:|:----:|:-------------------------------|
| [550,550] | R50-DCN-FPN | FCA | - | 34.5 | 32.9 | yolact_plus_resnet50_54.pth |
| [550,550] | R50-DCN-FPN | FCA | FCB(ali) | 34.6 | 33.3 | yolact_plus_resnet50_ali_54.pth |
| [550,550] | R50-DCN-FPN | FCA | FC
