DMT
Code release for the CVPR 2023 paper "Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection" by Long Li, Junwei Han, Ni Zhang, Nian Liu*, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, and Fahad Shahbaz Khan.

Abstract
Most previous co-salient object detection works mainly focus on extracting co-salient cues by mining consistency relations across images, while ignoring explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules. We also design a token-guided feature refinement module to enhance the discriminability of the segmentation features under the guidance of the learned tokens. We perform iterative mutual promotion between segmentation feature extraction and token construction. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
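As a rough, purely illustrative sketch (not the authors' implementation), the region-to-region correlation idea can be thought of as computing similarities between pooled region descriptors across images in a group, instead of between all pixel pairs. All names here (`region_correlation`, the toy descriptors) are hypothetical:

```python
import math

def cosine(u, v):
    # Cosine similarity between two descriptor vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def region_correlation(regions_a, regions_b):
    # Pairwise correlation matrix between the region descriptors
    # of two images (R_a x R_b entries instead of pixel x pixel).
    return [[cosine(u, v) for v in regions_b] for u in regions_a]

# Two images, each summarized by two pooled region descriptors.
img1 = [[1.0, 0.0], [0.0, 1.0]]
img2 = [[1.0, 0.0], [1.0, 1.0]]
corr = region_correlation(img1, img2)
```

Operating on a small number of pooled region descriptors per image, rather than dense pixel pairs, is what keeps this kind of inter-image correlation computationally cheap.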
Results
The prediction results on the three benchmark datasets can be downloaded from prediction (jjht).

Environment Configuration
- Linux with Python ≥ 3.6
- PyTorch ≥ 1.7 and a torchvision version that matches the PyTorch installation. Install them together at pytorch.org to ensure this. Note: please check that your PyTorch version matches the one required by Detectron2.
- Detectron2: follow Detectron2 installation instructions.
- OpenCV is optional but needed for the demo and visualization.
pip install -r requirements.txt
Data Preparation
Download the dataset from Baidu Drive (cxx2) and unzip it to './dataset'. The structure of the './dataset' folder will then be as follows:
-- dataset
   |-- train_data
   |   |-- CoCo9k
   |   |-- DUTS_class
   |   |-- DUTS_class_syn
   |   |   |-- img_png_seamless_cloning_add_naive
   |   |   |-- img_png_seamless_cloning_add_naive_reverse_2
   |-- test_data
   |   |-- CoCA
   |   |-- CoSal2015
   |   |-- CoSOD3k
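To catch path mistakes early, a small helper can verify the expected './dataset' layout before training. This snippet is a hypothetical convenience, not part of the release:

```python
import os

# Expected sub-folders relative to the dataset root, matching the tree above.
EXPECTED = [
    "train_data/CoCo9k",
    "train_data/DUTS_class",
    "train_data/DUTS_class_syn/img_png_seamless_cloning_add_naive",
    "train_data/DUTS_class_syn/img_png_seamless_cloning_add_naive_reverse_2",
    "test_data/CoCA",
    "test_data/CoSal2015",
    "test_data/CoSOD3k",
]

def missing_dirs(root):
    # Return the expected sub-folders that do not exist under `root`.
    return [p for p in EXPECTED if not os.path.isdir(os.path.join(root, p))]
```

Running `missing_dirs("./dataset")` should return an empty list once the archives are unzipped in place.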
Training model
- Download the pretrained VGG model from Baidu Drive (sqd5) and put it into the ./checkpoint folder.
- Run python train.py.
- The trained models with satisfactory performance will be saved in ./checkpoint/CVPR2023_Final_Code.
Testing model
- Download our trained model from Baidu Drive (c87t) and put it into the ./checkpoint/CVPR2023_Final_Code folder.
- Run python test.py. The prediction images will be saved in ./prediction.
- Run python ./evaluation/eval_from_imgs.py to evaluate the predicted results on the three datasets; the evaluation scores will be written to ./evaluation/result.
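For reference, mean absolute error (MAE) is one of the standard metrics in co-salient object detection evaluation: the average per-pixel difference between a normalized prediction map and its ground-truth mask. The sketch below is a standalone illustration of the metric itself, not the repository's evaluation code:

```python
def mae(pred, gt):
    # Mean absolute error between two same-sized maps with values in [0, 1].
    flat_p = [v for row in pred for v in row]
    flat_g = [v for row in gt for v in row]
    assert len(flat_p) == len(flat_g), "maps must have the same size"
    return sum(abs(p - g) for p, g in zip(flat_p, flat_g)) / len(flat_p)

# Toy 2x2 prediction vs. binary ground truth.
pred = [[0.9, 0.1], [0.2, 0.8]]
gt   = [[1.0, 0.0], [0.0, 1.0]]
score = mae(pred, gt)  # averages the errors 0.1, 0.1, 0.2, 0.2
```

Lower MAE is better; a perfect prediction scores 0.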
