RMSIN
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
This repository is the official implementation of "Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation."

Setting Up
Preliminaries
The code has been verified to work with PyTorch v1.7.1 and Python 3.7.
- Clone this repository.
- Change directory to the root of this repository.
Package Dependencies
- Create a new Conda environment with Python 3.7 then activate it:
conda create -n RMSIN python==3.7
conda activate RMSIN
- Install PyTorch v1.7.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example):
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
- Install the packages in requirements.txt via pip:
pip install -r requirements.txt
The Initialization Weights for Training
- Create the ./pretrained_weights directory where we will be storing the weights.
mkdir ./pretrained_weights
- Download pre-trained classification weights of the Swin Transformer, and put the pth file in ./pretrained_weights. These weights are needed to initialize the model for training.
Datasets
We perform all experiments on our proposed dataset RRSIS-D. RRSIS-D is a new Referring Remote Sensing Image Segmentation benchmark that contains 17,402 image-caption-mask triplets. It can be downloaded from Google Drive or Baidu Netdisk (access code: sjoe).
Usage
- Download our dataset.
- Copy all the downloaded files to ./refer/data/. The dataset folder should look like this:
$DATA_PATH
├── rrsisd
│   ├── refs(unc).p
│   └── instances.json
└── images
    └── rrsisd
        ├── JPEGImages
        └── ann_split
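Before launching training, it can save a failed run to confirm the layout above is in place. The following is a small hypothetical helper (not part of this repository) that reports any expected path missing under the data root:

```python
import os

# Expected RRSIS-D layout under ./refer/data, as shown in the tree above.
EXPECTED = [
    "rrsisd/refs(unc).p",
    "rrsisd/instances.json",
    "images/rrsisd/JPEGImages",
    "images/rrsisd/ann_split",
]

def check_layout(data_root):
    """Return the list of expected paths that are missing under data_root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(data_root, p))]

missing = check_layout("./refer/data")
if missing:
    print("Missing dataset paths:", missing)
```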
Training
We use DistributedDataParallel from PyTorch for training. To run on 4 GPUs (with IDs 0, 1, 2, and 3) on a single node:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 train.py --dataset rrsisd --model_id RMSIN --epochs 40 --img_size 480 2>&1 | tee ./output
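When adapting this command to a different machine, CUDA_VISIBLE_DEVICES and --nproc_per_node must stay in sync: one worker process is launched per listed GPU. The sketch below is a hypothetical helper (not part of the repository) that assembles the launch line from a list of GPU IDs to make that coupling explicit:

```python
def launch_command(gpu_ids, port=12345):
    """Build the DDP launch line; --nproc_per_node must equal len(gpu_ids)."""
    ids = ",".join(str(g) for g in gpu_ids)
    return (
        f"CUDA_VISIBLE_DEVICES={ids} "
        f"python -m torch.distributed.launch --nproc_per_node {len(gpu_ids)} "
        f"--master_port {port} train.py --dataset rrsisd --model_id RMSIN "
        f"--epochs 40 --img_size 480"
    )

# For example, a 2-GPU run on GPUs 0 and 1:
print(launch_command([0, 1]))
```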
Testing
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split val --workers 4 --window12 --img_size 480
Acknowledgements
Code in this repository is built on LAVT. We would like to thank the authors for open-sourcing their project.
