RMSIN
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
This repository is the official implementation of "Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation."

Setting Up
Preliminaries
The code has been verified to work with PyTorch v1.7.1 and Python 3.7.
- Clone this repository.
- Change directory to the root of this repository.
Package Dependencies
- Create a new Conda environment with Python 3.7 then activate it:
conda create -n RMSIN python==3.7
conda activate RMSIN
- Install PyTorch v1.7.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example):
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
- Install the packages in requirements.txt via pip:
pip install -r requirements.txt
The Initialization Weights for Training
- Create the ./pretrained_weights directory where we will be storing the weights.
mkdir ./pretrained_weights
- Download pre-trained classification weights of the Swin Transformer, and put the pth file in ./pretrained_weights. These weights are needed to initialize the model for training.
Datasets
We perform all experiments on our proposed dataset RRSIS-D. RRSIS-D is a new Referring Remote Sensing Image Segmentation benchmark that contains 17,402 image-caption-mask triplets. It can be downloaded from Google Drive or Baidu Netdisk (access code: sjoe).
Usage
- Download our dataset.
- Copy all the downloaded files to ./refer/data/. The dataset folder should look like this:
$DATA_PATH
├── rrsisd
│   ├── refs(unc).p
│   └── instances.json
└── images
    └── rrsisd
        ├── JPEGImages
        └── ann_split
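Before launching training, it can save a failed run to confirm the layout above is in place. The following is a small hypothetical helper (not part of this repository) that reports any expected path missing under the data root:

```python
import os

# Expected RRSIS-D layout under ./refer/data, as shown in the tree above.
EXPECTED = [
    "rrsisd/refs(unc).p",
    "rrsisd/instances.json",
    "images/rrsisd/JPEGImages",
    "images/rrsisd/ann_split",
]

def check_layout(data_root):
    """Return the list of expected paths that are missing under data_root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(data_root, p))]

missing = check_layout("./refer/data")
if missing:
    print("Missing dataset paths:", missing)
```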
Training
We use DistributedDataParallel from PyTorch for training. To run on 4 GPUs (with IDs 0, 1, 2, and 3) on a single node:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 train.py --dataset rrsisd --model_id RMSIN --epochs 40 --img_size 480 2>&1 | tee ./output
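When adapting this command to a different machine, CUDA_VISIBLE_DEVICES and --nproc_per_node must stay in sync: one worker process is launched per listed GPU. The sketch below is a hypothetical helper (not part of the repository) that assembles the launch line from a list of GPU IDs to make that coupling explicit:

```python
def launch_command(gpu_ids, port=12345):
    """Build the DDP launch line; --nproc_per_node must equal len(gpu_ids)."""
    ids = ",".join(str(g) for g in gpu_ids)
    return (
        f"CUDA_VISIBLE_DEVICES={ids} "
        f"python -m torch.distributed.launch --nproc_per_node {len(gpu_ids)} "
        f"--master_port {port} train.py --dataset rrsisd --model_id RMSIN "
        f"--epochs 40 --img_size 480"
    )

# For example, a 2-GPU run on GPUs 0 and 1:
print(launch_command([0, 1]))
```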
Testing
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split val --workers 4 --window12 --img_size 480
Acknowledgements
Code in this repository is built on LAVT. We would like to thank the authors for open-sourcing their project.
