A Robust Feature Downsampling Module for Remote Sensing Visual Tasks [TGRS 2023]

This is the official PyTorch implementation of the paper:

A Robust Feature Downsampling Module for Remote Sensing Visual Tasks
Wei Lu; Si-Bao Chen; Jin Tang; Chris H. Q. Ding; Bin Luo
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023

We propose a new and universal downsampling module named robust feature downsampling (RFD).

<details> <summary> Abstract </summary> Remote-sensing (RS) images present unique challenges for computer vision (CV) due to lower resolution, smaller objects, and fewer features. Mainstream backbone networks show promising results for traditional visual tasks. However, they use convolution to reduce feature map dimensionality, which can result in information loss for small objects in RS images and decreased performance. To address this problem, we propose a new and universal downsampling module named robust feature downsampling (RFD). RFD fuses multiple feature maps extracted by different downsampling techniques, creating a more robust feature map with a complementary set of features. Leveraging this, we overcome the limitations of conventional convolutional downsampling, resulting in a more accurate and robust analysis of RS images. We develop two versions of the RFD module, shallow RFD (SRFD) and deep RFD (DRFD), tailored to adapt to different stages of feature capture and improve feature robustness. We replace the downsampling layers (DSL) of existing mainstream backbones with the RFD module and conduct comparative experiments on several public RS image datasets. The results show significant improvements compared to baseline approaches in RS image classification, object detection, and semantic segmentation. Specifically, our RFD module achieved an average performance gain of 1.5% on the NWPU-RESISC45 classification dataset without utilizing any additional pretraining data, resulting in state-of-the-art performance on this dataset. Moreover, in detection and segmentation tasks on dataset for object detection in aerial images (DOTA) and instance segmentation in aerial images dataset (iSAID), our RFD module outperforms the baseline approaches by 2%–7% when utilizing pretraining data from NWPU-RESISC45. These results highlight the value of the RFD module in enhancing the performance of RS visual tasks. </details>

Application

To replace a downsampling layer in your network with RFD, make the following change:

Replace:
self.conv_down = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
with
self.SRFD = RFD.SRFD(in_channels, out_channels) # shallow RFD: from the original input size to 4x downsampled
or
self.DRFD = RFD.DRFD(in_channels, out_channels) # deep RFD: downsampling for deep features
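
When swapping layers it can help to check the spatial sizes the rest of the network will see, since SRFD reduces the input by 4x while the convolution shown above reduces it by 2x. The following is a minimal plain-Python sketch, not part of this repository. It assumes SRFD reduces the spatial size by exactly 4x (as in the comment above) and assumes DRFD reduces it by 2x; the helper names `conv_out_size`, `downsampled_size`, and `stage_sizes` are hypothetical.

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1):
    """Spatial size after a convolution with dilation 1 (standard formula)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def downsampled_size(size, factor):
    """Spatial size after an exact `factor`-times reduction.

    Raises ValueError when `size` is not divisible by `factor`,
    because this sketch does not model any padding behaviour.
    """
    if size % factor != 0:
        raise ValueError(f"{size} is not divisible by {factor}")
    return size // factor


# Assumptions for illustration: SRFD reduces by 4x, DRFD by 2x.
SRFD_FACTOR = 4
DRFD_FACTOR = 2


def stage_sizes(input_size, num_drfd_stages=3):
    """Feature-map sizes after one SRFD stem followed by DRFD stages."""
    sizes = [downsampled_size(input_size, SRFD_FACTOR)]
    for _ in range(num_drfd_stages):
        sizes.append(downsampled_size(sizes[-1], DRFD_FACTOR))
    return sizes


if __name__ == "__main__":
    # Size after the 3x3, stride-2, padding-1 convolution shown above.
    print(conv_out_size(224))
    # Sizes after an SRFD stem and three DRFD stages, under the assumptions above.
    print(stage_sizes(224))
```

If the sizes do not match what the following layers expect, the mismatch is likely in which layer was replaced with SRFD versus DRFD.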

Image Classification

1. Dependency Setup

Create a new conda virtual environment:

conda create -n RFD python=3.7 -y
conda activate RFD
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch

Clone this repo and install the required packages:

git clone https://github.com/lwCVer/RFD
cd RFD/
pip install -r requirements.txt
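
After installation, a quick check can confirm that the versions pinned in the conda command are the ones Python actually imports. The script below is a small sketch and not part of this repository; `check_versions` is a hypothetical helper, and it only compares version prefixes because some builds append a local suffix such as `+cu113`.

```python
import importlib


def check_versions(expected):
    """Return {package: status}; status is 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = str(getattr(module, "__version__", "unknown"))
        # Compare the prefix only, so "1.10.0+cu113" still matches "1.10.0".
        if found.startswith(wanted):
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    # Versions taken from the conda install command above.
    pinned = {"torch": "1.10.0", "torchvision": "0.11.0", "torchaudio": "0.10.0"}
    for package, status in check_versions(pinned).items():
        print(f"{package}: {status}")
```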

2. Dataset Preparation

You can download our pre-split NWPU-RESISC45 dataset, or download the NWPU-RESISC45 classification dataset from its official source and structure the data as follows:

/path/to/NWPU-RESISC45/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
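
If you start from the official download, the layout above can be produced with a short script. The sketch below is not part of this repository. It assumes the source folder holds one subfolder per class, and the `val_fraction` and `seed` defaults are illustrative only, not the split used in the paper; `split_dataset` is a hypothetical helper name.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_root, dst_root, val_fraction=0.2, seed=0):
    """Copy src_root/<class>/* into dst_root/{train,val}/<class>/.

    val_fraction and seed are illustrative defaults, not the paper's split.
    Returns a dict mapping each split name to the number of copied files.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    rng = random.Random(seed)
    counts = {"train": 0, "val": 0}
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(images)
        num_val = int(len(images) * val_fraction)
        for index, image in enumerate(images):
            # The first num_val shuffled images go to val, the rest to train.
            split = "val" if index < num_val else "train"
            target_dir = dst_root / split / class_dir.name
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image, target_dir / image.name)
            counts[split] += 1
    return counts
```

For example, `split_dataset("/path/to/raw", "/path/to/NWPU-RESISC45")` copies the files and leaves the source folder untouched.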

3. Training

To train Swin V2 Tiny on NWPU-RESISC45 (change the dataset path in train.py first):

python train.py

To train other models, modify train.py accordingly.

4. Pre-trained Models on NWPU-RESISC45 (initial / +RFD)

| name (initial / +RFD) | type | #params (M) | FLOPs (G) | Throughput (fps) | Top-1 acc (%) | model |
|:----------------|-----:|:-----------:|:-------------:|:-------------:|:----------------:|:-----------------:|
| GFNet-H Tiny | FFT | 14.60 / 15.68 | 2.05 / 2.43 | 2693 / 2128 | 92.27 / 94.76 | initial / +RFD |
| GFNet-H Small | FFT | 31.43 / 33.05 | 4.59 / 5.34 | 2405 / 2466 | 93.40 / 95.11 | initial / +RFD |
| GFNet-H Base | FFT | 53.01 / 55.43 | 8.53 / 9.28 | 2098 / 1886 | 94.17 / 95.46 | initial / +RFD |
| AS-MLP Tiny | MLP | 27.55 / 29.96 | 4.39 / 5.14 | 1505 / 1571 | 95.37 / 96.05 | initial / +RFD |
| AS-MLP Small | MLP | 48.86 / 51.27 | 8.57 / 9.32 | 1073 / 1019 | 95.27 / 96.00 | initial / +RFD |
| AS-MLP Base | MLP | 86.77 / 91.05 | 15.2 / 16.44 | 861 / 830 | 95.63 / 95.94 | initial / +RFD |
| Swin Tiny | Transformer | 27.55 / 29.97 | 4.36 / 5.11 | 2469 / 2313 | 93.52 / 96.10 | initial / +RFD |
| Swin Small | Transformer | 48.87 / 51.28 | 8.52 / 9.27 | 1995 / 1762 | 93.37 / 96.16 | initial / +RFD |
| Swin Base | Transformer | 86.79 / 91.06 | 15.14 / 16.37 | 1975 / 1734 | 93.19 / 96.24 | initial / +RFD |
| CSWin Tiny | Transformer | 21.83 / 22.05 | 4.08 / 4.35 | 1303 / 1127 | 93.05 / 96.11 | initial / +RFD |
| CSWin Small | Transformer | 34.15 / 34.37 | 6.40 / 6.67 | 682 / 579 | 93.56 / 96.11 | initial / +RFD |
| CSWin Base | Transformer | 76.65 / 77.12 | 14.36 / 14.87 | 481 / 406 | 94.49 / 96.29 | initial / +RFD |
| Swin V2 Tiny | Transformer | 27.61 / 30.03 | 3.33 / 4.08 | 2009 / 1987 | 94.65 / 96.46 | initial / +RFD |
| Swin V2 Small | Transformer | 48.99 / 51.41 | 6.47 / 7.22 | 1273 / 827 | 95.22 / 96.84 | initial / +RFD |
| Swin V2 Base | Transformer | 86.94 / 91.22 | 11.48 / 12.71 | 1104 / 682 | 95.63 / 96.61 | initial / +RFD |
| Mixformer Tiny | Hybrid | 5.10 / 5.25 | 0.39 / 0.44 | 1287 / 1192 | 94.87 / 95.30 | initial / +RFD |
| Mixformer Small | Hybrid | 9.89 / 10.17 | 0.88 / 0.95 | 1018 / 975 | 95.41 / 96.03 | initial / +RFD |
| Mixformer Base | Hybrid | 34.80 / 35.85 | 3.44 / 3.56 | 830 / 722 | 95.76 / 96.37 | initial / [+RFD](https://github.com/lwCVer/RFD/releases/downlo
