GeoSeg
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery (ISPRS). This repo also includes other vision transformers and CNNs for satellite, aerial, and UAV image segmentation.
Version 2.0 (stable)
News
- The code of PyramidMamba is released.
- This repo has been updated to PyTorch 2.0 and pytorch-lightning 2.0, with support for multi-GPU training, etc.
- Pretrained weights of backbones can be accessed from Google Drive.
- UNetFormer (accepted by ISPRS, PDF) and UAVid dataset are supported.
- ISPRS Vaihingen and Potsdam datasets are supported. Since private sharing is not allowed, you need to download the datasets from the official website and split them according to the Folder Structure section.
- More networks have been added, and a link to the pretrained weights is provided.
- config/loveda/dcswin.py provides a detailed explanation of the config settings.
- Inference on huge RS images is supported (inference_huge_image.py).
Introduction
GeoSeg is an open-source semantic segmentation toolbox based on PyTorch, pytorch lightning and timm, which mainly focuses on developing advanced Vision Transformers for remote sensing image segmentation.
Major Features
- Unified Benchmark
  We provide a unified training script for various segmentation methods.
- Simple and Effective
  Thanks to pytorch lightning and timm, the code is easy to extend.
- Supported Remote Sensing Datasets
  - ISPRS Vaihingen and Potsdam
  - UAVid
  - LoveDA
  - OpenEarthMap
  - More datasets will be supported in the future.
- Multi-scale Training and Testing
- Inference on Huge Remote Sensing Images
Supported Networks
- Mamba
- Vision Transformer
- CNN
Folder Structure
Prepare the following folders to organize this repo:
airs
├── GeoSeg (code)
├── pretrain_weights (pretrained weights of backbones, such as vit, swin, etc)
├── model_weights (save the model weights trained on ISPRS vaihingen, LoveDA, etc)
├── fig_results (save the masks predicted by models)
├── lightning_logs (CSV format training logs)
├── data
│ ├── LoveDA
│ │ ├── Train
│ │ │ ├── Urban
│ │ │ │ ├── images_png (original images)
│ │ │ │ ├── masks_png (original masks)
│ │ │ │ ├── masks_png_convert (converted masks used for training)
│ │ │ │ ├── masks_png_convert_rgb (original rgb format masks)
│ │ │ ├── Rural
│ │ │ │ ├── images_png
│ │ │ │ ├── masks_png
│ │ │ │ ├── masks_png_convert
│ │ │ │ ├── masks_png_convert_rgb
│ │ ├── Val (same structure as Train)
│ │ ├── Test
│ │ ├── train_val (Merge Train and Val)
│ ├── uavid
│ │ ├── uavid_train (original)
│ │ ├── uavid_val (original)
│ │ ├── uavid_test (original)
│ │ ├── uavid_train_val (Merge uavid_train and uavid_val)
│ │ ├── train (processed)
│ │ ├── val (processed)
│ │ ├── train_val (processed)
│ ├── vaihingen
│ │ ├── train_images (original)
│ │ ├── train_masks (original)
│ │ ├── test_images (original)
│ │ ├── test_masks (original)
│ │ ├── test_masks_eroded (original)
│ │ ├── train (processed)
│ │ ├── test (processed)
│ ├── potsdam (same structure as vaihingen)
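The top-level folders in the tree above can be created by hand, or with a small helper like the one below. This is only a convenience sketch, not part of GeoSeg: the function name make_skeleton is hypothetical, the GeoSeg code folder is left out on the assumption that it is obtained separately (for example by cloning the repo), and the per-dataset subfolders are left to the downloads and preprocessing scripts.

```python
from pathlib import Path

# Folder names taken from the tree above. The GeoSeg code folder is omitted
# on purpose (assumed to be obtained separately), and dataset subfolders are
# left to the downloads and the preprocessing scripts.
FOLDERS = [
    "pretrain_weights",
    "model_weights",
    "fig_results",
    "lightning_logs",
    "data/LoveDA",
    "data/uavid",
    "data/vaihingen",
    "data/potsdam",
]


def make_skeleton(root):
    """Create the empty folder skeleton under `root` and return the paths."""
    root = Path(root)
    created = []
    for rel in FOLDERS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created
```

Calling make_skeleton("airs") from the parent directory would produce the empty layout; existing folders are left untouched.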
Install
Open the airs folder in a Linux terminal and create a Python environment:
conda create -n airs python=3.8
conda activate airs
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r GeoSeg/requirements.txt
Install Mamba
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
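After the install steps, it can help to confirm which packages actually ended up in the environment. The snippet below is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The package list is an assumption: the names are taken from the pip commands above, while pytorch-lightning and timm are assumed to be pulled in by GeoSeg/requirements.txt.

```python
from importlib import metadata

# Distribution names as used in the pip commands above; pytorch-lightning and
# timm are assumed to be installed via GeoSeg/requirements.txt.
PACKAGES = [
    "torch",
    "torchvision",
    "pytorch-lightning",
    "timm",
    "causal-conv1d",
    "mamba-ssm",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

A missing entry for causal-conv1d or mamba-ssm only matters if you plan to use the Mamba-based networks.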
Pretrained Weights of Backbones
Baidu Disk : 1234
Data Preprocessing
Download the datasets from the official website and split them yourself.
Vaihingen
Generate the training set.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/train_images" \
--mask-dir "data/vaihingen/train_masks" \
--output-img-dir "data/vaihingen/train/images_1024" \
--output-mask-dir "data/vaihingen/train/masks_1024" \
--mode "train" --split-size 1024 --stride 512
Generate the testing set.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/test_images" \
--mask-dir "data/vaihingen/test_masks_eroded" \
--output-img-dir "data/vaihingen/test/images_1024" \
--output-mask-dir "data/vaihingen/test/masks_1024" \
--mode "val" --split-size 1024 --stride 1024 \
--eroded
Generate the masks_1024_rgb (RGB format ground truth labels) for visualization.
python GeoSeg/tools/vaihingen_patch_split.py \
--img-dir "data/vaihingen/test_images" \
--mask-dir "data/vaihingen/test_masks" \
--output-img-dir "data/vaihingen/test/images_1024" \
--output-mask-dir "data/vaihingen/test/masks_1024_rgb" \
--mode "val" --split-size 1024 --stride 1024 \
--gt
To build a validation set, select some images from the training set.
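The --split-size and --stride options above control how many patches each image yields: a stride smaller than the split size (training) gives overlapping patches, while an equal stride (testing) gives non-overlapping ones. The sketch below illustrates the arithmetic only. It assumes each image is padded so that the last window fits entirely; the actual border handling in vaihingen_patch_split.py may differ, and the function names and the 2048 x 2048 image size are hypothetical.

```python
import math


def patch_origins(length, split_size, stride):
    """Start offsets of sliding windows along one axis.

    Assumes the axis is padded (if needed) so the last window fits entirely.
    """
    if length <= split_size:
        return [0]
    steps = math.ceil((length - split_size) / stride) + 1
    return [i * stride for i in range(steps)]


def count_patches(height, width, split_size, stride):
    """Number of patches produced from one image of the given size."""
    rows = patch_origins(height, split_size, stride)
    cols = patch_origins(width, split_size, stride)
    return len(rows) * len(cols)


# Hypothetical 2048 x 2048 image, chosen only for illustration.
print(count_patches(2048, 2048, 1024, 512))   # training settings -> 9
print(count_patches(2048, 2048, 1024, 1024))  # testing settings  -> 4
```

With 50% overlap, the same image yields more than twice as many training patches, which is why the training command uses a smaller stride than the testing command.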
Potsdam
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/train_images" \
--mask-dir "data/potsdam/train_masks" \
--output-img-dir "data/potsdam/train/images_1024" \
--output-mask-dir "data/potsdam/train/masks_1024" \
--mode "train" --split-size 1024 --stride 1024 --rgb-image
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/test_images" \
--mask-dir "data/potsdam/test_masks_eroded" \
--output-img-dir "data/potsdam/test/images_1024" \
--output-mask-dir "data/potsdam/test/masks_1024" \
--mode "val" --split-size 1024 --stride 1024 \
--eroded --rgb-image
python GeoSeg/tools/potsdam_patch_split.py \
--img-dir "data/potsdam/test_images" \
--mask-dir "data/potsdam/test_masks" \
--output-img-dir "data/potsdam/test/images_1024" \
--output-mask-dir "data/potsdam/test/masks_1024_rgb" \
--mode "val" --split-size 1024 --stride 1024 \
--gt --rgb-image
UAVid
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_train_val" \
--output-img-dir "data/uavid/train_val/images" \
--output-mask-dir "data/uavid/train_val/masks" \
--mode 'train' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_train" \
--output-img-dir "data/uavid/train/images" \
--output-mask-dir "data/uavid/train/masks" \
--mode 'train' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
python GeoSeg/tools/uavid_patch_split.py \
--input-dir "data/uavid/uavid_val" \
--output-img-dir "data/uavid/val/images" \
--output-mask-dir "data/uavid/val/masks" \
--mode 'val' --split-size-h 1024 --split-size-w 1024 \
--stride-h 1024 --stride-w 1024
LoveDA
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Train/Rural/masks_png --output-mask-dir data/LoveDA/Train/Rural/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Train/Urban/masks_png --output-mask-dir data/LoveDA/Train/Urban/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Val/Rural/masks_png --output-mask-dir data/LoveDA/Val/Rural/masks_png_convert
python GeoSeg/tools/loveda_mask_convert.py --mask-dir data/LoveDA/Val/Urban/masks_png --output-mask-dir data/LoveDA/Val/Urban/masks_png_convert
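The four commands above differ only in the split (Train, Val) and scene (Rural, Urban), so they can be generated in a loop. The following is a minimal sketch, assuming it is run from the airs folder; the helper names loveda_convert_commands and run_all are hypothetical, and only the script path and the two flags shown above are used.

```python
import itertools
import subprocess

SCRIPT = "GeoSeg/tools/loveda_mask_convert.py"


def loveda_convert_commands(root="data/LoveDA",
                            splits=("Train", "Val"),
                            scenes=("Rural", "Urban")):
    """Build one argument list per split/scene pair, mirroring the commands above."""
    commands = []
    for split, scene in itertools.product(splits, scenes):
        base = f"{root}/{split}/{scene}"
        commands.append([
            "python", SCRIPT,
            "--mask-dir", f"{base}/masks_png",
            "--output-mask-dir", f"{base}/masks_png_convert",
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


run_all(loveda_convert_commands())  # dry run: only prints the four commands
```

Passing dry_run=False executes the conversions one after another and stops at the first failing command.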
Training
"-c" specifies the path to the config file; use different configs to train different models.
python GeoSeg/train_supervision.py -c GeoSeg/config/uav
