SegMAN

[CVPR 2025] SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Official PyTorch implementation of SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Main Results

<img src="assets/SegMAN_performance.png" width="90%" /> <img src="assets/SegMAN_semantic_segmentation_performance.png" width="90%" />

Installation and data preparation

Step 1: Create a new environment

conda create -n segman python=3.10
conda activate segman

pip install torch==2.1.2 torchvision==0.16.2

Step 2: Install MMSegmentation v0.30.0 by following the installation guidelines and prepare the segmentation datasets by following data preparation. The following installation commands work:

pip install -U openmim
mim install mmcv-full
cd segmentation
pip install -v -e .

To support torch>=2.1.0, you also need to add from packaging import version at the top of /miniconda3/envs/segman/lib/python3.10/site-packages/mmcv/parallel/_functions.py and replace line 75 of that file with the following:

if version.parse(torch.__version__) >= version.parse('2.1.0'):
    streams = [_get_stream(torch.device("cuda", device)) for device in target_gpus]
else:
    streams = [_get_stream(device) for device in target_gpus]
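The exact site-packages path above is specific to one environment. A small stdlib-only sketch to locate the file you actually need to patch (the module name is taken from the step above; module_path is a hypothetical helper, not part of the repo):

```python
import importlib.util


def module_path(name: str) -> str:
    """Return the source path of an installed module, or a notice if absent."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return f"{name} not installed"
    return spec.origin if spec and spec.origin else f"{name} not installed"


# Print the location of the mmcv file that needs the patch above:
print(module_path("mmcv.parallel._functions"))
```

Run this inside the segman environment so it resolves the right interpreter's site-packages.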

Step 3: Install dependencies using the following commands.

To install NATTEN, modify the wheel tag below to match your PyTorch and CUDA versions.

pip install natten==0.17.3+torch210cu121 -f https://shi-labs.com/natten/wheels/
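The wheel tag encodes the PyTorch and CUDA versions. A helper to build it from your versions (this tag scheme is my assumption, inferred from the torch210cu121 example above; verify the resulting tag against the NATTEN wheel index before installing):

```python
def natten_wheel_tag(torch_version: str, cuda_version: str) -> str:
    """Build a NATTEN wheel tag such as 'torch210cu121'.

    Assumes the tag uses torch major.minor plus a trailing zero, and the
    CUDA version with the dot removed, as in the command above.
    """
    major, minor = torch_version.split(".")[:2]
    return f"torch{major}{minor}0cu{cuda_version.replace('.', '')}"


print(natten_wheel_tag("2.1.2", "12.1"))  # torch210cu121
```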

The Selective Scan 2D kernel can be installed with:

cd kernels/selective_scan && pip install .

Install other requirements:

pip install -r requirements.txt

Training

Download the ImageNet-1k pretrained weights here and put them in a folder pretrained/. Navigate to the segmentation directory:

cd segmentation

Scripts to reproduce our paper results are provided in ./scripts. An example training script for SegMAN-B on ADE20K:

# Single-gpu
python tools/train.py local_configs/segman/base/segman_b_ade.py --work-dir outputs/EXP_NAME

# Multi-gpu
bash tools/dist_train.sh local_configs/segman/base/segman_b_ade.py <GPU_NUM> --work-dir outputs/EXP_NAME
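The two invocations differ only in the launcher. A dry-run sketch of dispatching on GPU count (it only echoes the command it would run; GPU_NUM and EXP_NAME are placeholders, and this wrapper is not part of the repo):

```shell
# Dry run: print the training command for the chosen GPU count.
GPU_NUM=${GPU_NUM:-1}
CONFIG=local_configs/segman/base/segman_b_ade.py
if [ "$GPU_NUM" -gt 1 ]; then
  echo bash tools/dist_train.sh "$CONFIG" "$GPU_NUM" --work-dir outputs/EXP_NAME
else
  echo python tools/train.py "$CONFIG" --work-dir outputs/EXP_NAME
fi
```

Drop the echo (or pipe through sh) once the printed command looks right.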

Evaluation

Download trained weights for the segmentation models from Google Drive. Navigate to the segmentation directory:

cd segmentation

Example for evaluating SegMAN-B on ADE20K:

# Single-gpu
python tools/test.py local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file

# Multi-gpu
bash tools/dist_test.sh local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file <GPU_NUM>

ADE20K

| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|----------|--------------------------|------|--------|-------|--------|--------------|
| SegMAN-T | SegMAN Encoder-T (76.2) | 43.0 | 6.4M | 6.2G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 51.3 | 29.4M | 25.3G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 52.6 | 51.8M | 58.1G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 53.2 | 92.6M | 97.1G | config | Google Drive |

Cityscapes

| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|----------|--------------------------|------|--------|--------|--------|--------------|
| SegMAN-T | SegMAN Encoder-T (76.2) | 80.3 | 6.4M | 52.5G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 83.2 | 29.4M | 218.4G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 83.8 | 51.8M | 479.0G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 84.2 | 92.6M | 769.0G | config | Google Drive |

COCO-Stuff

| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|----------|--------------------------|------|--------|-------|--------|--------------|
| SegMAN-T | SegMAN Encoder-T (76.2) | 41.3 | 6.4M | 6.2G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 47.5 | 29.4M | 25.3G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 48.4 | 51.8M | 58.1G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 48.8 | 92.6M | 97.1G | config | Google Drive |

Encoder Pre-training

We provide scripts for pre-training the encoder from scratch.

Step 1: Download ImageNet-1k and use this script to extract it.

Step 2: Start training with

bash scripts/train_segman-s.sh

Visualization

You can visualize segmentation results using pre-trained checkpoints with the following command (run under the segmentation directory):

python image_demo.py \
img_path \
config_file \
checkpoint_file \
--palette 'ade20k' \
--out-file segman_demo.png \
--device 'cuda:0'

Replace img_path, config_file, and checkpoint_file with the image and model you want to visualize. Select the palette from {ade20k, coco_stuff164k, cityscapes} that corresponds to the dataset you want to visualize.
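To run the demo over a whole folder of images, a batch sketch (an echo-only dry run over dummy files; config_file and checkpoint_file remain placeholders you must replace, and this loop is not part of the repo):

```shell
# Dry run over a temporary folder of dummy images; remove the echo and
# substitute a real config and checkpoint to actually execute.
IMG_DIR=$(mktemp -d)
touch "$IMG_DIR/a.png" "$IMG_DIR/b.png"
for img in "$IMG_DIR"/*.png; do
  echo python image_demo.py "$img" config_file checkpoint_file \
    --palette ade20k --out-file "segman_$(basename "$img")" --device cuda:0
done
```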

Acknowledgements

Our implementation is based on MMSegmentation, NATTEN, VMamba, and SegFormer. We sincerely thank the authors.

Citation

@inproceedings{SegMAN,
    title={SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation},
    author={Yunxiang Fu and Meng Lou and Yizhou Yu},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}