Sigma
[WACV 2025] Python implementation of Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
Zifu Wan, Yuhao Wang, Silong Yong, Pingping Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie
Robotics Institute, Carnegie Mellon University
👀Introduction
This repository contains the code for our paper Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation. [Paper]

Sigma is a lightweight and efficient method that balances accuracy and speed. (Results below are calculated on the MFNet dataset.)

💡Environment
We tested our codebase with PyTorch 1.13.1 + CUDA 11.7 and with PyTorch 2.2.1 + CUDA 12.1. Please install the PyTorch and CUDA versions that match your computational resources. The steps below show how to create the environment with PyTorch 1.13.1.
- Create environment.

      conda create -n sigma python=3.9
      conda activate sigma

- Install all dependencies. Install pytorch, cuda and cudnn, then install other dependencies via:

      pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
      pip install -r requirements.txt

- Install Mamba

      cd models/encoders/selective_scan && pip install . && cd ../../..
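After the installation steps above, it can help to confirm which PyTorch and CUDA versions the new environment actually sees. The following is a minimal sketch, not part of the repository; the function name `environment_report` is hypothetical, and the check simply reports `None` for PyTorch if it is not installed.

```python
import importlib.util
import platform


def environment_report():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Import torch only if it is installed, so the check also runs in a bare environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # CUDA version torch was built against
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

With the PyTorch 1.13.1 setup described above, one would expect the `torch` entry to start with `1.13.1` and `cuda_build` to read `11.7`, but the exact strings depend on the installed wheels.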
⏳Setup
Datasets
- We use four datasets, including both RGB-Thermal and RGB-Depth datasets:
  - RGB-Thermal: MFNet and PST900
  - RGB-Depth: NYU Depth V2 and SUN RGB-D

  Please refer to the original dataset websites for more details. You can directly download the processed RGB-Depth datasets from DFormer, though you may need to make small modifications to the txt files.

- <u>We also provide the processed datasets (including RGB-Thermal and RGB-Depth) we use here: Google Drive Link.</u>
- If you are using your own datasets, please organize the dataset folder in the following structure:

      <datasets>
      |-- <DatasetName1>
          |-- <RGBFolder>
              |-- <name1>.<ImageFormat>
              |-- <name2>.<ImageFormat>
              ...
          |-- <ModalXFolder>
              |-- <name1>.<ModalXFormat>
              |-- <name2>.<ModalXFormat>
              ...
          |-- <LabelFolder>
              |-- <name1>.<LabelFormat>
              |-- <name2>.<LabelFormat>
              ...
          |-- train.txt
          |-- test.txt
      |-- <DatasetName2>
      |-- ...

  `train.txt` / `test.txt` contain the names of the items in the training/testing set, e.g.:

      <name1>
      <name2>
      ...
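For a custom dataset, a quick consistency check of this layout can catch missing files before training starts. The sketch below is not part of the repository: the function name `find_missing_items` and the folder names in the usage example are hypothetical, and it assumes the split files list item names without file extensions, as in the example above.

```python
from pathlib import Path


def find_missing_items(dataset_dir, split_file, subfolders):
    """Return (folder, name) pairs listed in the split file that have no matching file."""
    dataset_dir = Path(dataset_dir)
    lines = (dataset_dir / split_file).read_text().splitlines()
    names = [line.strip() for line in lines if line.strip()]

    missing = []
    for folder in subfolders:
        folder_path = dataset_dir / folder
        # Compare by file stem so that any image or label extension is accepted.
        if folder_path.is_dir():
            stems = {p.stem for p in folder_path.iterdir() if p.is_file()}
        else:
            stems = set()  # a missing folder means every item is missing
        missing.extend((folder, name) for name in names if name not in stems)
    return missing


if __name__ == "__main__":
    # "datasets/MyDataset", "RGB", "Thermal", and "Label" are placeholder names.
    problems = find_missing_items("datasets/MyDataset", "train.txt", ["RGB", "Thermal", "Label"])
    for folder, name in problems:
        print(f"missing {name} in {folder}")
```

An empty result means every name in the split file has a file in each of the three folders.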
📦Usage
Training
- Please download the pretrained VMamba weights:

  <u>Please put them under `pretrained/vmamba/`.</u>

- Config setting.

  Edit config file in the `configs` folder.
  Change `C.backbone` to `sigma_tiny` / `sigma_small` / `sigma_base` to use the three versions of Sigma.

- Run multi-GPU distributed training:

      NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4 --master_port 29502 train.py -p 29502 -d 0,1,2,3 -n "dataset_name"

  Here, `dataset_name` = `mfnet` / `pst` / `nyu` / `sun`, referring to the four datasets.

- You can also use single-GPU training:

      CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun -m --nproc_per_node=1 train.py -p 29501 -d 0 -n "dataset_name"

- Results will be saved in `log_final` folder.
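When launching several runs, it may be convenient to assemble the multi-GPU command programmatically rather than editing it by hand. The sketch below is not part of the repository; the helper name `build_train_command` is hypothetical. It only mirrors the flags of the multi-GPU command shown above and validates the dataset name against the four listed values. It does not start training itself.

```python
import shlex

DATASETS = ("mfnet", "pst", "nyu", "sun")  # dataset names accepted by -n, as listed above


def build_train_command(dataset_name, gpu_ids, port=29502):
    """Return (env, argv) mirroring the multi-GPU launch command shown above."""
    if dataset_name not in DATASETS:
        raise ValueError(f"unknown dataset {dataset_name!r}; expected one of {DATASETS}")
    if not gpu_ids:
        raise ValueError("gpu_ids must contain at least one device ID")

    devices = ",".join(str(g) for g in gpu_ids)
    env = {"NCCL_P2P_DISABLE": "1", "CUDA_VISIBLE_DEVICES": devices}
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per GPU
        "--master_port", str(port),
        "train.py", "-p", str(port), "-d", devices, "-n", dataset_name,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_train_command("mfnet", [0, 1, 2, 3])
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    # Print the shell command; run it from the repository root to start training.
    print(prefix, shlex.join(argv))
```

Returning the environment variables separately from the argument list keeps the result usable both for printing and for passing to a process launcher.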
Evaluation
- Run the evaluation by:

      CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python eval.py -d="0" -n "dataset_name" -e="epoch_number" -p="visualize_savedir"

  Here, `dataset_name` = `mfnet` / `pst` / `nyu` / `sun`, referring to the four datasets.
  `epoch_number` is the number of the epoch you want to evaluate. You can also pass a `.pth` checkpoint path directly as `epoch_number` to test a specific weight file.

- To use multiple GPUs, specify multiple device IDs:

      CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python eval.py -d="0,1,2,3,4,5,6,7" -n "dataset_name" -e="epoch_number" -p="visualize_savedir"

- Results will be saved in the `log_final` folder.
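The `-e` argument accepts either an epoch number or a `.pth` path. One way such a dual-purpose argument could be resolved is sketched below. This is an illustration only, not the logic in `eval.py`: the function name `resolve_checkpoint`, the default directory, and the `epoch-<N>.pth` file pattern are all hypothetical placeholders.

```python
from pathlib import Path


def resolve_checkpoint(epoch_arg, checkpoint_dir="log_final/checkpoint"):
    """Map the value passed to -e to a checkpoint path.

    The default directory and the 'epoch-<N>.pth' pattern are hypothetical
    placeholders, not the repository's actual naming scheme.
    """
    if epoch_arg.endswith(".pth"):
        return Path(epoch_arg)  # explicit weight file: use it as given
    if not epoch_arg.isdigit():
        raise ValueError(f"expected an epoch number or a .pth path, got {epoch_arg!r}")
    return Path(checkpoint_dir) / f"epoch-{int(epoch_arg)}.pth"


if __name__ == "__main__":
    print(resolve_checkpoint("300"))
    print(resolve_checkpoint("weights/my_model.pth"))
```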
📈Results
We provide our trained weights on the four datasets:
MFNet (9 categories)
| Architecture | Backbone | mIOU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 60.2% | Sigma-T-MFNet |
| Sigma | VMamba-S | 61.1% | Sigma-S-MFNet |
| Sigma | VMamba-B | 61.3% | Sigma-B-MFNet |
PST900 (5 categories)
| Architecture | Backbone | mIOU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 88.6% | Sigma-T-PST |
| Sigma | VMamba-S | 87.8% | Sigma-S-PST |
NYU Depth V2 (40 categories)
| Architecture | Backbone | mIOU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 53.9% | Sigma-T-NYU |
| Sigma | VMamba-S | 57.0% | Sigma-S-NYU |
SUN RGB-D (37 categories)
| Architecture | Backbone | mIOU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 50.0% | Sigma-T-SUN |
| Sigma | VMamba-S | 52.4% | Sigma-S-SUN |
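The tables above report mIoU (mean intersection over union). For readers unfamiliar with the metric, the sketch below shows the standard definition computed from a confusion matrix. It is a generic illustration with a hypothetical function name; the repository's evaluation code may treat ignored labels or absent classes differently.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        gt_total = sum(confusion[c])                   # pixels labeled as class c
        pred_total = sum(row[c] for row in confusion)  # pixels predicted as class c
        union = gt_total + pred_total - true_pos
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Two-class toy example: per-class IoU is 3/5 and 5/7.
    print(mean_iou([[3, 1], [1, 5]]))
```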
🙏Acknowledgements
Our dataloader codes are based on CMX. Our Mamba codes are adapted from Mamba and VMamba. We thank the authors for releasing their code! We also appreciate DFormer for providing their processed RGB-Depth datasets.
📧Contact
If you have any questions, please contact us at zifuw@andrew.cmu.edu.
📌 BibTeX & Citation
If you find this code useful, please consider citing our work:
@article{wan2024sigma,
title={Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation},
author={Wan, Zifu and Wang, Yuhao and Yong, Silong and Zhang, Pingping and Stepputtis, Simon and Sycara, Katia and Xie, Yaqi},
journal={arXiv preprint arXiv:2404.04256},
year={2024}
}
