MFuser

[CVPR 2025 Highlight] Official code for paper "Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation"

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

Xin Zhang, Robby T. Tan
National University of Singapore
CVPR 2025

[Project Page] [Paper]

Environment

Requirements

  • The requirements can be installed with:

    conda create -n mfuser python=3.9 numpy=1.26.4
    conda activate mfuser
    conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
    pip install xformers==0.0.20
    pip install mmcv-full==1.5.1 
    pip install mamba_ssm==2.2.2
    pip install causal_conv1d==1.4.0
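
Because the install steps pin exact versions, a quick check after installation can catch mismatches early. The following is a minimal sketch, not part of the official code: the helper names `installed_version` and `find_mismatches` are hypothetical, and the distribution names in `EXPECTED` are assumed to match what pip and conda register (for example, `torch` for the conda `pytorch` package).

```python
from importlib import metadata

# Pins copied from the install commands above. The keys are assumed
# distribution names; adjust them if your environment registers others.
EXPECTED = {
    "numpy": "1.26.4",
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "xformers": "0.0.20",
    "mmcv-full": "1.5.1",
    "mamba_ssm": "2.2.2",
    "causal_conv1d": "1.4.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu118" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script inside the `mfuser` environment prints one line per package that is missing or differs from its pin, and prints nothing when everything matches.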
    

Pre-trained VFM & VLM Models

  • Please download the pre-trained VFM and VLM models and save them in the ./pretrained folder.

    | Model | Type | Link |
    |-----|-----|:-----:|
    | DINOv2 | dinov2_vitl14_pretrain.pth | download link |
    | CLIP | ViT-L-14-336px.pt | download link |
    | EVA02-CLIP | EVA02_CLIP_L_336_psz14_s6B.pt | download link |
    | SIGLIP | siglip_vitl16_384.pth | download link |

Checkpoints

  • You can download the MFuser model checkpoints and save them in the ./work_dirs_d folder. By default, all experiments below use DINOv2-L as the VFM.

    | Model | Pretrained | Trained on | Config | Link |
    |-----|-----|-----|-----|:-----:|
    | mfuser-clip-vit-l-city | CLIP | Cityscapes | config | download link |
    | mfuser-clip-vit-l-gta | CLIP | GTA5 | config | download link |
    | mfuser-eva02-clip-vit-l-city | EVA02-CLIP | Cityscapes | config | download link |
    | mfuser-eva02-clip-vit-l-gta | EVA02-CLIP | GTA5 | config | download link |
    | mfuser-siglip-vit-l-city | SIGLIP | Cityscapes | config | download link |
    | mfuser-siglip-vit-l-gta | SIGLIP | GTA5 | config | download link |

Datasets

  • To set up datasets, please follow the official TLDR repo.

  • After downloading the datasets, set the data folder root in the dataset config files to match your environment.

    src_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
    tgt_dataset_dict = dict(..., data_root='[YOUR_DATA_FOLDER_ROOT]', ...)
    
  • The final folder structure should look like this:

MFuser
├── ...
├── pretrained
│   ├── dinov2_vitl14_pretrain.pth
│   ├── EVA02_CLIP_L_336_psz14_s6B.pt
│   ├── siglip_vitl16_384.pth
│   ├── ViT-L-14-336px.pt
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── bdd100k
│   │   ├── images
│   │   │   ├── 10k
│   │   │   │   ├── train
│   │   │   │   ├── val
│   │   ├── labels
│   │   │   ├── sem_seg
│   │   │   │   ├── masks
│   │   │   │   │   ├── train
│   │   │   │   │   ├── val
│   ├── mapillary
│   │   ├── training
│   │   ├── cityscapes_trainIdLabel
│   │   ├── half
│   │   │   ├── val_img
│   │   │   ├── val_label
│   ├── gta
│   │   ├── images
│   │   ├── labels
├── ...
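
Before launching a run, it can help to confirm that the folder structure above is in place. The sketch below is an illustration only and not part of the repository: `missing_paths` is a hypothetical helper, and `EXPECTED_PATHS` simply lists the entries from the tree, so trim it to the weights and datasets you actually use.

```python
from pathlib import Path

# Relative paths copied from the folder tree above.
EXPECTED_PATHS = [
    "pretrained/dinov2_vitl14_pretrain.pth",
    "pretrained/EVA02_CLIP_L_336_psz14_s6B.pt",
    "pretrained/siglip_vitl16_384.pth",
    "pretrained/ViT-L-14-336px.pt",
    "data/cityscapes/leftImg8bit/train",
    "data/cityscapes/leftImg8bit/val",
    "data/cityscapes/gtFine/train",
    "data/cityscapes/gtFine/val",
    "data/bdd100k/images/10k/train",
    "data/bdd100k/images/10k/val",
    "data/bdd100k/labels/sem_seg/masks/train",
    "data/bdd100k/labels/sem_seg/masks/val",
    "data/mapillary/training",
    "data/mapillary/cityscapes_trainIdLabel",
    "data/mapillary/half/val_img",
    "data/mapillary/half/val_label",
    "data/gta/images",
    "data/gta/labels",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the MFuser project root.
    for rel in missing_paths("."):
        print("missing:", rel)
```

The script prints each expected file or folder that is absent and prints nothing when the layout is complete.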

Training

python train.py configs/[TRAIN_CONFIG]

Evaluation

Run the evaluation:

python test.py configs/[TEST_CONFIG] work_dirs_d/[MODEL] --eval mIoU
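
To evaluate several checkpoints in one go, the command above can be wrapped in a small script. This is a rough sketch rather than a tool shipped with the repository: `build_eval_command` and `run_all` are hypothetical helpers, and the config and checkpoint names are left as the same placeholders used above.

```python
import subprocess


def build_eval_command(config, checkpoint):
    """Mirror the evaluation command above for one config/checkpoint pair."""
    return [
        "python", "test.py",
        f"configs/{config}",
        f"work_dirs_d/{checkpoint}",
        "--eval", "mIoU",
    ]


def run_all(pairs, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for config, checkpoint in pairs:
        cmd = build_eval_command(config, checkpoint)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first evaluation that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Replace the placeholders with real config and checkpoint file names.
    run_all([("[TEST_CONFIG]", "[MODEL]")], dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to review them before running the full evaluation.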

Citation

If you find our code helpful, please cite our paper:

@inproceedings{zhang2025mamba,
  title     = {Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation},
  author    = {Zhang, Xin and Tan, Robby T.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2025},
}

Acknowledgements

This project is based on the following open-source projects. We thank the authors for sharing their code.
