[Pattern Recognition 2025 🌟] Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation


<div align="center">

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

</div>

💬 Introduction

Multimodal semantic segmentation is a pivotal task in computer vision that often outperforms unimodal methods by harnessing richer information from diverse sources. Existing models often employ modality-specific designs that inherently introduce biases toward certain modalities. While these biases may be beneficial in specific contexts, they often compromise the model's adaptability across various multimodal scenarios, potentially degrading performance. To address this problem, we turn to the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion, and propose U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, validating its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings.

🚀 Updates

  • [x] 05/2024: initialized the repository and released the code.
  • [x] 05/2024: released U3M model weights; download from GoogleDrive.

πŸ‘οΈ U3M model

<div align="center">

Figure: Overall architecture of the U3M model.

</div>

πŸ” Environment

First, create and activate the environment using the following commands:

```shell
conda env create -f environment.yaml
conda activate U3M
```

📦 Data preparation

Download the dataset:

  • MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
  • FMB, a multimodal dataset with RGB-Infrared modalities.

Then, put the datasets under the data directory as follows:

```text
data/
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp_sin
│   ├── polL_aolp_cos
│   ├── polL_dolp
│   ├── NIR_warped
│   ├── NIR_warped_mask
│   ├── GT
│   ├── SSGT4MS
│   ├── list_folder
│   └── SS
├── FMB
│   ├── test
│   │   ├── color
│   │   ├── Infrared
│   │   ├── Label
│   │   └── Visible
│   ├── train
│   │   ├── color
│   │   ├── Infrared
│   │   ├── Label
│   │   └── Visible
```
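Before launching training, it can save time to confirm the layout above is in place. The following is a minimal sketch (not part of the U3M codebase) that checks for the expected sub-directories using only the standard library; the directory names are taken directly from the tree above:

```python
from pathlib import Path

# Expected layout copied from the README tree above; purely a local
# sanity check, not an official U3M utility.
EXPECTED = {
    "MCubeS": [
        "polL_color", "polL_aolp_sin", "polL_aolp_cos", "polL_dolp",
        "NIR_warped", "NIR_warped_mask", "GT", "SSGT4MS", "list_folder", "SS",
    ],
    "FMB": [
        "test/color", "test/Infrared", "test/Label", "test/Visible",
        "train/color", "train/Infrared", "train/Label", "train/Visible",
    ],
}

def missing_dirs(data_root):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [
        f"{ds}/{sub}"
        for ds, subs in EXPECTED.items()
        for sub in subs
        if not (root / ds / sub).is_dir()
    ]

# Example: report anything missing under ./data
# print(missing_dirs("data") or "dataset layout looks complete")
```

An empty return value means every directory listed in the tree exists; anything else is a relative path you still need to download or move into place.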

📦 Model Zoo

MCubeS

| Model-Modal      | mIoU  | weight      |
| :--------------- | :---- | :---------- |
| MCubeS-RGB       | 49.22 | GoogleDrive |
| MCubeS-RGB-A     | 49.89 | GoogleDrive |
| MCubeS-RGB-A-D   | 50.26 | GoogleDrive |
| MCubeS-RGB-A-D-N | 51.69 | GoogleDrive |

MCubeS_Ablation_RGBADN

| Model-Modal           | mIoU  | weight      |
| :-------------------- | :---- | :---------- |
| with_Linear           | 49.89 | GoogleDrive |
| with_ChannelAntention | 50.34 | GoogleDrive |
| with_PSPNet           | 50.62 | GoogleDrive |
| with_ALL              | 51.69 | GoogleDrive |

FMB

| Model-Modal      | mIoU  | weight      |
| :--------------- | :---- | :---------- |
| FMB-RGB          | 57.17 | GoogleDrive |
| FMB-RGB-Infrared | 60.76 | GoogleDrive |

πŸ‘οΈ Visulization

Please refer to visulization_mm.py for the segmentation results and visulization_tsne.py for the t-SNE visualization.

<div align="center">

Figure: Segmentation results on FMB.

Figure: Segmentation results on MCubeS.

Figure: t-SNE visualization.

</div>

Training

Before training, please download the pre-trained SegFormer (MiT) weights and place them in the following directory structure:

```text
checkpoints/pretrained/segformer
├── mit_b0.pth
├── mit_b1.pth
├── mit_b2.pth
├── mit_b3.pth
└── mit_b4.pth
```

To train a U3M model, update the relevant configuration file in configs/ with your paths and hyper-parameters, then run:

```shell
cd path/to/U3M
conda activate U3M

python -m tools.train_mm --cfg configs/mcubes_rgbadn.yaml

python -m tools.train_mm --cfg configs/fmb_rgbt.yaml
```

Evaluation

To evaluate U3M models, please download the respective model weights (GoogleDrive) and save them in a folder of your choice.


Then, update the EVAL section of the appropriate configuration file in configs/ and run:

```shell
cd path/to/U3M
conda activate U3M

python -m tools.val_mm --cfg configs/mcubes_rgbadn.yaml

python -m tools.val_mm --cfg configs/fmb_rgbt.yaml
```
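The exact schema of the EVAL section lives in the repo's config files; as a purely hypothetical sketch (the key names below are illustrative assumptions, not taken from the repo), it would point the evaluator at the downloaded checkpoint:

```yaml
# Hypothetical EVAL fragment -- key names are illustrative only;
# consult configs/mcubes_rgbadn.yaml for the actual schema.
EVAL:
  MODEL_PATH: path/to/downloaded/weights.pth   # checkpoint from GoogleDrive
  BATCH_SIZE: 1
```

Whatever the real key names are, the essential step is the same: the path under EVAL must match wherever you saved the downloaded weights.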

🚩 License

This repository is under the Apache-2.0 license. For commercial use, please contact the authors.

📜 Citations

If you use the U3M model, please cite the following work:

```bibtex
@article{li2024u3m,
  title={U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation},
  author={Li, Bingyu and Zhang, Da and Zhao, Zhiyuan and Gao, Junyu and Li, Xuelong},
  journal={arXiv preprint arXiv:2405.15365},
  year={2024}
}
```

🔈 Acknowledgements

Our codebase is based on the following public GitHub repositories; thanks to their authors:

Note: This is a research-level repository and might contain issues/bugs. Please contact the authors with any queries.
