SwinFusion
This is the official PyTorch implementation of "SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer".
✨ News
- [2026-02-21] Our paper VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion has been accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)! [Paper] [Code]
- [2025-09-18] Our paper ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts has been accepted by Advances in Neural Information Processing Systems (NeurIPS 2025)! [Paper] [Code]
- [2025-09-10] Our paper Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)! [Paper] [Code]
- [2025-03-15] Our paper C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning has been accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]
- [2025-02-11] We released a large-scale dataset for infrared and visible video fusion: M3SVD: Multi-Modal Multi-Scene Video Dataset.
Image Fusion Example
Schematic illustration of multi-modal image fusion and digital photography image fusion. First row: source image pairs, second row: fused results of U2Fusion and our SwinFusion.
Framework
The framework of the proposed SwinFusion for multi-modal image fusion and digital photography image fusion.
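As a rough illustration of the window-based attention underlying the Swin Transformer backbone, the sketch below partitions a feature map into non-overlapping windows with NumPy. The function name, window size, and shapes are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows."""
    H, W, C = x.shape
    ws = window_size
    # Group rows/cols into (H/ws, ws) and (W/ws, ws) blocks, then flatten windows.
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)
    return windows

feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)
wins = window_partition(feat, 4)
print(wins.shape)  # (4, 4, 4, 1): four 4x4 windows
```

Self-attention is then computed inside each window, which keeps the cost linear in image size while shifted windows (not shown) let information flow across window borders.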
Visible and Infrared Image Fusion (VIF)
To Train
Download the training dataset from MSRS dataset, and put it in ./Dataset/trainsets/MSRS/.
```shell
python -m torch.distributed.launch --nproc_per_node=3 --master_port=1234 main_train_swinfusion.py --opt options/swinir/train_swinfusion_vif.json --dist True
```
To Test
Download the test dataset from MSRS dataset, and put it in ./Dataset/testsets/MSRS/.
```shell
python test_swinfusion.py --model_path=./Model/Infrared_Visible_Fusion/Infrared_Visible_Fusion/models/ --iter_number=10000 --dataset=MSRS --A_dir=IR --B_dir=VI_Y
```
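The expected folder layout below is inferred from the `--A_dir`/`--B_dir` flags above; verify it against your MSRS download before running:

```shell
# Hypothetical test-set layout: infrared images under IR/, visible Y-channel under VI_Y/
mkdir -p ./Dataset/testsets/MSRS/IR ./Dataset/testsets/MSRS/VI_Y
```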
Visual Comparison
Qualitative comparison of SwinFusion with five state-of-the-art methods on visible and infrared image fusion. From left to right: infrared image, visible
image, and the results of GTF, DenseFuse, IFCNN, SDNet, U2Fusion, and our SwinFusion.
Visible and Near-Infrared Image Fusion (VIS-NIR)
To Train
Download the training dataset from VIS-NIR Scene dataset, and put it in ./Dataset/trainsets/Nirscene/.
```shell
python -m torch.distributed.launch --nproc_per_node=3 --master_port=1234 main_train_swinfusion.py --opt options/swinir/train_swinfusion_nir.json --dist True
```
To Test
Download the test dataset from VIS-NIR Scene dataset, and put it in ./Dataset/testsets/Nirscene/.
```shell
python test_swinfusion.py --model_path=./Model/RGB_NIR_Fusion/RGB_NIR_Fusion/models/ --iter_number=10000 --dataset=NirScene --A_dir=NIR --B_dir=VI_Y
```
Visual Comparison
Qualitative comparison of SwinFusion with five state-of-the-art methods on visible and near-infrared image fusion. From left to right: near-infrared
image, visible image, and the results of ANVF, DenseFuse, IFCNN, SDNet, U2Fusion, and our SwinFusion.
Medical Image Fusion (Med)
To Train
Download the training dataset from Harvard medical dataset, and put it in ./Dataset/trainsets/PET-MRI/ or ./Dataset/trainsets/CT-MRI/.
```shell
python -m torch.distributed.launch --nproc_per_node=3 --master_port=1234 main_train_swinfusion.py --opt options/swinir/train_swinfusion_med.json --dist True
```
To Test
Download the test dataset from Harvard medical dataset, and put it in ./Dataset/testsets/PET-MRI/ or ./Dataset/testsets/CT-MRI/.
```shell
python test_swinfusion.py --model_path=./Model/Medical_Fusion-PET-MRI/Medical_Fusion/models/ --iter_number=10000 --dataset=PET-MRI --A_dir=MRI --B_dir=PET_Y
```
or
```shell
python test_swinfusion.py --model_path=./Model/Medical_Fusion-CT-MRI/Medical_Fusion/models/ --iter_number=10000 --dataset=CT-MRI --A_dir=MRI --B_dir=CT
```
Visual Comparison
Qualitative comparison of SwinFusion with five state-of-the-art methods on PET and MRI image fusion. From left to right: MRI image, PET image,
and the results of CSMCA, DDcGAN, IFCNN, SDNet, U2Fusion, and our SwinFusion.
Qualitative comparison of SwinFusion with five state-of-the-art methods on CT and MRI image fusion. From left to right: MRI image, CT image, and
the results of CSMCA, DDcGAN, IFCNN, SDNet, U2Fusion, and our SwinFusion.
Multi-Exposure Image Fusion (MEF)
To Train
Download the training dataset from MEF dataset, and put it in ./Dataset/trainsets/MEF.
```shell
python -m torch.distributed.launch --nproc_per_node=3 --master_port=1234 main_train_swinfusion.py --opt options/swinir/train_swinfusion_mef.json --dist True
```
To Test
Download the test dataset from MEF Benchmark dataset, and put it in ./Dataset/testsets/MEF_Benchmark/.
```shell
python test_swinfusion.py --model_path=./Model/Multi_Exposure_Fusion/Multi_Exposure_Fusion/models/ --iter_number=10000 --dataset=MEF_Benchmark --A_dir=under_Y --B_dir=over_Y
```
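The `under_Y`/`over_Y` (and `VI_Y`, `PET_Y`) directory names suggest fusion operates on the luma (Y) channel of the sources. A minimal sketch of extracting Y with the standard ITU-R BT.601 weights follows; the repo's actual preprocessing may differ:

```python
import numpy as np

def rgb_to_y(rgb):
    # ITU-R BT.601 luma: Y = 0.299 R + 0.587 G + 0.114 B
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

img = np.zeros((2, 2, 3), dtype=np.float64)
img[..., 0] = 1.0  # a pure-red image
y = rgb_to_y(img)
print(y[0, 0])  # 0.299
```

After fusing the Y channels, the chrominance (Cb/Cr) of one source is typically carried over to reconstruct a color result.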
Visual Comparison
Qualitative results of multi-exposure image fusion. From left to right: under-exposed image, over-exposed image, and the results of SPD-MEF,
MEF-GAN, IFCNN, SDNet, U2Fusion, and our SwinFusion.
Multi-Focus Image Fusion (MFF)
To Train
Download the training dataset from MFI-WHU dataset, and put it in ./Dataset/trainsets/MFI-WHU/.
```shell
python -m torch.distributed.launch --nproc_per_node=3 --master_port=1234 main_train_swinfusion.py --opt options/swinir/train_swinfusion_mff.json --dist True
```
To Test
Download the test dataset from Lytro dataset, and put it in ./Dataset/testsets/Lytro/.
```shell
python test_swinfusion.py --model_path=./Model/Multi_Focus_Fusion/Multi_Focus_Fusion/models/ --iter_number=10000 --dataset=Lytro --A_dir=A_Y --B_dir=B_Y
```
Visual Comparison
Qualitative results of multi-focus image fusion. From left to right: near/far-focus image, the fused results and difference maps of SFMD, DRPL,
MFF-GAN, IFCNN, SDNet, U2Fusion, and our SwinFusion. The difference maps represent the difference between the near-focus image and fused results.
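The difference maps described above can be sketched as a per-pixel absolute difference between the near-focus source and the fused result, rescaled for display. This is an illustrative sketch, not the exact visualization code used for the figures:

```python
import numpy as np

def difference_map(source, fused):
    """Absolute per-pixel difference, rescaled to [0, 255] for display."""
    d = np.abs(source.astype(np.float64) - fused.astype(np.float64))
    if d.max() == 0:
        return d.astype(np.uint8)
    return (255 * d / d.max()).astype(np.uint8)

near = np.array([[0, 100], [200, 50]], dtype=np.uint8)
fused = np.array([[0, 150], [100, 50]], dtype=np.uint8)
dm = difference_map(near, fused)
print(dm)  # large values where the fused image departs from the near-focus source
```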
Recommended Environment
- [x] torch 1.11.0
- [x] torchvision 0.12.0
- [x] tensorboard 2.7.0
- [x] numpy 1.21.2
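One way to pin the recommended versions is a `requirements.txt` (this file is not shipped with the repo; the CUDA-matching `torch` wheel for your machine may need a custom index URL):

```shell
cat > requirements.txt <<'EOF'
torch==1.11.0
torchvision==0.12.0
tensorboard==2.7.0
numpy==1.21.2
EOF
# then: pip install -r requirements.txt
```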
Citation
```bibtex
@article{Tang2024Mask-DiFuser,
  author={Tang, Linfeng and Li, Chunyu and Ma, Jiayi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion},
  year={2025},
  pages={1-18},
}

@article{Tang2024C2RF,
  title={C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning},
  author={Tang, Linfeng and Yan, Qinglong and Xiang, Xinyu and Fang, Leyuan and Ma, Jiayi},
  journal={International Journal of Computer Vision},
  volume={133},
  pages={5262--5280},
  year={2025},
}

@article{Ma2022SwinFusion,
  author={Ma, Jiayi and Tang, Linfeng and Fan, Fan and Huang, Jun and Mei, Xiaoguang and Ma, Yong},
  journal={IEEE/CAA Journal of Automatica Sinica},
  title={SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer},
  year={2022},
  volume={9},
  number={7},
  pages={1200-1217},
}
```
