MaRS
AAAI 2026 | MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning
📰 Latest News
- Nov 2025 — MaRS paper accepted to AAAI 2026.
- Nov 2025 — Pretraining code and model weights officially released.
📦 Overview
MaRS is a large-scale multi-modality foundation model designed for very-high-resolution remote sensing imagery.
It introduces Cross-Granularity Meta-Modality Learning, enabling robust representation learning across optical RGB and SAR modalities at very high spatial resolutions.
This repository provides:
- Pretrained weights (`mars_base`, `mars_large`)
- Pretraining pipeline (data processing, configuration, and scripts)
- Instructions for loading MaRS using timm (compatible with SwinV2 architecture)
🔧 Using MaRS in Your Project
All pretrained weights are available at:
https://zenodo.org/records/17800805
MaRS follows the SwinV2 architecture and can be loaded directly using timm==1.0.15.
▶ Optical RGB Example
import timm

backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,
    features_only=True,
    in_chans=3,
    img_size=512,
    checkpoint_path='mars_base_rgb_encoder_only.pth'
)
▶ SAR Example
import timm

backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,
    features_only=True,
    in_chans=1,
    img_size=512,
    checkpoint_path='mars_base_sar_encoder_only.pth'
)
The pretrained backbone has been validated on a wide range of high-resolution optical and multi-modal downstream tasks (details in the paper).
🏗️ Pretraining Pipeline
This section describes how to reproduce MaRS pretraining.
1. Environment Setup
A minimal software environment used in our experiments:
python = 3.11.13
torch = 2.7.0
tifffile = 2025.3.30
timm = 1.0.15
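To confirm an environment matches the versions above, the installed distributions can be checked with the standard library alone (the package names are the same pip distribution names listed above):

```python
import importlib.metadata as md

# Report installed versions of the packages pinned above.
versions = {}
for pkg in ('torch', 'timm', 'tifffile'):
    try:
        versions[pkg] = md.version(pkg)
    except md.PackageNotFoundError:
        versions[pkg] = 'not installed'

print(versions)
```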
2. Data Preparation
The full MaRS-16M pretraining corpus (~5 TB) is too large for public hosting.
A public experimental subset will be released at https://zenodo.org/records/17800805.
<del>To request full dataset access for academic collaboration, please contact:</del>
yangruoyu@whu.edu.cn
Note: The dataset is currently being organized and is not yet publicly available. For collaboration inquiries, please feel free to contact us via email.
2.1 Download & Organize Raw Data
mkdir -p ./data
# Place Umbra / Capella raw tiles into ./data
2.2 Patch Extraction
Extract 1024 × 1024 training patches:
python ./data/split_patch.py
After extraction:
data/
├── Capella_patches/
│ ├── rgb/
│ └── sar/
└── Umbra_patches/
├── rgb/
└── sar/
- `rgb/`: optical patches
- `sar/`: SAR patches
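For reference, the core tiling step performed by `split_patch.py` can be sketched as below. This is an illustrative stand-in only, not the repository's implementation; the synthetic tile, its size, and the non-overlapping layout are assumptions:

```python
import os
import tempfile

import numpy as np
import tifffile

def split_into_patches(tile, patch=1024):
    """Yield non-overlapping patch x patch crops, dropping ragged edges.

    Illustrative sketch only -- ./data/split_patch.py is authoritative.
    """
    h, w = tile.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            yield tile[y:y + patch, x:x + patch]

# Synthetic stand-in for a raw Umbra/Capella tile, round-tripped via tifffile.
path = os.path.join(tempfile.mkdtemp(), 'tile.tif')
tifffile.imwrite(path, np.zeros((2500, 3000, 3), dtype=np.uint8))
tile = tifffile.imread(path)

patches = list(split_into_patches(tile))
print(len(patches))  # 2 rows x 2 cols of full 1024-px patches -> 4
```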
3. Launching Pretraining
Example commands for 8×GPU single-node training using torchrun.
3.1 MaRS-Base
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
--nproc-per-node=8 \
--nnodes=1 --node_rank=0 \
--master_addr=localhost --master_port=12345 \
main_pretrain.py \
--model mars_base \
--batch_size 16 \
--num_workers 8 \
--output_dir ./work_dirs/mars_base \
--log_dir ./work_dirs/mars_base \
--epochs 12 \
--warmup_epochs 1
3.2 MaRS-Large
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
--nproc-per-node=8 \
--nnodes=1 --node_rank=0 \
--master_addr=localhost --master_port=12345 \
main_pretrain.py \
--model mars_large \
--batch_size 12 \
--num_workers 8 \
--output_dir ./work_dirs/mars_large \
--log_dir ./work_dirs/mars_large \
--epochs 12 \
--warmup_epochs 1
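With these settings, each optimizer step sees an effective global batch of 16 × 8 = 128 samples for MaRS-Base (12 × 8 = 96 for MaRS-Large). A torchrun-launched script typically derives its distributed context from the environment variables torchrun sets for every worker; the minimal sketch below shows that handshake (the actual initialization in `main_pretrain.py` may differ):

```python
import os

# torchrun exports RANK / WORLD_SIZE / LOCAL_RANK into each worker process;
# the defaults below cover plain single-process (non-distributed) execution.
def get_dist_context():
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    local_rank = int(os.environ.get('LOCAL_RANK', 0))
    return rank, world_size, local_rank

rank, world_size, local_rank = get_dist_context()

# Effective global batch for the MaRS-Base command above.
per_gpu_batch, n_gpus = 16, 8
global_batch = per_gpu_batch * n_gpus
print(global_batch)  # 128
```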
4. Converting MaRS Weights to Swin Format
To make MaRS weights directly loadable by SwinTransformer (and timm), convert them via:
python utils/convert_mars_checkpoints_to_swin.py
The released weights have already undergone this conversion.
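Conceptually, such a conversion is a key-remapping pass over the checkpoint's state dict: keep the encoder entries and strip their prefix so the keys line up with a plain Swin model. The sketch below illustrates the idea only; the `'encoder.'` prefix and key names are assumptions, and `utils/convert_mars_checkpoints_to_swin.py` is the authoritative converter:

```python
# Illustrative key-remapping pass: keep encoder weights, drop the rest.
# Prefix and key layout are hypothetical; plain ints stand in for tensors.
def extract_encoder(state_dict, prefix='encoder.'):
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

toy_ckpt = {
    'encoder.patch_embed.proj.weight': 0,  # kept, prefix stripped
    'decoder.pred.weight': 0,              # dropped: not an encoder key
}
swin_state = extract_encoder(toy_ckpt)
print(sorted(swin_state))  # ['patch_embed.proj.weight']
```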
📕 Downstream Task Datasets
- GUSO: Multi-modality Paired High-resolution Remote Sensing Dataset. [Under review]
- EarthMiss: Missing Modality Land Cover Mapping. Download
- DFC25-T2: Multimodal VHR dataset for all-weather disaster response. Download
- SARDet-100k: SAR Modality Object Detection Dataset. Download
- UBC-V2: Multi-modality High-resolution Remote Sensing Building Detection Dataset. Download
- UBC: Multi-modality High-resolution Remote Sensing Building Height Estimation Dataset. Download
- WHU-CD: High-resolution Remote Sensing Change Detection Dataset. Download
- DeepGlobe: High-resolution Remote Sensing Road Extraction Dataset. Download
📖 Citation
If you find MaRS useful in your research, please cite:
@inproceedings{yang2026mars,
title={MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning},
author={Ruoyu Yang and Yinhe Liu and Heng Yan and Yiheng Zhou and Yihan Fu and Han Luo and Yanfei Zhong},
booktitle={AAAI Conference on Artificial Intelligence},
year={2026}
}
© Copyright & Usage
This method is copyrighted by the Intelligent Remote Sensing Data Extraction, Analysis and Application Research Group (RSIDEA)
http://rsidea.whu.edu.cn/
affiliated with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University.
MaRS is released strictly for academic research purposes.
