<div align="center"> <h1>MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning</h1> <h3>✨AAAI 2026✨</h3> <div> <a href='https://github.com/WanderRainy/' target='_blank'>Ruoyu Yang</a><sup>1</sup>&emsp; <a>Yinhe Liu</a><sup>✉1</sup>&emsp; <a>Heng Yan</a><sup>1</sup>&emsp; <a>Yiheng Zhou</a><sup>1</sup>&emsp; <a>Yihan Fu</a><sup>1</sup>&emsp; <a>Han Luo</a><sup>1</sup>&emsp; <a href='https://rsidea.whu.edu.cn/' target='_blank'>Yanfei Zhong</a><sup>✉1</sup>&emsp; </div> <div> <sup>1</sup>Wuhan University&emsp; </div> <div> <h4 align="center"> • <a href="https://rsidea.whu.edu.cn/mars.htm" target='_blank'>[Project]</a> • <a href="https://rsidea.whu.edu.cn/mars.pdf" target='_blank'>[paper]</a> • <a href="https://rsidea.whu.edu.cn" target='_blank'>[Research Group (RS-IDEA)]</a> • </h4> </div> <img src="https://github.com/user-attachments/assets/2be1af62-8b3d-439d-b23c-46e5af638569" width="100%"/> Overall framework of MaRS and examples of downstream tasks. </div>

📰 Latest News

  • Nov 2025 — MaRS paper accepted to AAAI 2026.
  • Nov 2025 — Pretraining code and model weights officially released.

📦 Overview

MaRS is a large-scale multi-modality foundation model designed for very-high-resolution remote sensing imagery.
It introduces Cross-Granularity Meta-Modality Learning, enabling robust representation learning across optical RGB and SAR modalities at very high spatial resolutions.

This repository provides:

  • Pretrained weights (mars_base, mars_large)
  • Pretraining pipeline (data processing, configuration, and scripts)
  • Instructions for loading MaRS using timm (compatible with the SwinV2 architecture)

🔧 Using MaRS in Your Project

All pretrained weights are available at:
https://zenodo.org/records/17800805

MaRS follows the SwinV2 architecture and can be loaded directly using timm==1.0.15.

▶ Optical RGB Example

import timm

backbone_mars = timm.create_model(
    'swinv2_base_window8_256',   # MaRS follows the SwinV2 architecture
    pretrained=False,
    features_only=True,
    in_chans=3,                  # three-channel optical RGB input
    img_size=512,
    checkpoint_path='mars_base_rgb_encoder_only.pth'
)

▶ SAR Example

backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,
    features_only=True,
    in_chans=1,                  # single-channel SAR input
    img_size=512,
    checkpoint_path='mars_base_sar_encoder_only.pth'
)
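
For a quick sanity check after downloading a checkpoint, a smoke test along the following lines can help (a minimal sketch: it assumes mars_base_rgb_encoder_only.pth from the Zenodo record above sits in the working directory, and feeds a random tensor in place of real imagery):

import timm
import torch

# Run a dummy 512x512 RGB tile through the backbone and print the shape of
# each multi-scale feature map; features_only=True yields one map per stage.
backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,
    features_only=True,
    in_chans=3,
    img_size=512,
    checkpoint_path='mars_base_rgb_encoder_only.pth'
)
backbone_mars.eval()

with torch.no_grad():
    features = backbone_mars(torch.randn(1, 3, 512, 512))

for i, feat in enumerate(features):
    print(f'stage {i}: {tuple(feat.shape)}')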

The pretrained backbone has been validated on a wide range of high-resolution optical and multi-modal downstream tasks (details in the paper).


🏗️ Pretraining Pipeline

This section describes how to reproduce MaRS pretraining.


1. Environment Setup

A minimal software environment used in our experiments:

python   = 3.11.13
torch    = 2.7.0
tifffile = 2025.3.30
timm     = 1.0.15
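
After installing, a quick check (a minimal sketch) confirms the pinned versions are in place:

import torch, timm, tifffile

# Print installed versions to compare against the pins above.
print('torch    =', torch.__version__)
print('timm     =', timm.__version__)
print('tifffile =', tifffile.__version__)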

2. Data Preparation

The full MaRS-16M pretraining corpus (~5 TB) is too large for public hosting.
A public experimental subset will be released at https://zenodo.org/records/17800805.

Note: The full dataset is currently under organization and is not yet publicly available. To request access or discuss academic collaboration, please contact:

yangruoyu@whu.edu.cn

2.1 Download & Organize Raw Data

mkdir -p ./data
# Place Umbra / Capella raw tiles into ./data

2.2 Patch Extraction

Extract 1024 × 1024 training patches (an illustrative tiling sketch follows the directory layout below):

python ./data/split_patch.py

After extraction:

data/
├── Capella_patches/
│   ├── rgb/
│   └── sar/
└── Umbra_patches/
    ├── rgb/
    └── sar/
  • rgb/: optical patches
  • sar/: SAR patches
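
The exact logic lives in ./data/split_patch.py; the sketch below only illustrates the general tiling idea (assumptions: scenes are TIFFs readable by tifffile, tiles are non-overlapping, partial border tiles are dropped, and the input/output paths shown are hypothetical):

# Illustrative tiling sketch (not the repository's split_patch.py): cut each
# scene into non-overlapping 1024x1024 patches, dropping partial border tiles.
from pathlib import Path

import tifffile

PATCH = 1024

def split_scene(tif_path: Path, out_dir: Path) -> None:
    img = tifffile.imread(tif_path)  # (H, W) for SAR, (H, W, C) for optical
    h, w = img.shape[:2]
    out_dir.mkdir(parents=True, exist_ok=True)
    for top in range(0, h - PATCH + 1, PATCH):
        for left in range(0, w - PATCH + 1, PATCH):
            patch = img[top:top + PATCH, left:left + PATCH]
            tifffile.imwrite(out_dir / f'{tif_path.stem}_{top}_{left}.tif', patch)

# Hypothetical paths: raw Umbra scenes in ./data/Umbra, SAR patches written
# to ./data/Umbra_patches/sar to mirror the layout above.
for scene in Path('./data/Umbra').glob('*.tif'):
    split_scene(scene, Path('./data/Umbra_patches/sar'))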

3. Launching Pretraining

Example commands for 8×GPU single-node training using torchrun.

3.1 MaRS-Base

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
    --nproc-per-node=8 \
    --nnodes=1 --node_rank=0 \
    --master_addr=localhost --master_port=12345 \
    main_pretrain.py \
    --model mars_base \
    --batch_size 16 \
    --num_workers 8 \
    --output_dir ./work_dirs/mars_base \
    --log_dir ./work_dirs/mars_base \
    --epochs 12 \
    --warmup_epochs 1

3.2 MaRS-Large

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
    --nproc-per-node=8 \
    --nnodes=1 --node_rank=0 \
    --master_addr=localhost --master_port=12345 \
    main_pretrain.py \
    --model mars_large \
    --batch_size 12 \
    --num_workers 8 \
    --output_dir ./work_dirs/mars_large \
    --log_dir ./work_dirs/mars_large \
    --epochs 12 \
    --warmup_epochs 1

4. Converting MaRS Weights to Swin Format

To make MaRS weights directly loadable by SwinTransformer (and timm), convert them via:

python utils/convert_mars_checkpoints_to_swin.py

The released weights have already undergone this conversion.
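
For reference only, a conversion of this kind typically filters the full pretraining checkpoint down to the encoder and renames keys to match timm's SwinV2 state dict. The sketch below is illustrative, not the repository script; the 'model' wrapper key and the 'encoder.' prefix are hypothetical assumptions about the checkpoint layout:

# Illustrative key remapping (hypothetical layout, not the actual script):
# keep only encoder weights and strip their prefix so the state dict matches
# timm's SwinV2 naming.
import torch

# Older training checkpoints with non-tensor entries may need weights_only=False.
ckpt = torch.load('mars_base.pth', map_location='cpu')
state = ckpt.get('model', ckpt)  # unwrap a training checkpoint if present

swin_state = {
    k.removeprefix('encoder.'): v
    for k, v in state.items()
    if k.startswith('encoder.')  # drop decoder / pretraining-only heads
}
torch.save(swin_state, 'mars_base_rgb_encoder_only.pth')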


📕 Downstream Task Datasets

GUSO: Multi-modality Paired High-resolution Remote Sensing Dataset. [Under review]

EarthMiss: Missing Modality Land Cover Mapping. Download

DFC25-T2: Multi-modal VHR Dataset for All-weather Disaster Response. Download

SARDet-100k: SAR Modality Object Detection Dataset. Download

UBC-V2: Multi-modality High-resolution Remote Sensing Building Detection Dataset. Download

UBC: Multi-modality High-resolution Remote Sensing Building Height Estimation Dataset. Download

WHU-CD: High-resolution Remote Sensing Change Detection Dataset. Download

DeepGlobe: High-resolution Remote Sensing Road Extraction Dataset. Download


📖 Citation

If you find MaRS useful in your research, please cite:

@inproceedings{yang2026mars,
  title={MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning},
  author={Ruoyu Yang and Yinhe Liu and Heng Yan and Yiheng Zhou and Yihan Fu and Han Luo and Yanfei Zhong},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2026}
}

© Copyright & Usage

This method is copyrighted by the Intelligent Remote Sensing Data Extraction, Analysis and Application Research Group (RSIDEA, http://rsidea.whu.edu.cn/), affiliated with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University.

MaRS is released strictly for academic research purposes.

