<div align="center"> <h1>GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing</h1>

Ruizhe Ou<sup>1</sup> · Yuan Hu<sup>2,*</sup> · Fan Zhang<sup>2</sup> · Jiaxin Chen<sup>1</sup> · Yu Liu<sup>2,3</sup>

<sup>1</sup>Beijing University of Posts and Telecommunications · <sup>2</sup>Peking University · <sup>3</sup>Peking University Ordos Research Institute of Energy · <sup>*</sup>corresponding author

</div>

GeoPix is a new state-of-the-art pixel-level multi-modal large language model for the remote sensing domain, supporting referring image segmentation and other tasks.

Release🔥

[2025.04.11] We release the annotations of GeoPixInstruct on HuggingFace🤗.
[2025.04.10] GeoPix has been accepted by GRSM (IEEE Geoscience and Remote Sensing Magazine).
[2025.02.20] We release the pre-trained checkpoints, inference code and gradio demo!
[2025.01.12] We release the paper.

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing [Arxiv]

Abstract

In this work, we propose GeoPix, a remote sensing (RS) multi-modal large language model (MLLM) that extends image understanding capabilities to the pixel level. This is achieved by equipping the MLLM with a mask predictor, which transforms visual features from the vision encoder into masks conditioned on the LLM’s segmentation token embeddings. For more details, please refer to the paper.
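To make the idea of conditioning a mask on a segmentation token embedding concrete, here is a minimal toy sketch. It uses plain dot-product scoring, a common pattern for mask decoding, and is an assumption made for illustration rather than the mask predictor described in the paper. The function name `predict_mask`, the feature values, and the threshold are all hypothetical.

```python
import math

def predict_mask(pixel_features, seg_embedding, threshold=0.5):
    """Toy mask predictor (illustrative only, not the GeoPix architecture).

    pixel_features: H x W grid of C-dimensional feature vectors (nested lists),
                    standing in for visual features from a vision encoder.
    seg_embedding:  C-dimensional vector standing in for the LLM's
                    segmentation token embedding.
    Returns an H x W binary mask (1 = pixel assigned to the referred object).
    """
    mask = []
    for row in pixel_features:
        mask_row = []
        for feat in row:
            # Dot product scores how well this pixel matches the token embedding.
            logit = sum(f * e for f, e in zip(feat, seg_embedding))
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask

# Hypothetical 2x2 feature map with 2 channels per pixel.
features = [
    [[2.0, 0.0], [-1.0, 0.5]],
    [[0.5, 1.5], [-2.0, -1.0]],
]
seg_token = [1.0, 1.0]  # hypothetical segmentation token embedding
mask = predict_mask(features, seg_token)
print(mask)
```

Changing `seg_token` changes which pixels are selected from the same feature map, which is the sense in which the mask is conditioned on the embedding.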

Demo🚀

1. Installation

conda create -n geopix python=3.10 -y
conda activate geopix
pip install -r requirements.txt
mkdir pretrained_models

2. Download

You can download the model directly from Huggingface, ModelScope or OpenXLab. You can also download it with a Python script:

# Huggingface
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")

# ModelScope
from modelscope import snapshot_download
model_dir = snapshot_download("NormanOU/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")

Once you have prepared all models, the folder tree should look like this:

  .
  ├── ...
  ├── model
  ├── pretrained_models
  ├── app.py
  ├── engine.py
  ├── ...
  └── README.md
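The layout above can be checked before launching the demo. The following is a small sketch, not part of the repository: the helper names `missing_entries` and `checkpoints_downloaded` are hypothetical, and the expected entries are only those named explicitly in the tree (the elided `...` items are not checked).

```python
from pathlib import Path

# Entries named explicitly in the folder tree; elided items are not checked.
EXPECTED_ENTRIES = ["model", "pretrained_models", "app.py", "engine.py", "README.md"]

def missing_entries(repo_root="."):
    """Return the expected entries that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]

def checkpoints_downloaded(repo_root="."):
    """Return True if pretrained_models exists and is not empty."""
    folder = Path(repo_root) / "pretrained_models"
    return folder.is_dir() and any(folder.iterdir())

if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing entries:", ", ".join(missing))
    elif not checkpoints_downloaded():
        print("pretrained_models is empty; download the checkpoints first.")
    else:
        print("Folder layout looks complete.")
```

Run it from the repository root. It only checks that files and folders exist, not that the checkpoint contents are valid.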

3. Start a local gradio demo

Run the following command:

python app.py

The instructions are well written. Enjoy our work.

Inference🔍

Run the following command:

python inference.py

Citation📑

@ARTICLE{10994415,
  author={Ou, Ruizhe and Hu, Yuan and Zhang, Fan and Chen, Jiaxin and Liu, Yu},
  journal={IEEE Geoscience and Remote Sensing Magazine}, 
  title={GeoPix: A multimodal large language model for pixel-level image understanding in remote sensing}, 
  year={2025},
  volume={},
  number={},
  pages={2-16},
  keywords={Visualization;Image segmentation;Training;Integrated circuit modeling;Grounding;Feature extraction;Accuracy;Remote sensing;Prototypes;Predictive models},
  doi={10.1109/MGRS.2025.3560293}
}

Acknowledgement

This work is built upon LLaVA and PixelLM.
