SkillAgentSearch skills...

SkyEyeGPT

[ISPRS2025] SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

Install / Use

/learn @ZhanYang-nwpu/SkyEyeGPT
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

<br> <p align="center"> <img src="images/SkyEyeGPT.png" width="250"/> <p> <br> <div align="center"> <strong>Author: Yang Zhan, Zhitong Xiong, Yuan Yuan</strong>

<strong>School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University</strong>

</div>

This is the official repository for paper "SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model". [paper] [SkyEye-968k]

🎉 Accepted by ISPRS Journal of Photogrammetry and Remote Sensing 🎉

Please share a <font color='orange'>STAR ⭐</font> if this project does help

You can focus on remote sensing multimodal large language model (Vision-Language) here

You can focus on multimodal large language model (Vision-Language) for UAV here

📢 Latest Updates

This is an ongoing project. We will be working on improving it.

  • 📦 Chatbot, codebase, and model inference tutorial coming soon! 🚀
  • May-13-2025: SkyEyeGPT model checkpoint is released. [huggingface] 🔥🔥 (The Model Weight can be run directly with MiniGPT-v2
  • Jan-19-2025: SkyEyeGPT paper is accepted by ISPRS. [paper] 🔥🔥
  • Jun-12-2024: RS instruction dataset SkyEye-968k is released. [huggingface] 🔥🔥
  • Jan-18-2024: paper is released. 🔥🔥
  • Jan-17-2024: A curated list about remote sensing multimodal large language model (Vision-Language) is created. 🔥🔥

💬 SkyEyeGPT: Remote Sensing Multi-modal Chatbot

The online demo will be released.

<div align="center"> <img src="images/chatbot.png"/> </div>

🚀 Inference

We release the model weight in [huggingface]! The Model Weight can be run directly with MiniGPT-v2

<img src="images/SkyEyeGPT.png" height="30"> SkyEyeGPT: Architecture

The model and checkpoint are coming soon! 🚀

<div align="center"> <img src="images/model.png"/> </div>

🌋 SkyEye-968k: Unified RS Vision-Language Instruction

The download link of the unified remote sensing vision-language instruction dataset is here! 🚀

Download link: https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k

<div align="center"> <img src="images/dataset.png"/ height="400"> </div>

📦 Performance

<div align="center"> <img src="images/performance.png"/ height="400"> </div>

👁️ Visualization

1. Detailed description

<div align="center"> <img src="images/detailed_descr.png"/> </div>

2. Some testing samples of captioning, grounding, and VQA

<div align="center"> <img src="images/some_sample.png"/> </div>

👁️ Qualitative results

1. Remote Sensing Visual Grounding

<div align="center"> <img src="images/RSVG.png"/> </div>

2. Remote Sensing Phrase Grounding

<div align="center"> <img src="images/RSPG.png"/> </div>

3. Remote Sensing Image Captioning

<div align="center"> <img src="images/RSIC.png"/> </div>

4. UAV Aerial Video Captioning

<div align="center"> <img src="images/UAVC.png"/> </div>

5. Remote Sensing Visual Question Answering

<div align="center"> <img src="images/RSVQA.png"/> </div>

6. Remote Sensing Referring Expression Generation

<div align="center"> <img src="images/RSREG.png"/> </div>

7. Remote Sensing Scene Classification

<div align="center"> <img src="images/RSSC.png"/> </div>

🔍 Quantitative results

1. Remote Sensing Image Captioning

<div align="center"> <img src="images/T_RSIC1.png"/> </div> <div align="center"> <img src="images/T_RSIC2.png"/> </div>

2. UAV Aerial Video Captioning

<div align="center"> <img src="images/T_UAVC.png"/> </div>

3. Remote Sensing Visual Grounding

<div align="center"> <img src="images/T_RSVG.png"/ height="250"> </div>

4. Remote Sensing Visual Question Answering

<div align="center"> <img src="images/T_RSVQA1.png"/> </div> <div align="center"> <img src="images/T_RSVQA2.png"/ height="250"> </div>

📜 Citation

@ARTICLE{zhan2025skyeyegpt,
      title={SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model}, 
      author={Yang Zhan and Zhitong Xiong and Yuan Yuan},
      year={2025},
      journal={ISPRS Journal of Photogrammetry and Remote Sensing},
      volume = {221},
      pages = {64-77}
}

🙏 Acknowledgement

Our code is based on MiniGPT-4, shikra, and MiniGPT-v2. We sincerely appreciate their contributions and authors for releasing source codes. We are thankful to EVA and LLaMA2 for releasing their models as open-source contributions. I would like to thank Xiong zhitong and Yuan yuan for helping the manuscript. I also thank the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University for supporting this work.

🤖 Contact

If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.

View on GitHub
GitHub Stars126
CategoryDevelopment
Updated7d ago
Forks7

Security Score

80/100

Audited on Apr 1, 2026

No findings