SkyEyeGPT
[ISPRS2025] SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
Install / Use
/learn @ZhanYang-nwpu/SkyEyeGPTREADME
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
<br> <p align="center"> <img src="images/SkyEyeGPT.png" width="250"/> <p> <br> <div align="center"> <strong>Author: Yang Zhan, Zhitong Xiong, Yuan Yuan</strong><strong>School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University</strong>
</div>This is the official repository for paper "SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model". [paper] [SkyEye-968k]
🎉 Accepted by ISPRS Journal of Photogrammetry and Remote Sensing 🎉
Please share a <font color='orange'>STAR ⭐</font> if this project does help
You can focus on remote sensing multimodal large language model (Vision-Language) here
You can focus on multimodal large language model (Vision-Language) for UAV here
📢 Latest Updates
This is an ongoing project. We will be working on improving it.
- 📦 Chatbot, codebase, and model inference tutorial coming soon! 🚀
- May-13-2025: SkyEyeGPT model checkpoint is released. [huggingface] 🔥🔥 (The Model Weight can be run directly with MiniGPT-v2)
- Jan-19-2025: SkyEyeGPT paper is accepted by ISPRS. [paper] 🔥🔥
- Jun-12-2024: RS instruction dataset SkyEye-968k is released. [huggingface] 🔥🔥
- Jan-18-2024: paper is released. 🔥🔥
- Jan-17-2024: A curated list about remote sensing multimodal large language model (Vision-Language) is created. 🔥🔥
💬 SkyEyeGPT: Remote Sensing Multi-modal Chatbot
The online demo will be released.
<div align="center"> <img src="images/chatbot.png"/> </div>🚀 Inference
We release the model weight in [huggingface]! The Model Weight can be run directly with MiniGPT-v2
<img src="images/SkyEyeGPT.png" height="30"> SkyEyeGPT: Architecture
The model and checkpoint are coming soon! 🚀
<div align="center"> <img src="images/model.png"/> </div>🌋 SkyEye-968k: Unified RS Vision-Language Instruction
The download link of the unified remote sensing vision-language instruction dataset is here! 🚀
Download link: https://huggingface.co/datasets/ZhanYang-nwpu/SkyEye-968k
<div align="center"> <img src="images/dataset.png"/ height="400"> </div>📦 Performance
<div align="center"> <img src="images/performance.png"/ height="400"> </div>👁️ Visualization
1. Detailed description
<div align="center"> <img src="images/detailed_descr.png"/> </div>2. Some testing samples of captioning, grounding, and VQA
<div align="center"> <img src="images/some_sample.png"/> </div>👁️ Qualitative results
1. Remote Sensing Visual Grounding
<div align="center"> <img src="images/RSVG.png"/> </div>2. Remote Sensing Phrase Grounding
<div align="center"> <img src="images/RSPG.png"/> </div>3. Remote Sensing Image Captioning
<div align="center"> <img src="images/RSIC.png"/> </div>4. UAV Aerial Video Captioning
<div align="center"> <img src="images/UAVC.png"/> </div>5. Remote Sensing Visual Question Answering
<div align="center"> <img src="images/RSVQA.png"/> </div>6. Remote Sensing Referring Expression Generation
<div align="center"> <img src="images/RSREG.png"/> </div>7. Remote Sensing Scene Classification
<div align="center"> <img src="images/RSSC.png"/> </div>🔍 Quantitative results
1. Remote Sensing Image Captioning
<div align="center"> <img src="images/T_RSIC1.png"/> </div> <div align="center"> <img src="images/T_RSIC2.png"/> </div>2. UAV Aerial Video Captioning
<div align="center"> <img src="images/T_UAVC.png"/> </div>3. Remote Sensing Visual Grounding
<div align="center"> <img src="images/T_RSVG.png"/ height="250"> </div>4. Remote Sensing Visual Question Answering
<div align="center"> <img src="images/T_RSVQA1.png"/> </div> <div align="center"> <img src="images/T_RSVQA2.png"/ height="250"> </div>📜 Citation
@ARTICLE{zhan2025skyeyegpt,
title={SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model},
author={Yang Zhan and Zhitong Xiong and Yuan Yuan},
year={2025},
journal={ISPRS Journal of Photogrammetry and Remote Sensing},
volume = {221},
pages = {64-77}
}
🙏 Acknowledgement
Our code is based on MiniGPT-4, shikra, and MiniGPT-v2. We sincerely appreciate their contributions and authors for releasing source codes. We are thankful to EVA and LLaMA2 for releasing their models as open-source contributions. I would like to thank Xiong zhitong and Yuan yuan for helping the manuscript. I also thank the School of Artificial Intelligence, OPtics, and ElectroNics (iOPEN), Northwestern Polytechnical University for supporting this work.
🤖 Contact
If you have any questions about this project, please feel free to contact zhanyangnwpu@gmail.com.
Security Score
Audited on Apr 1, 2026
