LowLevelBanana

[Technical Report] A Comprehensive Evaluation of Nano Banana Pro on 14 Low-Level Vision Tasks and 40 Datasets

Is Nano Banana Pro a Low-Level Vision All-Rounder? 🍌

Jialong Zuo, Haoyou Deng, Hanyu Zhou, Jiaxin Zhu, Yicheng Zhang, Yiwei Zhang, Yongxin Yan, Kaixing Huang, Weisen Chen, Yongtai Deng, Rui Jin, Nong Sang, Changxin Gao

School of Artificial Intelligence and Automation, Huazhong University of Science and Technology (HUST)

📢 Introduction

This repository hosts the official resources for the technical report "Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets".

While commercial text-to-image (T2I) models like Nano Banana Pro excel at creative synthesis, their potential as generalist solvers for traditional low-level vision problems remains largely unexplored. In this study, we investigate the question: Is Nano Banana Pro a Low-Level Vision All-Rounder? We conduct a comprehensive zero-shot evaluation across 14 distinct low-level tasks spanning 40 diverse datasets.

<img src="assets/intro.png" alt="Teaser Image" width="100%"> Figure 1: Exemplary zero-shot results of Nano Banana Pro across 14 low-level vision tasks.

🔥 Key Highlights

- Massive Benchmark: Evaluated on 14 low-level vision tasks and 40 datasets.
- Zero-Shot Setting: Used simple textual prompts without any fine-tuning.
- The Dichotomy Discovery: We reveal a distinct performance dichotomy:
  - ✅ Superior Subjective Quality: Often hallucinates plausible high-frequency details, producing results that can look better than those of specialist models.
  - ❌ Lower Reference-Based Metrics: Lags behind in PSNR/SSIM due to the inherent stochasticity of generative models.

📊 Evaluation Results

Detailed quantitative and qualitative comparisons can be found on our project page and in the full report.

Our extensive analysis identifies Nano Banana Pro as a capable zero-shot contender for low-level vision tasks. While it struggles to maintain the strict pixel-level consistency required by conventional metrics (PSNR/SSIM), it offers superior visual quality, suggesting a need for new perception-aligned evaluation paradigms.
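
For context on why pixel-level consistency matters here, the textbook definition of PSNR is given below. This is the standard formula, not necessarily the exact configuration used in the report (color space, cropping, and value range may differ per task). The symbols $x_i$ and $y_i$ denote corresponding pixels of the reference and the output, $N$ the number of pixels, and $\mathrm{MAX}$ the largest possible pixel value (for example 255 for 8-bit images).

$$
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
$$

Because the error is accumulated per pixel, a generated detail that looks plausible but does not coincide with the reference still increases the MSE and therefore lowers the PSNR.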

We have released the evaluation datasets and corresponding inference results of Nano Banana Pro used in our study on HuggingFace to facilitate future research.

Download the Inference Results on HuggingFace

💻 Evaluation Code

After downloading the inference results of Nano Banana Pro for each dataset from HuggingFace, you can run the evaluation code provided for each task to obtain the quantitative results. Please refer to the `eval` folder.
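
As a rough illustration of what a reference-based metric computes, here is a minimal PSNR sketch in Python with NumPy. It is not the code from the `eval` folder; the function name `psnr` and the `max_value` parameter are assumptions made for this example, and the per-task scripts may apply different preprocessing.

```python
import numpy as np


def psnr(reference: np.ndarray, result: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of identical shape.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if reference.shape != result.shape:
        raise ValueError("images must have the same shape")
    # Work in float64 so that uint8 inputs do not wrap around on subtraction.
    diff = reference.astype(np.float64) - result.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Calling `psnr(gt, out)` on a ground-truth array and the corresponding model output (both loaded with any image library into arrays of the same shape) returns a value in dB, where higher means closer to the reference.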

🔗 Citation

If you find this work helpful for your research, please consider citing:

@misc{zuo2025nanobananaprolowlevel,
      title={Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets}, 
      author={Jialong Zuo and Haoyou Deng and Hanyu Zhou and Jiaxin Zhu and Yicheng Zhang and Yiwei Zhang and Yongxin Yan and Kaixing Huang and Weisen Chen and Yongtai Deng and Rui Jin and Nong Sang and Changxin Gao},
      year={2025},
      eprint={2512.15110},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.15110}, 
}
