
4KAgent

[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!


<div align="center"> <h1>4KAgent: Agentic Any Image to 4K Super-Resolution</h1> <div> <a href='https://yushenzuo.github.io' target='_blank'>Yushen Zuo<sup>1</sup></a>&emsp; Qi Zheng<sup>1†</sup>&emsp; Mingyang Wu<sup>1†</sup>&emsp; Xinrui Jiang<sup>2†</sup>&emsp; <a href='https://shadowiterator.github.io' target='_blank'>Renjie Li<sup>1</sup></a>&emsp;<br> <a href='https://jianwang-cmu.github.io' target='_blank'>Jian Wang<sup>3</sup></a>&emsp; <a href='https://yzhang34.github.io/author/yide-zhang' target='_blank'>Yide Zhang<sup>4</sup></a>&emsp; <a href='https://gengchenmai.github.io' target='_blank'>Gengchen Mai<sup>5</sup></a>&emsp; <a href='https://coilab.caltech.edu/members/directors-biography' target='_blank'>Lihong V. Wang<sup>6</sup></a>&emsp; <a href='https://www.james-zou.com' target='_blank'>James Zou<sup>2</sup></a>&emsp;<br> <a href='https://www.xiaoyumu.com' target='_blank'>Xiaoyu Wang<sup>7</sup></a>&emsp; <a href='https://faculty.ucmerced.edu/mhyang' target='_blank'>Ming-Hsuan Yang<sup>8</sup></a>&emsp; <a href='https://vztu.github.io' target='_blank'>Zhengzhong Tu<sup>1*</sup></a> </div> <br> <div> <sup>1</sup>Texas A&M University&emsp; <sup>2</sup>Stanford University&emsp; <sup>3</sup>Snap Inc.&emsp; <sup>4</sup>CU Boulder<br> <sup>5</sup>UT Austin&emsp; <sup>6</sup>California Institute of Technology&emsp; <sup>7</sup>Topaz Labs&emsp; <sup>8</sup>UC Merced<br> <sup>†</sup>Indicates Equal Contribution<br> <sup>*</sup>Corresponding Author </div> <!-- [[paper]](https://arxiv.org/abs/2507.07105) --> <br> </div> <div align="center">



</div> <p align="center"> <strong><em>Accepted by NeurIPS 2025</em></strong> </p> <p align="center"> <img src="./assets/4KAgent_Teaser.jpg" width=95%> </p>

Introduction

We present 4KAgent, an agentic image super-resolution generalist designed to universally upscale any image to 4K resolution, regardless of input type, degradation level, or domain. 4KAgent offers these key features:

  • 🔥 Framework: 4KAgent is the first AI agent framework for universal any-image-to-4K upscaling. It is capable of handling all image categories, from classical and realistic degradations, extremely low-quality inputs, and AI-generated imagery to scientific imaging tasks such as remote sensing, microscopy, and biomedical inputs.

  • 🔥 System Design: 4KAgent is a multi-agent system. The Perception Agent employs large vision-language models (VLMs) to analyze the content and distortions in the image and produce a restoration plan. The Restoration Agent executes this plan through an execution-reflection-rollback procedure for recursive restoration and upscaling.

  • 🔥 Q-MoE & Face Restoration pipeline: We propose a Quality-Driven Mixture-of-Experts (Q-MoE) policy that, at each step of the restoration plan, is applied during execution and reflection to select the best output image. We further develop a face restoration pipeline to enhance faces in images.

  • 🔥 Profile Module: To broaden the applicability of 4KAgent, we propose a Profile Module that lets users customize the system for different restoration tasks, so 4KAgent can adapt to different tasks without extra training.

  • 🔥 DIV4K-50 Dataset: We build the DIV4K-50 dataset as a challenging test set for upscaling 256 × 256 low-quality (LQ) images with multiple degradations to 4096 × 4096 high-quality (HQ) 4K images, a 16× scale factor.
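The interplay between the execution-reflection-rollback procedure and Q-MoE selection can be illustrated with a toy sketch. This is not 4KAgent's implementation: the helper names `q_moe_select` and `run_plan`, the `Tool`/`Quality` types, the single scalar quality score, and the rollback rule (discard any step whose best candidate lowers the score) are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-ins (assumptions, not 4KAgent's types): an "image" can be any value,
# a tool maps an image to a new image, and quality() scores an image (higher is better).
Image = object
Tool = Callable[[Image], Image]
Quality = Callable[[Image], float]


@dataclass
class StepResult:
    subtask: str
    tool_name: str
    score: float


def q_moe_select(image: Image, experts: Dict[str, Tool],
                 quality: Quality) -> Tuple[str, Image, float]:
    """Run every expert tool for one subtask and keep the highest-scoring output."""
    best_name, best_img, best_score = "", image, float("-inf")
    for name, tool in experts.items():
        candidate = tool(image)      # execution: apply one expert
        score = quality(candidate)   # reflection: score its output
        if score > best_score:
            best_name, best_img, best_score = name, candidate, score
    return best_name, best_img, best_score


def run_plan(image: Image, plan: Sequence[str],
             toolbox: Dict[str, Dict[str, Tool]],
             quality: Quality) -> Tuple[Image, List[StepResult]]:
    """Apply each planned subtask in order; roll back any step that lowers quality."""
    current, current_score = image, quality(image)
    history: List[StepResult] = []
    for subtask in plan:
        name, candidate, score = q_moe_select(current, toolbox[subtask], quality)
        if score < current_score:
            continue  # rollback (toy rule): keep the image from before this step
        current, current_score = candidate, score
        history.append(StepResult(subtask, name, score))
    return current, history


if __name__ == "__main__":
    # Numbers stand in for images; the identity function stands in for a quality metric.
    toolbox = {
        "denoising": {"tool_a": lambda x: x + 1, "tool_b": lambda x: x + 3},
        "brightening": {"tool_c": lambda x: x - 5},
        "super-resolution": {"tool_d": lambda x: x * 2},
    }
    result, steps = run_plan(1, ["denoising", "brightening", "super-resolution"],
                             toolbox, quality=lambda x: x)
    print(result, [(s.subtask, s.tool_name) for s in steps])
```

In this toy run, denoising keeps `tool_b` (the higher-scoring expert), the brightening step is rolled back because its only candidate lowers the score, and super-resolution is then applied to the denoised value, giving a final result of `8`.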

Pipeline

<p align="center"> <img src="./assets/4KAgent_framework.png" width=95%> </p>

Dependencies and Installation

Please refer to the Installation Guide for detailed instructions on setting up the environment and installing dependencies.

Inference

Prerequisite: Before running 4KAgent, fill in your API key in the config file.

4KAgent inference is driven by a profile. Examples are shown below:

Profiles that use 'llama_vision' as the VLM in the Perception Agent:

Classic SR (ExpSR_s4_F)

CUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \
  --input_dir ./assets/profile_test_example/classicsr \
  --output_dir ./outputs/4KAgent_test/classicsr \
  --profile_name ExpSR_s4_F \
  --tool_run_gpu_id 2

Real-World SR (ExpSR_s4_P)

CUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \
  --input_dir ./assets/profile_test_example/realworldsr \
  --output_dir ./outputs/4KAgent_test/realworldsr \
  --profile_name ExpSR_s4_P \
  --tool_run_gpu_id 2

Profiles that use 'depictqa' as the VLM in the Perception Agent:

Joint IR and 4K SR:

# Set up DepictQA in terminal A:
cd ./DepictQA
conda activate depictqa
CUDA_VISIBLE_DEVICES=0 python src/app_eval.py

# Run 4KAgent inference in terminal B:
CUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \
  --input_dir ./assets/profile_test_example/4ksr \
  --output_dir ./outputs/4KAgent_test/4ksr \
  --profile_name FastGen4K_P \
  --tool_run_gpu_id 2

We recommend the FastGen4K_P profile, which runs faster while maintaining good perceptual quality.

tool_run_gpu_id specifies the GPU on which the tools (restoration methods) are executed. If your GPU has enough VRAM, tool_run_gpu_id can be set to the same GPU as CUDA_VISIBLE_DEVICES.
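When scripting runs, the two GPU settings can be assembled programmatically. The sketch below uses a hypothetical helper, `build_infer_command`, that reproduces only the flags shown in the commands above: it sets CUDA_VISIBLE_DEVICES for the agent process, passes --tool_run_gpu_id separately, and falls back to the agent's GPU when no tool GPU is given. It assumes it is run from the repository root.

```python
import os
import shlex
from typing import Dict, List, Optional, Tuple


def build_infer_command(profile_name: str, input_dir: str, output_dir: str,
                        agent_gpu: int, tool_gpu: Optional[int] = None
                        ) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for one infer_4kagent.py run (hypothetical helper).

    If tool_gpu is None, the tools run on the same GPU as the agent,
    which suits a single GPU with enough VRAM.
    """
    if tool_gpu is None:
        tool_gpu = agent_gpu
    # Copy the current environment and pin the agent process to agent_gpu.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(agent_gpu))
    argv = [
        "python", "infer_4kagent.py",
        "--input_dir", input_dir,
        "--output_dir", output_dir,
        "--profile_name", profile_name,
        "--tool_run_gpu_id", str(tool_gpu),
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_infer_command(
        "FastGen4K_P",
        "./assets/profile_test_example/4ksr",
        "./outputs/4KAgent_test/4ksr",
        agent_gpu=1, tool_gpu=2,
    )
    # Print the equivalent shell command instead of launching it.
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
    # To launch (needs `import subprocess` and, for this profile, a running DepictQA server):
    # subprocess.run(argv, env=env, check=True)
```

Printing rather than launching keeps the sketch safe to run outside the project environment; the commented `subprocess.run` call shows how the same `env`/`argv` pair would be used to start inference.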

Old Photo 4K SR

# Set up DepictQA in terminal A:
cd ./DepictQA
conda activate depictqa
CUDA_VISIBLE_DEVICES=0 python src/app_eval.py

# Run 4KAgent inference in terminal B:
CUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \
  --input_dir ./assets/profile_test_example/opr \
  --output_dir ./outputs/4KAgent_test/opr \
  --profile_name OldP4K_P \
  --tool_run_gpu_id 2

Multiple Degradation Image Restoration

# Set up DepictQA in terminal A:
cd ./DepictQA
conda activate depictqa
CUDA_VISIBLE_DEVICES=0 python src/app_eval.py

# Run 4KAgent inference in terminal B:
CUDA_VISIBLE_DEVICES=1 python infer_4kagent.py \
  --input_dir ./assets/profile_test_example/mir \
  --output_dir ./outputs/4KAgent_test/mir \
  --profile_name GenMIR_P \
  --tool_run_gpu_id 2

Profile Setting

We provide several example profiles in the pipeline/profiles folder as references for different use cases. Users can customize their own profiles based on these examples.

DIV4K-50 Dataset

We provide the DIV4K-50 dataset on 🤗 Hugging Face for easy access and reproducibility. To download the dataset, please ensure you have the huggingface_hub CLI installed:

python -m pip install "huggingface_hub[cli]"

# run the following command to download the dataset to your local directory:
huggingface-cli download --repo-type dataset YSZuo/DIV4K-50 --local-dir ./dataset/DIV4K-50

# unzip the dataset:
cd ./dataset/DIV4K-50
unzip DIV4K-50.zip
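If you prefer to script the download in Python, the `huggingface_hub` library's `snapshot_download` function can fetch the same dataset repository. The sketch below assumes, as the commands above do, that the repository contains an archive named `DIV4K-50.zip`, and extracts it with the standard-library `zipfile` module. The function names `download_div4k50` and `extract_archive` are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(zip_path: Path, dest_dir: Path) -> List[str]:
    """Unzip zip_path into dest_dir (like running `unzip` inside dest_dir); return member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def download_div4k50(local_dir: str = "./dataset/DIV4K-50") -> Path:
    """Download the DIV4K-50 dataset repo and extract DIV4K-50.zip (needs huggingface_hub)."""
    # Imported lazily so extract_archive() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = Path(snapshot_download(repo_id="YSZuo/DIV4K-50",
                                  repo_type="dataset",
                                  local_dir=local_dir))
    extract_archive(root / "DIV4K-50.zip", root)
    return root


if __name__ == "__main__":
    print(download_div4k50())
```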

Useful Tools

[1] Extract result images: utils/image_export.py

Currently, 4KAgent generates an output folder containing the logs and images produced during inference. If you only need the final output images for computing metrics (e.g., PSNR / SSIM / LPIPS / ...), use this script to copy every output image into a new folder under its original image name.

[2] Extract result toolchain: utils/toolchain_export.py

When running 4KAgent on multiple images, use this script to extract the toolchain applied to each image. For example:

001: defocus deblurring@diffplugin-brightening@gamma_correction-super-resolution@diffbir.
002: defocus deblurring@drbnet-super-resolution@diffbir.
003: defocus deblurring@restormer-super-resolution@pisasr.
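The toolchain lines above follow a `task@tool` pattern, with consecutive steps joined by `-`. Because task names such as `super-resolution` themselves contain hyphens, a naive split on `-` fails. The hypothetical parser below splits on `@` instead. It assumes, as in these examples, that tool names contain no hyphens, and it is a post-processing sketch, not part of the repository.

```python
from collections import Counter
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (task, tool)


def parse_toolchain(chain: str) -> List[Step]:
    """Parse 'task@tool-task@tool-...' into [(task, tool), ...].

    Assumes tool names contain no '-', so every piece between two '@'
    is '<tool>-<next task>' and can be split at its first '-'.
    """
    pieces = chain.strip().rstrip(".").split("@")
    tasks, tools = [pieces[0]], []
    for middle in pieces[1:-1]:
        tool, next_task = middle.split("-", 1)
        tools.append(tool)
        tasks.append(next_task)
    tools.append(pieces[-1])
    return list(zip(tasks, tools))


def parse_export(text: str) -> Dict[str, List[Step]]:
    """Parse lines like '002: defocus deblurring@drbnet-super-resolution@diffbir.'."""
    chains: Dict[str, List[Step]] = {}
    for line in text.splitlines():
        if ":" in line:
            image_id, chain = line.split(":", 1)
            chains[image_id.strip()] = parse_toolchain(chain)
    return chains


def tool_usage(chains: Dict[str, List[Step]]) -> Counter:
    """Count how often each tool was selected across all images."""
    return Counter(tool for steps in chains.values() for _, tool in steps)
```

Applied to the three example lines, `tool_usage` would report `diffbir` twice and every other tool once.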

[3] Extract result tool for face restoration: utils/face_restoration_tool_export.py

If face restoration is enabled in the profile (FaceRestore set to true), use this script to see which face restoration method was selected for each face. For example:

00006_01: codeformer
00006_02: gfpgan
00006_03: img

Here, img means the original face was kept.
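The face restoration export can be summarized the same way. The sketch below is hypothetical post-processing code. It assumes the `<image>_<face>: <method>` line format shown above; reading the suffix after the last underscore as a per-image face index is an assumption. Entries marked `img` are reported as kept original faces.

```python
from collections import Counter, defaultdict
from typing import Dict


def parse_face_export(text: str) -> Dict[str, Dict[str, str]]:
    """Map image id -> {face index -> method} from lines like '00006_01: codeformer'."""
    faces: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, method = (part.strip() for part in line.split(":", 1))
        # Assumption: the part after the last '_' is the face index within the image.
        image_id, _, face_idx = key.rpartition("_")
        faces[image_id][face_idx] = method
    return dict(faces)


def method_counts(faces: Dict[str, Dict[str, str]]) -> Counter:
    """Count selected methods, relabeling 'img' (original face kept)."""
    counts = Counter(m for per_image in faces.values() for m in per_image.values())
    if "img" in counts:
        counts["original (kept)"] = counts.pop("img")
    return counts
```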

Evaluation

The eval folder contains several evaluation scripts, each corresponding to a different task:

[1] test_metrics_classic: evaluates the Classic SR task (Set5, Set14, B100, Urban100, Manga109) with crop_border=4.

[2] test_metrics: evaluates the Real-World SR task (RealSR, DRealSR).

[3] test_metrics_mio: evaluates the Multi-Degradation Restoration task (MiO100).

[4] test_metrics_nr: evaluates images with no-reference metrics (NIQE, MUSIQ, MANIQA (pipal), CLIPIQA) on RealSRSet (16x SR) and DIV4K-50. If GPU VRAM is limited (<24 GB), use test_metrics_nr_low_gpu instead.
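In classic SR evaluation, crop_border=4 conventionally means that the outermost 4 pixels on each side are excluded before the metric is computed, so border effects at scale 4 do not affect the score. The NumPy sketch below computes PSNR with border cropping on 8-bit images. It is not the repository's test_metrics_classic script, and this section does not state whether that script converts images to the Y channel first, so `psnr_with_crop` (a hypothetical name) operates on whatever channels it is given.

```python
import numpy as np


def psnr_with_crop(img1: np.ndarray, img2: np.ndarray, crop_border: int = 4,
                   max_val: float = 255.0) -> float:
    """PSNR between two same-shaped images (H, W) or (H, W, C), ignoring a border."""
    if img1.shape != img2.shape:
        raise ValueError(f"shape mismatch: {img1.shape} vs {img2.shape}")
    if crop_border > 0:
        # Drop crop_border pixels from every side; '...' keeps any channel axis.
        img1 = img1[crop_border:-crop_border, crop_border:-crop_border, ...]
        img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
    # Compute in float64 so uint8 subtraction does not wrap around.
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

With cropping, differences confined to the outer 4-pixel frame do not affect the score at all, which is the intended effect of the setting.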

Experiment Results

We evaluate 4KAgent on 11 different image SR tasks. The overall experiment results are summarized as follows:

| Task | Dataset | Profile(s) | Scale Factor | Result |
|-------------------------------|-------------------|-------------------------------------------------|--------------|--------|
| Classical SR | Set5 | ExpSR-s4-F, ExpSR-s4-P, GenSR-s4-P | 4 | Result |
| Classical SR | Set14
