<p align="center"> <img src="https://user-images.githubusercontent.com/22350795/236680126-0b1cdd62-d6fc-4620-b998-75ed6c31bf6f.png" height=40> </p>

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Paper | Project Page | Video | WebUI | ModelScope | ComfyUI

<a href="https://colab.research.google.com/drive/11SE2_oDvbYtcuHDbaLAxsKk_o3flsO1T?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a> | Hugging Face | Replicate | OpenXLab

Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin C.K. Chan, Chen Change Loy

S-Lab, Nanyang Technological University

<img src="assets/network.png" width="800px"/>

:star: If StableSR is helpful to your images or projects, please help star this repo. Thanks! :hugs:

Update

  • 2024.06.28: Accepted by IJCV. See the latest full paper.

  • 2024.02.29: Support StableSR with SD-Turbo. Thanks to Andray for the finding!

    The ComfyUI version of StableSR is now also available. Thanks to gameltb and WSJUSA for the implementation!

  • 2023.11.30: Code Update.

    • Support DDIM and negative prompts
    • Add CFW training scripts
    • Add FaceSR training and test scripts
  • 2023.10.08: The test sets associated with the results in our paper are now available at [HuggingFace] and [OpenXLab]. You can now easily compare against StableSR.

  • 2023.08.19: Integrated to :hugs: Hugging Face. Try out the online demo on Hugging Face!

  • 2023.08.19: Integrated to :panda_face: OpenXLab. Try out the online demo on OpenXLab!

  • 2023.07.31: Integrated to :rocket: Replicate. Try out the online demo on Replicate. Thanks to Chenxi for the implementation!

  • 2023.07.16: You may reproduce the LDM baseline used in our paper using LDM-SRtuning.

  • 2023.07.14: :whale: ModelScope for StableSR is released!

  • 2023.06.30: :whale: New model trained on SD-2.1-768v is released! Better performance with fewer artifacts!

  • 2023.06.28: Support training on SD-2.1-768v.

  • 2023.05.22: :whale: Improved the code to save more GPU memory; upscaling 128 → 512 now needs 8.9 GB. Enabled starting from intermediate steps.

  • 2023.05.20: :whale: The WebUI version of StableSR is available. Thanks to Li Yi for the implementation!

  • 2023.05.13: Add Colab demo of StableSR. <a href="https://colab.research.google.com/drive/11SE2_oDvbYtcuHDbaLAxsKk_o3flsO1T?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>

  • 2023.05.11: Repo is released.

TODO

  • [x] ~~Code release~~
  • [x] ~~Update link to paper and project page~~
  • [x] ~~Pretrained models~~
  • [x] ~~Colab demo~~
  • [x] ~~StableSR-768v released~~
  • [x] ~~Replicate demo~~
  • [x] ~~HuggingFace demo~~
  • [x] ~~StableSR-face released~~
  • [x] ~~ComfyUI support~~

Demo on real-world SR

<img src="assets/imgsli_1.jpg" height="223px"/> <img src="assets/imgsli_2.jpg" height="223px"/> <img src="assets/imgsli_3.jpg" height="223px"/> <img src="assets/imgsli_8.jpg" height="223px"/> <img src="assets/imgsli_4.jpg" height="223px"/> <img src="assets/imgsli_5.jpg" height="223px"/> <img src="assets/imgsli_9.jpg" height="214px"/> <img src="assets/imgsli_6.jpg" height="214px"/> <img src="assets/imgsli_7.jpg" height="214px"/> <img src="assets/imgsli_10.jpg" height="214px"/>

For more evaluation results, please refer to our paper.

Demo on 4K Results

  • StableSR can in theory achieve arbitrary upscaling; below is a 4x example with a result beyond 4K (4096×6144).

<img src="assets/main-fig.png" width="800px"/>

# DDIM w/ negative prompts
python scripts/sr_val_ddim_text_T_negativeprompt_canvas_tile.py --config configs/stableSRNew/v2-finetune_text_T_768v.yaml --ckpt stablesr_768v_000139.ckpt --vqgan_ckpt vqgan_finetune_00011.ckpt --init-img ./inputs/test_example/ --outdir ../output/ --ddim_steps 20 --dec_w 0.0 --colorfix_type wavelet --scale 7.0 --use_negative_prompt --upscale 4 --seed 42 --n_samples 1 --input_size 768 --tile_overlap 48 --ddim_eta 1.0
  • More examples.
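The validation command above processes large canvases tile by tile (`--input_size 768 --tile_overlap 48`). A minimal sketch of how overlapping tile offsets can be computed for one image dimension (a hypothetical helper for illustration, not the repo's actual tiling code):

```python
def tile_coords(length, tile, overlap):
    """Return start offsets of overlapping tiles covering [0, length)."""
    if length <= tile:
        return [0]  # the whole side fits in one tile
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile flush with the edge
    return starts

# e.g. a 1536-pixel side with 768-pixel tiles and 48-pixel overlap
print(tile_coords(1536, 768, 48))  # [0, 720, 768]
```

Each tile is diffused independently and the overlapping regions are blended, which is what keeps seams from appearing in the 4K results above.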

Dependencies and Installation

  • PyTorch == 1.12.1
  • CUDA == 11.7
  • pytorch-lightning==1.4.2
  • xformers == 0.0.16 (Optional)
  • Other required packages in environment.yaml
# git clone this repository
git clone https://github.com/IceClear/StableSR.git
cd StableSR

# Create a conda environment and activate it
conda env create --file environment.yaml
conda activate stablesr

# Install xformers
conda install xformers -c xformers/label/dev

# Install taming & clip
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
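After installation, the pinned versions above can be sanity-checked at runtime. A small version-comparison sketch (hypothetical helper, not part of the repo) that handles local build suffixes such as `+cu117`:

```python
def version_tuple(v):
    """Turn a dotted version string into a comparable tuple, e.g. '1.12.1+cu117' -> (1, 12, 1)."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

# usage with the pins from the dependency list:
assert version_tuple("1.12.1+cu117") >= version_tuple("1.12.1")
```

In practice you would pass `torch.__version__` and `torch.version.cuda` through the same helper to confirm they match the versions the configs were tested with.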

Running Examples

Train

Download the pretrained Stable Diffusion models from [HuggingFace]

  • Train Time-aware encoder with SFT: set the ckpt_path in config files (Line 22 and Line 55)
python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --name NAME --scale_lr False
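Note the trailing comma in `--gpus GPU_ID,`: with pytorch-lightning of this era, a comma-terminated spec selects device indices, whereas a bare integer means a GPU count (so `--gpus 0` would mean no GPUs at all). A hypothetical parser illustrating the convention:

```python
def parse_gpus(spec):
    """Mimic the '0,'-style GPU spec: a comma means a list of device indices,
    a bare integer means a count of GPUs to use."""
    if "," not in spec:
        return int(spec)  # e.g. '2' -> use 2 GPUs
    return [int(i) for i in spec.split(",") if i.strip()]  # '0,2,' -> [0, 2]

print(parse_gpus("0,"))  # [0] : use GPU index 0
print(parse_gpus("0"))   # 0   : zero GPUs (CPU)
```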
  • Train CFW: set the ckpt_path in config files (Line 6).

You need to first generate training data using the finetuned diffusion model in the first stage.

# General SR
python scripts/generate_vqgan_data.py --config configs/stableSRdata/test_data.yaml --ckpt CKPT_PATH --outdir OUTDIR --skip_grid --ddpm_steps 200 --base_i 0 --seed 10000
# For face data
python scripts/generate_vqgan_data_face.py --config configs/stableSRdata/test_data_face.yaml --ckpt CKPT_PATH --outdir OUTDIR --skip_grid --ddpm_steps 200 --base_i 0 --seed 10000

The data folder should be like this:

CFW_trainingdata/
    └── inputs
          └── 00000001.png # LQ images, (512, 512, 3) (resize to 512x512)
          └── ...
    └── gts
          └── 00000001.png # GT images, (512, 512, 3) (512x512)
          └── ...
    └── latents
          └── 00000001.npy # Latent codes (N, 4, 64, 64) of HR images generated by the diffusion U-net, saved in .npy format.
          └── ...
    └── samples
          └── 00000001.png # The HR images generated from latent codes, just to make sure the generated latents are correct.
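Before launching CFW training, the layout above can be sanity-checked so that every LQ image has a matching GT image and latent code. A short hypothetical helper (names and extensions follow the layout described, not code from the repo):

```python
import os

def check_cfw_data(root):
    """Return the sorted file stems that are missing from any of inputs/, gts/ or latents/."""
    def stems(sub, ext):
        d = os.path.join(root, sub)
        return {os.path.splitext(f)[0] for f in os.listdir(d) if f.endswith(ext)}
    inputs = stems("inputs", ".png")
    gts = stems("gts", ".png")
    latents = stems("latents", ".npy")
    # symmetric differences: anything unpaired across the three folders
    return sorted((inputs ^ gts) | (inputs ^ latents))
```

An empty result means every `00000001`-style triple is complete; any stem returned points at a file that exists in one folder but not the others.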
    
