FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

🔥🔥🔥 FreeScale is a tuning-free method for higher-resolution visual generation, unlocking 8k image generation!

🚀🚀🚀 CineScale, the extended work for higher-resolution visual generation based on Wan 2.1, is now available, unlocking 4k video generation!

Haonan Qiu, Shiwei Zhang*, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, and Ziwei Liu*

(* Corresponding Author)

From Alibaba Group and Nanyang Technological University.

⚙️ Setup

Install Environment via Anaconda

conda create -n freescale python=3.8
conda activate freescale
pip install -r requirements.txt

🤗 Quick start with Gradio

gradio gradio_app.py

💫 Inference with Command

1. Higher-Resolution Text-to-Image

Modify run_freescale.py as needed, then run the following command in the terminal:

python run_freescale.py

# resolutions_list: resolutions for each stage of self-cascade upscaling.
# cosine_scale: detail scale, usually 1.0 ~ 2.0. For 8k image generation, cosine_scale <= 1.0 is recommended.
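The two settings above can be sanity-checked before a long run. The following is a minimal sketch, not part of the repository: `build_resolutions_list` and `cosine_scale_warnings` are hypothetical helpers, and the idea that each self-cascade stage doubles the side length is an assumption made for illustration. The ordering of the two values in each resolution pair is also assumed, so the example uses square sizes only.

```python
def build_resolutions_list(base, target):
    """Return resolution pairs from base to target, doubling at each stage.

    Hypothetical helper: the doubling schedule is an illustrative assumption.
    """
    first, second = base
    target_first, target_second = target
    if first <= 0 or second <= 0:
        raise ValueError("base resolution must be positive")
    if target_first < first or target_second < second:
        raise ValueError("target must be at least as large as base")
    stages = [[first, second]]
    while first < target_first or second < target_second:
        # Double each side, but never overshoot the requested target.
        first = min(first * 2, target_first)
        second = min(second * 2, target_second)
        stages.append([first, second])
    return stages


def cosine_scale_warnings(cosine_scale, resolutions_list):
    """Collect warnings based on the ranges quoted in the comments above."""
    warnings = []
    largest_side = max(max(pair) for pair in resolutions_list)
    if largest_side >= 8192:
        if cosine_scale > 1.0:
            warnings.append("for 8k generation, cosine_scale <= 1.0 is recommended")
    elif not 1.0 <= cosine_scale <= 2.0:
        warnings.append("cosine_scale is outside the usual 1.0 ~ 2.0 range")
    return warnings


resolutions_list = build_resolutions_list((1024, 1024), (4096, 4096))
print(resolutions_list)
print(cosine_scale_warnings(2.0, resolutions_list))
```

The resulting list can then be copied into run_freescale.py by hand; the script itself is not modified by this sketch.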

2. Flexible Control for Detail Level

Modify run_sdxl.py as needed and generate the base image at the original resolution.
Run the following command in the terminal:

python run_sdxl.py

Put the generated image into the imgen_intermediates folder.
(Optional) Generate a mask with a segmentation model (e.g., Segment Anything) and put it into the imgen_intermediates folder.
Modify run_freescale_imgen.py as needed and generate the final image at the higher resolution.
Run the following command in the terminal:

python run_freescale_imgen.py

# resolutions_list: resolutions for each stage of self-cascade upscaling.
# cosine_scale: detail scale for foreground, usually 2.0 ~ 3.0. 
# cosine_scale_bg: detail scale for background, usually 0.5 ~ 1.0.
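One way to picture how a mask separates the two detail scales is as a per-pixel blend. The sketch below is a conceptual illustration only: `detail_scale_map` is a hypothetical function, the linear blend is an assumption, and how run_freescale_imgen.py applies the mask internally may differ. Mask values are assumed to lie in [0, 1], with 1 marking foreground.

```python
def detail_scale_map(mask, cosine_scale=2.0, cosine_scale_bg=0.5):
    """Blend foreground and background detail scales per pixel.

    mask: 2D list of floats in [0, 1], where 1 means foreground (assumed).
    Returns a 2D list with the effective detail scale at each position.
    """
    return [
        [m * cosine_scale + (1.0 - m) * cosine_scale_bg for m in row]
        for row in mask
    ]


# A tiny 2 x 2 mask: one foreground pixel, one soft edge, two background pixels.
mask = [
    [1.0, 0.0],
    [0.5, 0.0],
]
print(detail_scale_map(mask))  # [[2.0, 0.5], [1.25, 0.5]]
```

Under this picture, lowering cosine_scale_bg reduces added detail only where the mask is close to 0, which matches the intent of using a small background scale for regions with artifacts.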

3. Faster Generation with SDXL-Turbo

Modify run_freescale_turbo.py as needed, then run the following command in the terminal:

python run_freescale_turbo.py

# num_inference_steps: 2 ~ 8.
# Currently, resolutions above 2048 x 2048 introduce quality loss in Turbo mode.

🧲 Tips

Generating 8k (8192 x 8192) images takes around 55 GB of memory and 1 hour on an NVIDIA A800.
Setting fast_mode = True can significantly shorten the time, at the cost of some quality loss, especially for 8k image generation.
For 8k image generation, cosine_scale <= 1.0 is recommended. Alternatively, use the Flexible Control for Detail Level function and set a small cosine_scale_bg (e.g., 0.5) for areas with artifacts.
Real images or images generated by other models (e.g., FLUX) can potentially be used as the intermediates for Flexible Control for Detail Level, which turns FreeScale into an img-to-img approach. However, SDXL may not reconstruct the given content well, so unexpected changes can easily occur. Finding a prompt that lets SDXL reconstruct the given content as closely as possible is particularly important for generation quality.
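To put the 8k cost from the first tip in perspective, a little arithmetic helps. The sketch below assumes the standard SDXL VAE, which downsamples each side by a factor of 8 and uses 4 latent channels; `latent_shape` and `pixel_ratio` are hypothetical helper names. It only counts pixels and latent elements and does not model memory use or runtime.

```python
VAE_DOWNSAMPLE = 8   # spatial downsampling factor of the SDXL VAE
LATENT_CHANNELS = 4  # number of channels in the SDXL latent space


def latent_shape(height, width):
    """Return the (channels, height, width) latent shape for an image size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image sides must be divisible by 8")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def pixel_ratio(target, base=(1024, 1024)):
    """Return how many times more pixels target has than base."""
    return (target[0] * target[1]) / (base[0] * base[1])


print(latent_shape(8192, 8192))   # (4, 1024, 1024)
print(pixel_ratio((8192, 8192)))  # 64.0
```

An 8192 x 8192 image therefore has 64 times the pixels of a 1024 x 1024 one, and its latent is as large spatially as a full 1024 x 1024 image.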

If you have any questions about FreeScale, feel free to contact Haonan Qiu.

📝 Changelog

[2024.12.22]: 🔥🔥 Release FreeScale for SDXL-Turbo, trading slight quality loss for a significant speedup.
[2024.12.13]: 🔥🔥 Release FreeScale (based on SDXL), higher-resolution image generation!

🚀 My Free Series

FreeNoise: Tuning-free method for longer video generation.

FreeTraj: Tuning-free method for trajectory control.

😉 Citation

@article{qiu2024freescale,
  title={FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion},
  author={Qiu, Haonan and Zhang, Shiwei and Wei, Yujie and Chu, Ruihang and Yuan, Hangjie and Wang, Xiang and Zhang, Yingya and Liu, Ziwei},
  journal={arXiv preprint arXiv:2412.09626},
  year={2024}
}
