# HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by adding only a single line of code!
<div align="center">💡 HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models</div>
<div align="center"><a href="https://scholar.google.com/citations?hl=zh-CN&user=QFowS4cAAAAJ">Shen Zhang</a>, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, <a href="https://jiajunvision.github.io/">Jiajun Liang</a></div> <br> <div align="center"> <a href="https://hidiffusion.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>   <a href="https://link.springer.com/chapter/10.1007/978-3-031-72983-6_9"><img src="https://img.shields.io/static/v1?label=Paper&message=ECCV&color=yellow"></a>   <a href="https://arxiv.org/abs/2311.17528"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv&color=red&logo=arxiv"></a>   <a href="https://colab.research.google.com/drive/1EiBn9lSnPZTU4cikRRaBBexs429M-qty?usp=sharing"><img src="https://img.shields.io/static/v1?label=Demo&message=Colab&color=purple&logo=googlecolab"></a>   <a href="https://openbayes.com/console/public/tutorials/SaPYcYCaWSA"><img src="https://img.shields.io/static/v1?label=Demo&message=OpenBayes&color=green"></a>   </div> <div align="center"> <img src="assets/image_gallery.jpg" width="800" ></img> <br> <em> (Select HiDiffusion samples for various diffusion models, resolutions, and aspect ratios.) </em> </div> <br>👉 Why HiDiffusion
- A training-free method that increases the resolution and speed of pretrained diffusion models.
- Designed as a plug-and-play implementation. It can be integrated into diffusion pipelines by only adding a single line of code!
- Supports various tasks, including text-to-image, image-to-image, and inpainting.
## 🔥 Update

- 2024.8.15 - 💥 HiDiffusion has been added to the Diffusers documentation, see here. Thanks to the Diffusers team!
- 2024.7.3 - 💥 Accepted by ECCV 2024!
- 2024.6.19 - 💥 Integrated into OpenBayes, see the demo. Thanks to the OpenBayes team!
- 2024.6.16 - 💥 Support PyTorch 2.X.
- 2024.6.16 - 💥 Fixed the non-square generation issue. HiDiffusion now supports more image sizes and aspect ratios.
- 2024.5.7 - 💥 Support the image-to-image task, see here.
- 2024.4.16 - 💥 Released the source code.
## 📢 Supported Models

- ✅ Stable Diffusion XL
- ✅ Stable Diffusion XL Turbo
- ✅ Stable Diffusion v2
- ✅ Stable Diffusion v1

Note: HiDiffusion also supports downstream diffusion models based on these repositories, such as Ghibli-Diffusion, Playground, etc.
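For such downstream checkpoints, the integration is identical; only the model ID changes. A minimal sketch, using the community nitrosocke/Ghibli-Diffusion checkpoint (a Stable Diffusion fine-tune) as an illustrative example:

```python
from hidiffusion import apply_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

# Illustrative downstream checkpoint; any Stable Diffusion fine-tune follows the same pattern.
pretrain_model = "nitrosocke/Ghibli-Diffusion"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Apply HiDiffusion exactly as for the base models below.
apply_hidiffusion(pipe)

image = pipe("ghibli style magical princess with golden hair", height=1024, width=1024).images[0]
image.save("ghibli_princess.jpg")
```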
## 💣 Supported Tasks
- ✅ Text-to-image
- ✅ ControlNet, including text-to-image, image-to-image
- ✅ Inpainting
## 🔎 Main Requirements

This repository is tested with:

- Python==3.8
- torch>=1.13.1
- diffusers>=0.25.0
- transformers
- accelerate
- xformers
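For example, the dependencies can be installed with pip (package names and version bounds taken from the list above):

```shell
pip3 install "torch>=1.13.1" "diffusers>=0.25.0" transformers accelerate xformers
```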
## 🔑 Install HiDiffusion

After installing the packages in the main requirements, install HiDiffusion:

```shell
pip3 install hidiffusion
```

### Installing from source

Alternatively, you can install from the GitHub source. Clone the repository and install:

```shell
git clone https://github.com/megvii-model/HiDiffusion.git
cd HiDiffusion
python3 setup.py install
```
## 🚀 Usage

Generating outputs with HiDiffusion on top of 🤗 Diffusers is easy: you just need to add a single line of code.

### Text-to-image generation

#### Stable Diffusion XL
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import StableDiffusionXLPipeline, DDIMScheduler
import torch

pretrain_model = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = StableDiffusionXLPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "Standing tall amidst the ruins, a stone golem awakens, vines and flowers sprouting from the crevices in its body."
negative_prompt = "blurry, ugly, duplicate, poorly drawn face, deformed, mosaic, artifacts, bad limbs"
image = pipe(prompt, guidance_scale=7.5, height=2048, width=2048, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("golem.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sdxl.jpg" width="800" ></img>
</div>
</details>
Set height=4096 and width=4096 to get output at 4096x4096 resolution.
#### Stable Diffusion XL Turbo
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import AutoPipelineForText2Image
import torch

pretrain_model = "stabilityai/sdxl-turbo"
pipe = AutoPipelineForText2Image.from_pretrained(pretrain_model, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "In the depths of a mystical forest, a robotic owl with night vision lenses for eyes watches over the nocturnal creatures."
image = pipe(prompt, num_inference_steps=4, height=1024, width=1024, guidance_scale=0.0).images[0]
image.save("owl.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sdxl_turbo.jpg" width="800" ></img>
</div>
</details>
#### Stable Diffusion v2-1
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

pretrain_model = "stabilityai/stable-diffusion-2-1-base"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "An adorable happy brown border collie sitting on a bed, high detail."
negative_prompt = "ugly, tiling, out of frame, poorly drawn face, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, artifacts, bad proportions."
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("collie.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sd21.jpg" width="800" ></img>
</div>
</details>
Set height=2048 and width=2048 to get output at 2048x2048 resolution.
#### Stable Diffusion v1-5
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

pretrain_model = "runwayml/stable-diffusion-v1-5"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "thick strokes, bright colors, an exotic fox, cute, chibi kawaii. detailed fur, hyperdetailed, big reflective eyes, fairytale, artstation, centered composition, perfect composition, centered, vibrant colors, muted colors, high detailed, 8k."
negative_prompt = "ugly, tiling, poorly drawn face, out of frame, disfigured, deformed, blurry, bad anatomy, blurred."
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("fox.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sd15.jpg" width="800" ></img>
</div>
</details>
Set height=2048 and width=2048 to get output at 2048x2048 resolution.
### Remove HiDiffusion

If you want to remove HiDiffusion, simply call `remove_hidiffusion(pipe)`.
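Continuing the Stable Diffusion XL example above, a minimal sketch (the output file name is a placeholder):

```python
# Disable HiDiffusion and restore the pipeline's original behavior.
remove_hidiffusion(pipe)

# Subsequent calls run the unmodified pipeline again, e.g. at its native resolution.
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, negative_prompt=negative_prompt).images[0]
image.save("golem_vanilla.jpg")
```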
### ControlNet

#### Text-to-image generation
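HiDiffusion works with the standard Diffusers ControlNet pipelines in the same plug-and-play way. Below is a minimal sketch for SDXL text-to-image with a Canny edge condition; the diffusers/controlnet-canny-sdxl-1.0 checkpoint, the reference image path, and the prompts are illustrative assumptions:

```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, DDIMScheduler
from PIL import Image
import numpy as np
import torch
import cv2

# Build a Canny edge condition from a reference image ("reference.jpg" is a placeholder path).
source = np.array(Image.open("reference.jpg").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pretrain_model = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    pretrain_model, controlnet=controlnet, scheduler=scheduler,
    torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
image = pipe(
    prompt, negative_prompt=negative_prompt, image=control_image,
    guidance_scale=7.5, height=2048, width=2048,
    controlnet_conditioning_scale=0.5,
).images[0]
image.save("controlnet_t2i.jpg")
```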