# HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by adding only a single line of code!
<div align="center">💡 HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models</div>
<div align="center"><a href="https://scholar.google.com/citations?hl=zh-CN&user=QFowS4cAAAAJ">Shen Zhang</a>, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, <a href="https://jiajunvision.github.io/">Jiajun Liang</a></div> <br> <div align="center"> <a href="https://hidiffusion.github.io/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>   <a href="https://link.springer.com/chapter/10.1007/978-3-031-72983-6_9"><img src="https://img.shields.io/static/v1?label=Paper&message=ECCV&color=yellow"></a>   <a href="https://arxiv.org/abs/2311.17528"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv&color=red&logo=arxiv"></a>   <a href="https://colab.research.google.com/drive/1EiBn9lSnPZTU4cikRRaBBexs429M-qty?usp=sharing"><img src="https://img.shields.io/static/v1?label=Demo&message=Colab&color=purple&logo=googlecolab"></a>   <a href="https://openbayes.com/console/public/tutorials/SaPYcYCaWSA"><img src="https://img.shields.io/static/v1?label=Demo&message=OpenBayes&color=green"></a>   </div> <div align="center"> <img src="assets/image_gallery.jpg" width="800" ></img> <br> <em> (Select HiDiffusion samples for various diffusion models, resolutions, and aspect ratios.) </em> </div> <br>👉 Why HiDiffusion
- A training-free method that increases the resolution and speed of pretrained diffusion models.
- Designed as a plug-and-play implementation. It can be integrated into diffusion pipelines by only adding a single line of code!
- Supports various tasks, including text-to-image, image-to-image, and inpainting.
## 🔥 Update

- 2024.8.15 - 💥 HiDiffusion has been added to the Diffusers documentation, see here. Thanks to the Diffusers team!
- 2024.7.3 - 💥 Accepted by ECCV 2024!
- 2024.6.19 - 💥 Integrated into OpenBayes, see the demo. Thanks to the OpenBayes team!
- 2024.6.16 - 💥 Support PyTorch 2.X.
- 2024.6.16 - 💥 Fixed the non-square generation issue. HiDiffusion now supports more image sizes and aspect ratios.
- 2024.5.7 - 💥 Support the image-to-image task, see here.
- 2024.4.16 - 💥 Released the source code.
## 📢 Supported Models

- ✅ Stable Diffusion XL
- ✅ Stable Diffusion XL Turbo
- ✅ Stable Diffusion v2
- ✅ Stable Diffusion v1

Note: HiDiffusion also supports downstream diffusion models based on these repositories, such as Ghibli-Diffusion, Playground, etc.
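For such downstream checkpoints, the integration is identical; only the model ID changes. A minimal sketch, using the community nitrosocke/Ghibli-Diffusion checkpoint (a Stable Diffusion fine-tune) as an illustrative example:

```python
from hidiffusion import apply_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

# Illustrative downstream checkpoint; any Stable Diffusion fine-tune follows the same pattern.
pretrain_model = "nitrosocke/Ghibli-Diffusion"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Apply HiDiffusion exactly as for the base models below.
apply_hidiffusion(pipe)

image = pipe("ghibli style magical princess with golden hair", height=1024, width=1024).images[0]
image.save("ghibli_princess.jpg")
```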
## 💣 Supported Tasks
- ✅ Text-to-image
- ✅ ControlNet, including text-to-image, image-to-image
- ✅ Inpainting
## 🔎 Main Requirements

This repository is tested with:

- Python==3.8
- torch>=1.13.1
- diffusers>=0.25.0
- transformers
- accelerate
- xformers
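For example, the dependencies can be installed with pip (package names and version bounds taken from the list above):

```shell
pip3 install "torch>=1.13.1" "diffusers>=0.25.0" transformers accelerate xformers
```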
## 🔑 Install HiDiffusion

After installing the packages in the main requirements, install HiDiffusion:

```shell
pip3 install hidiffusion
```

### Installing from source

Alternatively, you can install from the GitHub source. Clone the repository and install:

```shell
git clone https://github.com/megvii-model/HiDiffusion.git
cd HiDiffusion
python3 setup.py install
```
## 🚀 Usage

Generating outputs with HiDiffusion on top of 🤗 Diffusers is easy: you just need to add a single line of code.

### Text-to-image generation

#### Stable Diffusion XL
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import StableDiffusionXLPipeline, DDIMScheduler
import torch

pretrain_model = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = StableDiffusionXLPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "Standing tall amidst the ruins, a stone golem awakens, vines and flowers sprouting from the crevices in its body."
negative_prompt = "blurry, ugly, duplicate, poorly drawn face, deformed, mosaic, artifacts, bad limbs"
image = pipe(prompt, guidance_scale=7.5, height=2048, width=2048, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("golem.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sdxl.jpg" width="800" ></img>
</div>
</details>
Set height=4096 and width=4096 to get output at 4096x4096 resolution.
#### Stable Diffusion XL Turbo
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import AutoPipelineForText2Image
import torch

pretrain_model = "stabilityai/sdxl-turbo"
pipe = AutoPipelineForText2Image.from_pretrained(pretrain_model, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "In the depths of a mystical forest, a robotic owl with night vision lenses for eyes watches over the nocturnal creatures."
image = pipe(prompt, num_inference_steps=4, height=1024, width=1024, guidance_scale=0.0).images[0]
image.save("owl.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sdxl_turbo.jpg" width="800" ></img>
</div>
</details>
#### Stable Diffusion v2-1
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

pretrain_model = "stabilityai/stable-diffusion-2-1-base"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "An adorable happy brown border collie sitting on a bed, high detail."
negative_prompt = "ugly, tiling, out of frame, poorly drawn face, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, artifacts, bad proportions."
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("collie.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sd21.jpg" width="800" ></img>
</div>
</details>
Set height=2048 and width=2048 to get output at 2048x2048 resolution.
#### Stable Diffusion v1-5
```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import DiffusionPipeline, DDIMScheduler
import torch

pretrain_model = "runwayml/stable-diffusion-v1-5"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(pretrain_model, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# Optional. enable_xformers_memory_efficient_attention can reduce memory usage and increase inference speed;
# enable_model_cpu_offload and enable_vae_tiling can reduce memory usage.
# pipe.enable_xformers_memory_efficient_attention()
# pipe.enable_model_cpu_offload()
# pipe.enable_vae_tiling()

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "thick strokes, bright colors, an exotic fox, cute, chibi kawaii. detailed fur, hyperdetailed, big reflective eyes, fairytale, artstation, centered composition, perfect composition, centered, vibrant colors, muted colors, high detailed, 8k."
negative_prompt = "ugly, tiling, poorly drawn face, out of frame, disfigured, deformed, blurry, bad anatomy, blurred."
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, eta=1.0, negative_prompt=negative_prompt).images[0]
image.save("fox.jpg")
```
<details>
<summary>Output:</summary>
<div align="center">
<img src="assets/sd15.jpg" width="800" ></img>
</div>
</details>
Set height=2048 and width=2048 to get output at 2048x2048 resolution.
### Remove HiDiffusion

If you want to remove HiDiffusion, simply call `remove_hidiffusion(pipe)`.
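Continuing the Stable Diffusion XL example above, a minimal sketch (the output file name is a placeholder):

```python
# Disable HiDiffusion and restore the pipeline's original behavior.
remove_hidiffusion(pipe)

# Subsequent calls run the unmodified pipeline again, e.g. at its native resolution.
image = pipe(prompt, guidance_scale=7.5, height=1024, width=1024, negative_prompt=negative_prompt).images[0]
image.save("golem_vanilla.jpg")
```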
### ControlNet

#### Text-to-image generation
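HiDiffusion works with the standard Diffusers ControlNet pipelines in the same plug-and-play way. Below is a minimal sketch for SDXL text-to-image with a Canny edge condition; the diffusers/controlnet-canny-sdxl-1.0 checkpoint, the reference image path, and the prompts are illustrative assumptions:

```python
from hidiffusion import apply_hidiffusion, remove_hidiffusion
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, DDIMScheduler
from PIL import Image
import numpy as np
import torch
import cv2

# Build a Canny edge condition from a reference image ("reference.jpg" is a placeholder path).
source = np.array(Image.open("reference.jpg").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pretrain_model = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(pretrain_model, subfolder="scheduler")
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    pretrain_model, controlnet=controlnet, scheduler=scheduler,
    torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Apply HiDiffusion with a single line of code.
apply_hidiffusion(pipe)

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
image = pipe(
    prompt, negative_prompt=negative_prompt, image=control_image,
    guidance_scale=7.5, height=2048, width=2048,
    controlnet_conditioning_scale=0.5,
).images[0]
image.save("controlnet_t2i.jpg")
```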