DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations (CVPR 2024)
Official implementation.
<div align="center"><a href='https://arxiv.org/abs/2403.06951'><img src='https://img.shields.io/badge/arXiv-2403.06951-b31b1b.svg'></a> <a href='https://tianhao-qi.github.io/DEADiff/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
Tianhao Qi*, Shancheng Fang, Yanze Wu✝, Hongtao Xie✉, Jiawei Liu, <br>Lang Chen, Qian He, Yongdong Zhang <br><br> (*Works done during the internship at ByteDance, ✝Project Lead, ✉Corresponding author)
From University of Science and Technology of China and ByteDance.
</div>

🔆 Introduction
TL;DR: We propose DEADiff, a generic method facilitating the synthesis of novel images that embody the style of a given reference image and adhere to text prompts. <br>
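The core idea of the paper is to disentangle style from semantic content, so the reference image drives appearance while the text prompt drives what is depicted. In Stable-Diffusion-style models, both conditions reach the UNet through cross-attention, where conditioning tokens act as keys and values. Below is a minimal NumPy sketch of cross-attention over concatenated text and style tokens; all shapes and names are illustrative, not DEADiff's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # queries: (Nq, d) spatial latent tokens; keys/values: (Nk, d) conditioning tokens
    d = queries.shape[-1]
    attn = softmax(queries @ keys.T / np.sqrt(d))
    return attn @ values

rng = np.random.default_rng(0)
d = 8
latent = rng.normal(size=(4, d))        # toy UNet spatial tokens
text_tokens = rng.normal(size=(3, d))   # prompt embedding (drives content)
style_tokens = rng.normal(size=(2, d))  # reference-image style features (drive appearance)

# Conditioning = text tokens plus style tokens extracted from the reference image
cond = np.concatenate([text_tokens, style_tokens], axis=0)
out = cross_attention(latent, cond, cond)
print(out.shape)  # (4, 8)
```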
⭐⭐ Stylized Text-to-Image Generation.
<div align="center"> <img src=docs/showcase_img.png> <p>Stylized text-to-image results. Resolution: 512 x 512. (Compressed)</p> </div>

⭐⭐ Style Transfer.
<div align="center"> <img src=docs/showcase_controlnet.png> <p>Style transfer results with <a href="https://github.com/lllyasviel/ControlNet.git" target="_blank">ControlNet</a>.</p> </div>

📝 Changelog
- [2024.4.3]: 🔥🔥 Release the inference code and pretrained checkpoint.
- [2024.3.5]: 🔥🔥 Release the project page.
⏳ TODO
- [x] Release the inference code.
- [ ] Release training data.
⚙️ Setup
```bash
conda create -n deadiff python=3.9.2
conda activate deadiff
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install git+https://github.com/salesforce/LAVIS.git@20230801-blip-diffusion-edit
pip install -r requirements.txt
pip install -e .
```
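After installing, it can be useful to confirm that the key dependencies actually import before launching the app. This is a hypothetical convenience check, not part of the repository; the package list mirrors the setup commands above.

```python
from importlib.util import find_spec

# Packages pinned by the setup instructions above (illustrative subset)
REQUIRED = ["torch", "torchvision", "torchaudio", "lavis"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if find_spec(n) is None]

print(missing_packages(REQUIRED))  # empty list means the environment looks complete
```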
💫 Inference
- Download the pretrained model from Hugging Face and put it under `./pretrained/`.
- Launch the Gradio demo from a terminal:

```bash
python3 scripts/app.py
```
The Gradio app lets you transfer the style of a reference image onto text-prompted generations; try it out to explore the available options.
Prompt: "A curly-haired boy"
Prompt: "A robot"
Prompt: "A motorcycle"
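During sampling, Stable-Diffusion-based models of this kind typically combine conditional and unconditional noise predictions with classifier-free guidance. The NumPy sketch below shows that standard update rule; the guidance scale and arrays are illustrative, not DEADiff's actual defaults.

```python
import numpy as np

def cfg(eps_uncond, eps_cond, scale):
    # Classifier-free guidance: push the prediction toward the conditioned direction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_u = np.zeros(4)  # toy unconditional noise prediction
eps_c = np.ones(4)   # toy conditional noise prediction
print(cfg(eps_u, eps_c, 7.5))  # [7.5 7.5 7.5 7.5]
```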
➕ Style Transfer with ControlNet
We support style transfer with structural control by combining DEADiff with ControlNet. This enables users to guide the spatial layout (e.g., edges or depth maps) of the generated images, while transferring the visual style from a reference image.
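ControlNet's design injects the structural condition through a parallel branch whose outputs are added to the UNet features via zero-initialized projections, so at initialization the control branch leaves the base model unchanged. A toy NumPy illustration of that property (shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def control_block(x, cond, w_zero):
    # Base features plus a control signal passed through a "zero conv" (1x1 projection).
    return x + cond @ w_zero

x = rng.normal(size=(5, 8))     # toy UNet features
cond = rng.normal(size=(5, 8))  # toy encoded edge/depth condition
w_zero = np.zeros((8, 8))       # zero-initialized projection

out = control_block(x, cond, w_zero)
print(np.allclose(out, x))  # True: the control branch is a no-op at init
```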
To perform style transfer with ControlNet, please download the following pretrained models:
- `control_sd15_canny.pth`: Download → place it under `./pretrained/`
- `control_sd15_depth.pth`: Download → place it under `./pretrained/`
- `dpt_hybrid-midas-501f0c75.pt` (for depth estimation): Download → place it under `ldm/controlnet/annotator/ckpts/`

These checkpoints are required for the Canny- and depth-based ControlNet stylization modes. Then run the following commands in a terminal.
```bash
# Canny-based control
python3 scripts/app_canny_control.py

# Depth-based control
python3 scripts/app_depth_control.py
```
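The Canny mode conditions generation on an edge map extracted from the input image. As a library-free stand-in for that preprocessing step, here is a NumPy Sobel gradient-magnitude sketch; the actual app uses a proper Canny detector, so this is only illustrative.

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map of a 2-D grayscale array (rough Canny stand-in)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

# A vertical step edge should light up the edge map along the boundary.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
print(edges.shape)  # (6, 6)
```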
📢 Disclaimer
This repository is developed for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.
✈️ Citation
```bibtex
@article{qi2024deadiff,
  title={DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations},
  author={Qi, Tianhao and Fang, Shancheng and Wu, Yanze and Xie, Hongtao and Liu, Jiawei and Chen, Lang and He, Qian and Zhang, Yongdong},
  journal={arXiv preprint arXiv:2403.06951},
  year={2024}
}
```
📭 Contact
If you have any comments or questions, feel free to contact qth@mail.ustc.edu.cn.