# BELM: High-quality Exact Inversion Sampler for Diffusion Models 🏆
<div align="center">

This repository is the official implementation of the NeurIPS 2024 paper: "BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models"

Keywords: Diffusion Model, Exact Inversion, ODE Solver

</div>

<p align="center"> <img src="assets/recon.jpg" alt="Recon Results" style="width:50%;"/> </p>

Fangyikang Wang<sup>1</sup>, Hubery Yin<sup>2</sup>, Yuejiang Dong<sup>3</sup>, Huminhao Zhu<sup>1</sup>, <br> Chao Zhang<sup>1</sup>, Hanbin Zhao<sup>1</sup>, Hui Qian<sup>1</sup>, Chen Li<sup>2</sup>

<sup>1</sup>Zhejiang University <sup>2</sup>WeChat, Tencent Inc. <sup>3</sup>Tsinghua University


## 🆕 What's New?

🔥 We use the idea of bidirectional explicitness to enable exact inversion.

Schematic description of DDIM (left) and BELM (right). DDIM uses $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ to calculate $\mathbf{x}_{i-1}$ based on a linear relation between $\mathbf{x}_i$, $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ (represented by the <span style="color:blue">blue line</span>). However, DDIM inversion uses $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ to calculate $\mathbf{x}_{i}$ based on a different linear relation, represented by the <span style="color:red">red line</span>. This mismatch makes DDIM inversion inexact. In contrast, BELM establishes a single linear relation between $\mathbf{x}_{i-1}$, $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i}, i)$ (represented by the <span style="color:green">green line</span>). BELM and its inversion are both derived from this unitary relation, which enables exact inversion. Specifically, BELM uses the linear combination of $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i-1}$, and the BELM inversion uses the linear combination of $\mathbf{x}_{i-1}$, $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i+1}$. The bidirectional explicit constraint means this linear relation does not include the derivatives at the bidirectional endpoints, that is, $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i+1},i+1)$.
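The mismatch can be reproduced in a few lines. The sketch below is a hypothetical 1-D toy (schedule values and the nonlinear stand-in for $\boldsymbol{\varepsilon}_\theta$ are made up, not the paper's setup): one DDIM step followed by the standard DDIM inversion step fails to recover the starting point, because the two directions use $\boldsymbol{\varepsilon}_\theta$ at different points.

```python
import math

# Toy schedule at timesteps (i-1, i): alpha = signal scale, sigma = noise scale.
# Values are illustrative only.
alphas = [0.6, 0.8]   # alpha_{i-1}, alpha_i
sigmas = [0.8, 0.6]   # sigma_{i-1}, sigma_i

def eps(x, i):
    # Deliberately nonlinear stand-in for the noise-prediction network eps_theta.
    return math.tanh(x) + 0.1 * i

def ddim_step(x_i):
    # x_{i-1} = (a_{i-1}/a_i) x_i + (s_{i-1} - (a_{i-1}/a_i) s_i) eps(x_i, i)
    r = alphas[0] / alphas[1]
    return r * x_i + (sigmas[0] - r * sigmas[1]) * eps(x_i, 1)

def ddim_inversion(x_im1):
    # Evaluates eps at x_{i-1}: a *different* linear relation than ddim_step.
    r = alphas[1] / alphas[0]
    return r * x_im1 + (sigmas[1] - r * sigmas[0]) * eps(x_im1, 0)

x_i = 1.0
x_im1 = ddim_step(x_i)
x_i_rec = ddim_inversion(x_im1)
recon_error = abs(x_i_rec - x_i)  # visibly nonzero for a nonlinear eps
```

Because `eps` is nonlinear, `eps(x_im1, 0) != eps(x_i, 1)` and the round trip drifts; with a linear `eps` the two relations would coincide and the error would vanish.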
🔥 We introduce a generic formulation of exact inversion samplers, BELM.

The general k-step BELM:

$$
\bar{\mathbf{x}}_{i-1} = \sum_{j=1}^{k} a_{i,j}\cdot \bar{\mathbf{x}}_{i-1+j} +\sum_{j=1}^{k-1}b_{i,j}\cdot h_{i-1+j}\cdot\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_{i-1+j},\bar{\sigma}_{i-1+j}).
$$
The 2-step BELM:

$$
\bar{\mathbf{x}}_{i-1} = a_{i,2}\bar{\mathbf{x}}_{i+1} +a_{i,1}\bar{\mathbf{x}}_{i} + b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}_i,\bar{\sigma}_i).
$$
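Because the 2-step BELM sampler and its inversion are two rearrangements of the same linear relation, the round trip is exact up to floating-point rounding. A minimal 1-D sketch (coefficients and step size are arbitrary placeholders, not the optimal values derived below):

```python
import math

# Hypothetical coefficients and step size for one 2-step BELM relation.
a1, a2, b1, h_i = 0.4, 0.6, -1.2, 0.05

def eps(x):
    # Nonlinear stand-in for eps_theta(x_i, sigma_i); evaluated only at x_i
    # in BOTH directions, which is what makes the inversion exact.
    return math.tanh(x)

def belm_step(x_ip1, x_i):
    # x_{i-1} = a_{i,2} x_{i+1} + a_{i,1} x_i + b_{i,1} h_i eps(x_i)
    return a2 * x_ip1 + a1 * x_i + b1 * h_i * eps(x_i)

def belm_inversion(x_im1, x_i):
    # Solving the SAME relation for x_{i+1}.
    return (x_im1 - a1 * x_i - b1 * h_i * eps(x_i)) / a2

x_i, x_ip1 = 0.9, 1.1
x_im1 = belm_step(x_ip1, x_i)
roundtrip_error = abs(belm_inversion(x_im1, x_i) - x_ip1)
```

Unlike the DDIM case, the nonlinearity of `eps` is irrelevant here: both directions plug in the same value `eps(x_i)`, so the inversion is pure algebra.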
🔥 We derive the optimal coefficients for BELM via LTE (local truncation error) minimization.

**Proposition.** The LTE $\tau_i$ of the BELM diffusion sampler, given by $\tau_i = \bar{\mathbf{x}}(t_{i-1}) - a_{i,2}\bar{\mathbf{x}}(t_{i+1}) -a_{i,1}\bar{\mathbf{x}}(t_{i}) - b_{i,1} h_i\bar{\boldsymbol{\varepsilon}}_\theta(\bar{\mathbf{x}}(t_i),\bar{\sigma}_i)$, can be accurate up to $\mathcal{O}\left({(h_{i}+h_{i+1})}^3\right)$ when the coefficients are chosen as $a_{i,1} = \frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}$, $a_{i,2}=\frac{h_i^2}{h_{i+1}^2}$, $b_{i,1}=- \frac{h_i+h_{i+1}}{h_{i+1}}$,

where $h_i = \frac{\sigma_i}{\alpha_i}-\frac{\sigma_{i-1}}{\alpha_{i-1}}$.
The Optimal-BELM (O-BELM) sampler:

$$
\mathbf{x}_{i-1} = \frac{h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i+1}}\mathbf{x}_{i+1} +\frac{h_{i+1}^2 - h_i^2}{h_{i+1}^2}\frac{\alpha_{i-1}}{\alpha_{i}}\mathbf{x}_{i} - \frac{h_i(h_i+h_{i+1})}{h_{i+1}}\alpha_{i-1}\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
$$
The inversion of the O-BELM sampler reads:

$$
\mathbf{x}_{i+1}= \frac{h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i-1}}\mathbf{x}_{i-1} + \frac{h_i^2-h_{i+1}^2}{h_i^2}\frac{\alpha_{i+1}}{\alpha_{i}}\mathbf{x}_{i}+\frac{h_{i+1}(h_i+h_{i+1})}{h_i}\alpha_{i+1} \boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i).
$$
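As a sanity check, the two O-BELM formulas can be transcribed directly. The sketch below uses a toy three-point VP schedule and a nonlinear stand-in for $\boldsymbol{\varepsilon}_\theta$ (all values illustrative, not from the repo); the inversion recovers $\mathbf{x}_{i+1}$ to near machine precision because it rearranges the same linear relation.

```python
import math

# Toy VP schedule at timesteps (i-1, i, i+1); values are illustrative only.
alpha = {0: 0.9, 1: 0.8, 2: 0.7}                              # signal scales
sigma = {k: math.sqrt(1 - a * a) for k, a in alpha.items()}   # noise scales

def h(i):
    # h_i = sigma_i / alpha_i - sigma_{i-1} / alpha_{i-1}
    return sigma[i] / alpha[i] - sigma[i - 1] / alpha[i - 1]

def eps(x, i):
    # Nonlinear stand-in for eps_theta; evaluated only at (x_i, i).
    return math.tanh(x) + 0.05 * i

def obelm_step(x_ip1, x_i, i=1):
    hi, hip1 = h(i), h(i + 1)
    return (hi**2 / hip1**2) * (alpha[i - 1] / alpha[i + 1]) * x_ip1 \
         + ((hip1**2 - hi**2) / hip1**2) * (alpha[i - 1] / alpha[i]) * x_i \
         - (hi * (hi + hip1) / hip1) * alpha[i - 1] * eps(x_i, i)

def obelm_inversion(x_im1, x_i, i=1):
    hi, hip1 = h(i), h(i + 1)
    return (hip1**2 / hi**2) * (alpha[i + 1] / alpha[i - 1]) * x_im1 \
         + ((hi**2 - hip1**2) / hi**2) * (alpha[i + 1] / alpha[i]) * x_i \
         + (hip1 * (hi + hip1) / hi) * alpha[i + 1] * eps(x_i, i)

x_i, x_ip1 = 0.5, 0.45
x_im1 = obelm_step(x_ip1, x_i)
roundtrip_error = abs(obelm_inversion(x_im1, x_i) - x_ip1)
```

In the actual samplers these updates run over latent tensors along the full timestep schedule; this scalar version only checks that the inversion formula is the exact algebraic inverse of the sampling formula.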
## 👨🏻💻 Run the code

1) Getting started
- Python 3.8.12
- CUDA 11.7
- NVIDIA A100 40GB PCIe
- Torch 2.0.0
- Torchvision 0.14.0
Please follow the [diffusers](https://github.com/huggingface/diffusers) installation instructions to install diffusers.
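One possible setup is sketched below; the package list and versions are assumptions, so consult the official diffusers installation guide for the combination matching your CUDA driver.

```shell
# Illustrative environment setup (versions not pinned; adjust to your CUDA stack).
pip install torch torchvision
pip install diffusers transformers accelerate
```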
2) Run

First, switch to the repository root directory.
CIFAR-10 sampling:

```shell
python3 ./scripts/cifar10.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```

CelebA-HQ sampling:

```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```

FID evaluation:

```shell
python3 ./scripts/celeba.py --test_num 10 --batch_size 32 --num_inference_steps 100 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxx/ddpm_ema_cifar10
```

Interpolation:

```shell
python3 ./scripts/interpolate.py --test_num 10 --batch_size 1 --num_inference_steps 100 --save_dir YOUR/SAVE/DIR --model_id xx
```

Reconstruction error calculation:

```shell
python3 ./scripts/reconstruction.py --test_num 10 --num_inference_steps 100 --directory WHERE/YOUR/IMAGES/ARE --sampler_type belm
```

Image editing:

```shell
python3 ./scripts/image_editing.py --num_inference_steps 200 --freeze_step 50 --guidance 2.0 --sampler_type belm --save_dir YOUR/SAVE/DIR --model_id xxxxx/stable-diffusion-v1-5 --ori_im_path images/imagenet_dog_1.jpg --ori_prompt 'A dog' --res_prompt 'A Dalmatian'
```
## 🪪 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 📝 Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
```bibtex
@inproceedings{
  wang2024belm,
  title={{BELM}: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models},
  author={Fangyikang Wang and Hubery Yin and Yue-Jiang Dong and Huminhao Zhu and Chao Zhang and Hanbin Zhao and Hui Qian and Chen Li},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=ccQ4fmwLDb}
}
```
## 📩 Contact me
My e-mail address:
wangfangyikang@zju.edu.cn