Pfgmpp

Code for ICML 2023 paper, "PFGM++: Unlocking the Potential of Physics-Inspired Generative Models"


PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

PyTorch implementation of the paper PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

by Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi S. Jaakkola

[Slide]

😇 Improvements over PFGM / Diffusion Models:

No longer requires the large-batch training target of PFGM, which enables flexible conditional generation and more efficient training!
A more general $D \in \mathbb{R}^+$ dimensional augmented variable. PFGM++ subsumes PFGM and Diffusion Models: PFGM corresponds to $D=1$ and Diffusion Models correspond to $D\to \infty$.
A sweet spot $D^*$ exists in the middle of $(1,\infty)$!
Smaller $D$ is more robust than Diffusion Models ( $D\to \infty$ )
Enables adjusting the trade-off between model robustness and rigidity!
Enables direct transfer of well-tuned hyperparameters from any existing Diffusion Models ( $D\to \infty$ )

Abstract: We present a general framework termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for $N$ dimensional data by embedding paths in $N{+}D$ dimensional space while still controlling the progression with a simple scalar norm of the $D$ additional variables. The new models reduce to PFGM when $D{=}1$ and to diffusion models when $D{\to}\infty$. The flexibility of choosing $D$ allows us to trade off robustness against rigidity as increasing $D$ results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of $D$, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models ( $D{\to} \infty$ ) to any finite $D$ values. Our experiments show that models with finite $D$ can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ $64{\times}64$ datasets, with FID scores of $1.91/2.43$ when $D{=}2048/128$. In class-conditional generation, $D{=}2048$ yields current state-of-the-art FID of $1.74$ on CIFAR-10. In addition, we demonstrate that models with smaller $D$ exhibit improved robustness against modeling errors.

schematic

Outline

Our implementation is built upon the EDM repo. We first provide guidance on how to quickly transfer hyperparameters from well-tuned diffusion models ( $D\to \infty$ ), such as EDM and DDPM, to the PFGM++ family ( $D\in \mathbb{R}^+$ ) in a task/dataset-agnostic way (more details are in Sec 4 ( Transfer hyperparameters to finite $D$ ) and Appendix C.2 of our paper). We then highlight our modifications to the original EDM command lines for training, sampling and evaluation. Checkpoints are provided in the checkpoints section.

We also include the original setup instructions from the EDM repo, such as environment requirements and dataset preparation.

Transfer guidance by $r=\sigma\sqrt{D}$ formula

Below we provide guidance on how to quickly transfer the well-tuned hyperparameters of diffusion models ( $D\to \infty$ ), such as $\sigma_{\textrm{max}}$ and $p(\sigma)$, to finite $D$. We adopt the $r=\sigma\sqrt{D}$ formula from our paper for the alignment (cf. Section 4). Please use the following guidance as a prototype.

😀 Please adjust the augmented dimension $D$ according to your task/dataset/model.
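As a minimal sketch of the $r=\sigma\sqrt{D}$ alignment itself, the snippet below converts a diffusion noise level into the corresponding PFGM++ radius and back. The helper names `sigma_to_r` and `r_to_sigma` and the value `sigma_max = 80.0` are assumptions made for illustration, not part of this repo.

```python
import math

def sigma_to_r(sigma, D):
    """Map a diffusion noise level sigma to the PFGM++ radius r = sigma * sqrt(D)."""
    return sigma * math.sqrt(D)

def r_to_sigma(r, D):
    """Inverse mapping: recover sigma from a radius r for a given D."""
    return r / math.sqrt(D)

sigma_max = 80.0  # illustrative value only; use the sigma_max of your own diffusion model
for D in (128, 2048):
    r_max = sigma_to_r(sigma_max, D)
    # The round trip returns the original sigma_max (up to floating-point error).
    print(D, r_max, r_to_sigma(r_max, D))
```

The same mapping applies to every noise level, so a schedule or distribution over $\sigma$ can be reused for any finite $D$ by converting each $\sigma$ to its $r$.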

Training hyperparameter transfer. The example we provide is a simplified version of loss.py in this repo.

schematic

import numpy as np
import torch

def train(net, y, N, D, pfgmpp, P_mean, P_std):
  r'''
  net: denoiser network, called as net(noisy_images, sigma)
  y: mini-batch clean images
  N: data dimension
  D: augmented dimension
  pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case). options: 0 | 1
  P_mean, P_std: mean and std of log(sigma) under p(\sigma)
  '''

  if not pfgmpp:
    ###################### === Diffusion Model === ######################
    rnd_normal = torch.randn([y.shape[0], 1, 1, 1], device=y.device)
    sigma = (rnd_normal * P_std + P_mean).exp() # sample sigma from p(\sigma)
    n = torch.randn_like(y) * sigma
    D_yn = net(y + n, sigma)
    loss = (D_yn - y) ** 2
    ###################### === Diffusion Model === ######################
  else:
    ###################### === PFGM++ === ######################
    rnd_normal = torch.randn(y.shape[0], device=y.device)
    sigma = (rnd_normal * P_std + P_mean).exp() # sample sigma from p(\sigma)
    r = sigma.double() * np.sqrt(D) # r=sigma\sqrt{D} formula

    # = sample noise from perturbation kernel p_r = #
    # Sampling from inverse-beta distribution
    samples_norm = np.random.beta(a=N / 2., b=D / 2.,
                                  size=y.shape[0]).astype(np.double)
    inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
    inverse_beta = torch.from_numpy(inverse_beta).to(y.device).double()
    # Sampling from p_r(R) by change-of-variable (c.f. Appendix B)
    samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(len(samples_norm), -1)
    # Uniformly sample the angle component
    gaussian = torch.randn(y.shape[0], N).to(samples_norm.device)
    unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
    # Construct the perturbation (one norm per sample, broadcast over N dimensions)
    perturbation_x = (unit_gaussian * samples_norm).float()
    # = sample noise from perturbation kernel p_r = #

    sigma = sigma.reshape((len(sigma), 1, 1, 1))
    n = perturbation_x.view_as(y)
    D_yn = net(y + n, sigma)
    loss = (D_yn - y) ** 2
    ###################### === PFGM++ === ######################
  return loss
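The norm sampling in the PFGM++ branch above can be checked in isolation. The sketch below is a standard-library re-implementation of that step (so it runs without PyTorch or NumPy); the function names, sample count, and seed are assumptions for illustration. It estimates the mean squared perturbation norm per data dimension, which should approach $\sigma^2$ as $D$ grows, in line with the Gaussian perturbation of diffusion models in the $D\to\infty$ limit.

```python
import math
import random

def sample_perturbation_norm(sigma, N, D, rng):
    """Draw one perturbation norm R via the inverse-beta change of variable."""
    r = sigma * math.sqrt(D)                      # r = sigma * sqrt(D) alignment
    beta = rng.betavariate(N / 2.0, D / 2.0)      # Beta(N/2, D/2) sample
    inverse_beta = beta / (1.0 - beta + 1e-8)     # inverse-beta sample
    return r * math.sqrt(inverse_beta)

def mean_squared_norm(sigma, N, D, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[R^2] for the given sigma, N and D."""
    rng = random.Random(seed)
    total = sum(sample_perturbation_norm(sigma, N, D, rng) ** 2
                for _ in range(num_samples))
    return total / num_samples

N, sigma = 3072, 1.0  # N = 3*32*32, the size of a 32x32 RGB image; sigma is illustrative
for D in (128, 2048, 2 ** 20):
    # Per-dimension mean squared norm; it tends to sigma**2 as D increases.
    print(D, mean_squared_norm(sigma, N, D) / N)
```

For smaller $D$ the estimate sits slightly above $\sigma^2$ and the norms spread more widely, which is the heavier-tailed behaviour the $D$ parameter controls.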

Sampling hyperparameter transfer. The example we provide is a simplified version of generate.py in this repo. As shown in the figure below, the only modification is the prior sampling process. Hence we only include the comparison of prior sampling for diffusion models / PFGM++ in the code snippet.

schematic

import numpy as np
import torch

def generate(sigma_max, N, D, pfgmpp, data_size, device='cpu'):
  r'''
  sigma_max: starting condition for diffusion models
  N: data dimension
  D: augmented dimension
  pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case). options: 0 | 1
  data_size: shape of the batch to sample, e.g. (batch, channels, height, width)
  device: torch device on which the prior sample is created
  '''
  batch_size = data_size[0]
  if not pfgmpp:
    ###################### === Diffusion Model === ######################
    x = torch.randn(data_size, device=device) * sigma_max
    ###################### === Diffusion Model === ######################
  else:
    ###################### === PFGM++ === ######################
    # Sampling from inverse-beta distribution
    r = sigma_max * np.sqrt(D) # r=sigma\sqrt{D} formula
    samples_norm = np.random.beta(a=N / 2., b=D / 2.,
                                  size=batch_size).astype(np.double)
    inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
    inverse_beta = torch.from_numpy(inverse_beta).to(device).double()
    # Sampling from p_r(R) by change-of-variable (c.f. Appendix B)
    samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(batch_size, -1)
    # Uniformly sample the angle component
    gaussian = torch.randn(batch_size, N).to(samples_norm.device)
    unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
    # Construct the prior sample (one norm per sample, broadcast over N dimensions)
    x = (unit_gaussian * samples_norm).float().view(data_size)
    ###################### === PFGM++ === #######################

  ########################################################
  # Heun's 2nd order method (aka improved Euler method)  #
  ########################################################
  # The solver is omitted here; it starts from the prior sample x.
  return x

Please refer to Appendix C.2 for detailed hyperparameter transfer procedures from EDM and DDPM.
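For readers unfamiliar with Heun's 2nd order method mentioned in the sampling snippet, here is a minimal generic sketch on a toy ODE. It is not the sampler from generate.py: the function names, the toy ODE $dx/dt=-x$, and the uniform time grid are all assumptions chosen so the result can be compared against a known exact solution.

```python
import math

def heun_step(f, x, t, t_next):
    """One step of Heun's 2nd order method for dx/dt = f(x, t)."""
    h = t_next - t
    d = f(x, t)                          # slope at the start of the step
    x_euler = x + h * d                  # Euler predictor
    d_next = f(x_euler, t_next)          # slope at the predicted end point
    return x + h * 0.5 * (d + d_next)    # average the two slopes (corrector)

def integrate(f, x0, ts):
    """Apply Heun steps along the time grid ts, starting from x0."""
    x = x0
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = heun_step(f, x, t, t_next)
    return x

# Toy ODE dx/dt = -x, whose exact solution is x(t) = x0 * exp(-t).
ts = [i / 100 for i in range(101)]
x_final = integrate(lambda x, t: -x, 1.0, ts)
print(x_final, math.exp(-1.0))  # numerical vs. exact value at t = 1
```

Each step costs two evaluations of `f`, which in a generative sampler would mean two network evaluations per step in exchange for second-order accuracy.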

Training PFGM++

You can train new models using train.py. For example:

torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --name exp_name \
--data=datasets/cifar10-32x32.zip --cond=0 --arch=arch \
--pfgmpp=1 --batch 512 \
--aug_dim aug_dim (--resume resume_path)

exp_name: name of experiments
aug_dim: D (additional dimensions)  
arch: model architectures. options: ncsnpp | ddpmpp
pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case). options: 0 | 1
resume_path: path to the resuming checkpoint

The above example uses the default batch size of 512 images (controlled by --batch) that is divided evenly among 8 GPUs (controlled by --nproc_per_node) to yield 64 images per GPU. Training large models may run out of GPU memory; the best way to avoid this is to limit the per-GPU batch size, e.g
