SkillAgentSearch skills...

Edm

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)

Install / Use

/learn @NVlabs/Edm
About this skill

Quality Score

0/100

Category

Design

Supported Platforms

Universal

README

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)<br><sub>Official PyTorch implementation of the NeurIPS 2022 paper</sub>

Teaser image

Elucidating the Design Space of Diffusion-Based Generative Models<br> Tero Karras, Miika Aittala, Timo Aila, Samuli Laine <br>https://arxiv.org/abs/2206.00364<br>

Abstract: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1+ high-end NVIDIA GPU for sampling and 8+ GPUs for training. We have done all testing and development using V100 and A100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.12.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • Python libraries: See environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    • conda env create -f environment.yml -n edm
    • conda activate edm
  • Docker users:

Getting started

To reproduce the main results from our paper, simply run:

python example.py

This is a minimal standalone script that loads the best pre-trained model for each dataset and generates a random 8x8 grid of images using the optimal sampler settings. Expected results:

| Dataset | Runtime | Reference image | :------- | :------ | :-------------- | CIFAR-10 | ~6 sec | cifar10-32x32.png | FFHQ | ~28 sec | ffhq-64x64.png | AFHQv2 | ~28 sec | afhqv2-64x64.png | ImageNet | ~5 min | imagenet-64x64.png

The easiest way to explore different sampling strategies is to modify example.py directly. You can also incorporate the pre-trained models and/or our proposed EDM sampler in your own code by simply copy-pasting the relevant bits. Note that the class definitions for the pre-trained models are stored within the pickles themselves and loaded automatically during unpickling via torch_utils.persistence. To use the models in external Python scripts, just make sure that torch_utils and dnnlib are accesible through PYTHONPATH.

Docker: You can run the example script using Docker as follows:

# Build the edm:latest image
docker build --tag edm:latest .

# Run the generate.py script using Docker:
docker run --gpus all -it --rm --user $(id -u):$(id -g) \
    -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
    edm:latest \
    python example.py

Note: The Docker image requires NVIDIA driver release r520 or later.

The docker run invocation may look daunting, so let's unpack its contents here:

  • --gpus all -it --rm --user $(id -u):$(id -g): with all GPUs enabled, run an interactive session with current user's UID/GID to avoid Docker writing files as root.
  • -v `pwd`:/scratch --workdir /scratch: mount current running dir (e.g., the top of this git repo on your host machine) to /scratch in the container and use that as the current working dir.
  • -e HOME=/scratch: specify where to cache temporary files. Note: if you want more fine-grained control, you can instead set DNNLIB_CACHE_DIR (for pre-trained model download cache). You want these cache dirs to reside on persistent volumes so that their contents are retained across multiple docker run invocations.

Pre-trained models

We provide pre-trained models for our proposed training configuration (config F) as well as the baseline configuration (config A):

To generate a batch of images using a given model and sampler, run:

# Generate 64 images and save them as out/*.png
python generate.py --outdir=out --seeds=0-63 --batch=64 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

Generating a large number of images can be time-consuming; the workload can be distributed across multiple GPUs by launching the above command using torchrun:

# Generate 1024 images using 2 GPUs
torchrun --standalone --nproc_per_node=2 generate.py --outdir=out --seeds=0-999 --batch=64 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

The sampler settings can be controlled through command-line options; see python generate.py --help for more information. For best results, we recommend using the following settings for each dataset:

# For CIFAR-10 at 32x32, use deterministic sampling with 18 steps (NFE = 35)
python generate.py --outdir=out --steps=18 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

# For FFHQ and AFHQv2 at 64x64, use deterministic sampling with 40 steps (NFE = 79)
python generate.py --outdir=out --steps=40 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-ffhq-64x64-uncond-vp.pkl

# For ImageNet at 64x64, use stochastic sampling with 256 steps (NFE = 511)
python generate.py --outdir=out --steps=256 --S_churn=40 --S_min=0.05 --S_max=50 --S_noise=1.003 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-imagenet-64x64-cond-adm.pkl

Besides our proposed EDM sampler, generate.py can also be used to reproduce the sampler ablations from Section 3 of our paper. For example:

# Figure 2a, "Our reimplementation"
python generate.py --outdir=out --steps=512 --solver=euler --disc=vp --schedule=vp --scaling=vp \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

# Figure 2a, "+ Heun & our {t_i}"
python generate.py --outdir=out --steps=128 --solver=heun --disc=edm --schedule=vp --scaling=vp \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

# Figure 2a, "+ Our sigma(t) & s(t)"
python generate.py --outdir=out --steps=18 --solver=heun --disc=edm --schedule=linear --scaling=none \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

Calculating FID

To compute Fréchet inception distance (FID) for a given model and sampler, first generate 50,000 random images and then compare them against the dataset reference statistics using fid.py:

# Generate 50000 images and save them as fid-tmp/*/*.png
torchrun --standalone --nproc_per_node=1 generate.py --outdir=fid-tmp --seeds=0-49999 --subdirs \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

# Calculate FID
torchrun --standalone --nproc_per_node=1 fid.py calc --images=fid-tmp \
    --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz

Both of the above commands can be parallelized across multiple GPUs by adjusting --nproc_per_node. The second command typically takes 1-3 minutes in practice, but the first one can sometimes take several hours, depending on the configuration. See python fid.py --help for the full list of options.

Note that the numerical value of FID varies across different random seeds and is highly sensitive to the number of images. By default, fid.py will always use 50,000 generated images; providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different seeds, e.g., --seeds=0-49999, --seeds=50000-99999, and --seeds=100000-149999. In our paper, we calculated each FID three times and reported the minimum.

Also note that it is important to compare the generated images against the same dataset that the model was originally trained with. To facilitate evaluation, we provide the exact reference statistics that correspond to our pre-trained models:

For ImageNet, we provide two sets of reference statistics to enable apples-to-apples comparison: imagenet-64x64.npz should be used when evaluating the EDM model (edm-imagenet-64x64-cond-adm.pkl), whereas imagenet-64x64-baseline.npz should be used when evaluating the baseline model (baseline-imagenet-64x64-cond-adm.pkl); the latter was originally trained by Dhariwal and Nichol using slightly different training data.

You can compute

Related Skills

View on GitHub
GitHub Stars1.9k
CategoryDesign
Updated22h ago
Forks193

Languages

Python

Security Score

80/100

Audited on Mar 31, 2026

No findings