The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 Spotlight Paper)

<div align="center"> <a href="http://latentspace.cc/arash_vahdat/" target="_blank">Arash Vahdat</a> &emsp; <b>·</b> &emsp; <a href="http://jankautz.com/" target="_blank">Jan Kautz</a> </div> <br> <br>

NVAE is a deep hierarchical variational autoencoder that enables training SOTA likelihood-based generative models on several image datasets.

Requirements

NVAE is built in Python 3.7 using PyTorch 1.6.0. Use the following command to install the requirements:

pip install -r requirements.txt

Set up file paths and data

We have examined NVAE on several datasets. For large datasets, we store the data in LMDB datasets for I/O efficiency. Click below on each dataset to see how you can prepare your data. Below, $DATA_DIR indicates the path to a data directory that will contain all the datasets and $CODE_DIR refers to the code directory:

<details><summary>MNIST and CIFAR-10</summary>

These datasets will be downloaded automatically, when you run the main training for NVAE using train.py for the first time. You can use --data=$DATA_DIR/mnist or --data=$DATA_DIR/cifar10, so that the datasets are downloaded to the corresponding directories.

</details> <details><summary>CelebA 64</summary> Run the following commands to download the CelebA images and store them in an LMDB dataset:

cd $CODE_DIR/scripts
python create_celeba64_lmdb.py --split train --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb
python create_celeba64_lmdb.py --split valid --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb
python create_celeba64_lmdb.py --split test  --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb

Above, the images will be downloaded to $DATA_DIR/celeba_org automatically and then then LMDB datasets are created at $DATA_DIR/celeba64_lmdb.

</details> <details><summary>ImageNet 32x32</summary>

Run the following commands to download tfrecord files from GLOW and to convert them to LMDB datasets

mkdir -p $DATA_DIR/imagenet-oord
cd $DATA_DIR/imagenet-oord
wget https://storage.googleapis.com/glow-demo/data/imagenet-oord-tfr.tar
tar -xvf imagenet-oord-tfr.tar
cd $CODE_DIR/scripts
python convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR/imagenet-oord/mnt/host/imagenet-oord-tfr --lmdb_path=$DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --split=train
python convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR/imagenet-oord/mnt/host/imagenet-oord-tfr --lmdb_path=$DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --split=validation

</details> <details><summary>CelebA HQ 256</summary>

Run the following commands to download tfrecord files from GLOW and to convert them to LMDB datasets

mkdir -p $DATA_DIR/celeba
cd $DATA_DIR/celeba
wget https://storage.googleapis.com/glow-demo/data/celeba-tfr.tar
tar -xvf celeba-tfr.tar
cd $CODE_DIR/scripts
python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=train
python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=validation

</details> <details><summary>FFHQ 256</summary>

Visit this Google drive location and download images1024x1024.zip. Run the following commands to unzip the images and to store them in LMDB datasets:

mkdir -p $DATA_DIR/ffhq
unzip images1024x1024.zip -d $DATA_DIR/ffhq/
cd $CODE_DIR/scripts
python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/images1024x1024/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train
python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/images1024x1024/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=validation

</details> <details><summary>LSUN</summary>

We use LSUN datasets in our follow-up works. Visit LSUN for instructions on how to download this dataset. Since the LSUN scene datasets come in the LMDB format, they are ready to be loaded using torchvision data loaders.

</details>

Running the main NVAE training and evaluation scripts

We use the following commands on each dataset for training NVAEs on each dataset for Table 1 in the paper. In all the datasets but MNIST normalizing flows are enabled. Check Table 6 in the paper for more information on training details. Note that for the multinode training (more than 8-GPU experiments), we use the mpirun command to run the training scripts on multiple nodes. Please adjust the commands below according to your setup. Below IP_ADDR is the IP address of the machine that will host the process with rank 0 (see here). NODE_RANK is the index of each node among all the nodes that are running the job.

<details><summary>MNIST</summary>

Two 16-GB V100 GPUs are used for training NVAE on dynamically binarized MNIST. Training takes about 21 hours.

export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/mnist --root $CHECKPOINT_DIR --save $EXPR_ID --dataset mnist --batch_size 200 \
        --epochs 400 --num_latent_scales 2 --num_groups_per_scale 10 --num_postprocess_cells 3 --num_preprocess_cells 3 \
        --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 --num_latent_per_group 20 --num_preprocess_blocks 2 \
        --num_postprocess_blocks 2 --weight_decay_norm 1e-2 --num_channels_enc 32 --num_channels_dec 32 --num_nf 0 \
        --ada_groups --num_process_per_node 2 --use_se --res_dist --fast_adamax

</details> <details><summary>CIFAR-10</summary>

Eight 16-GB V100 GPUs are used for training NVAE on CIFAR-10. Training takes about 55 hours.

export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/cifar10 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset cifar10 \
        --num_channels_enc 128 --num_channels_dec 128 --epochs 400 --num_postprocess_cells 2 --num_preprocess_cells 2 \
        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 30 --batch_size 32 \
        --weight_decay_norm 1e-2 --num_nf 1 --num_process_per_node 8 --use_se --res_dist --fast_adamax

</details> <details><summary>CelebA 64</summary>

Eight 16-GB V100 GPUs are used for training NVAE on CelebA 64. Training takes about 92 hours.

export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/celeba64_lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_64 \
        --num_channels_enc 64 --num_channels_dec 64 --epochs 90 --num_postprocess_cells 2 --num_preprocess_cells 2 \
        --num_latent_scales 3 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1 --num_groups_per_scale 20 \
        --batch_size 16 --num_nf 1 --ada_groups --num_process_per_node 8 --use_se --res_dist --fast_adamax

</details> <details><summary>ImageNet 32x32</summary>

24 16-GB V100 GPUs are used for training NVAE on ImageNet 32x32. Training takes about 70 hours.

export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
export IP_ADDR=IP_ADDRESS
export NODE_RANK=NODE_RANK_BETWEEN_0_TO_2
cd $CODE_DIR
mpirun --allow-run-as-root -np 3 -npernode 1 bash -c \
        'python train.py --data $DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset imagenet_32 \
        --num_channels_enc 192 --num_channels_dec 192 --epochs 45 --num_postprocess_cells 2 --num_preprocess_cells 2 \
        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 28 \
        --batch_size 24 --num_nf 1 --warmup_epochs 1 \
        --weight_decay_norm 1e-2 --weight_decay_norm_anneal --weight_decay_norm_init 1e0 \
        --num_process_per_node 8 --use_se --res_dist \
        --fast_adamax --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '

</details> <details><summary>CelebA HQ 256</summary>

24 32-GB V100 GPUs are used for training NVAE on CelebA HQ 256. Training takes about 94 hours.

export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
export IP_ADDR=IP_ADDRESS
export NODE_RANK=NODE_RANK_BETWEEN_0_TO_2
cd $CODE_DIR
mpirun --allow-run-as-root -np 3 -npernode 1 bash -c \
        'python train.py --data $DATA_DIR/celeba/celeba-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_256 \
        --num_channels_enc 30 --num_channels_dec 30 --epochs 300 --num_postprocess_cells 2 --num_preprocess_cells 2 \
        --num_latent_scales 5 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
        --num_preprocess_blocks 1 --num

NVAE

Install / Use

README

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 Spotlight Paper)

Requirements

Set up file paths and data

Running the main NVAE training and evaluation scripts