NVAE
The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 Spotlight Paper)
<div align="center"> <a href="http://latentspace.cc/arash_vahdat/" target="_blank">Arash Vahdat</a>   <b>·</b>   <a href="http://jankautz.com/" target="_blank">Jan Kautz</a> </div> <br> <br>NVAE is a deep hierarchical variational autoencoder that enables training SOTA likelihood-based generative models on several image datasets.
<p align="center"> <img src="img/celebahq.png" width="800"> </p>Requirements
NVAE is built in Python 3.7 using PyTorch 1.6.0. Use the following command to install the requirements:
pip install -r requirements.txt
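To sanity-check the environment before training, you can, for example, confirm the installed PyTorch version and that CUDA is visible:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"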
Set up file paths and data
We have evaluated NVAE on several datasets. For large datasets, we store the data in LMDB format
for I/O efficiency. Click on each dataset below to see how to prepare your data. Throughout, $DATA_DIR indicates
the path to a data directory that will contain all the datasets, and $CODE_DIR refers to the code directory:
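As a rough illustration of what LMDB buys you, records are fetched with memory-mapped point lookups rather than per-file disk reads. The sketch below is hypothetical: the database path, the integer-string key format, and the stored-bytes layout are assumptions for illustration only; the create_*_lmdb.py scripts in this repo define the actual format.
import io
import lmdb
from PIL import Image

# Open the environment read-only; lock=False permits concurrent readers.
env = lmdb.open('/path/to/celeba64_lmdb/train.lmdb', readonly=True, lock=False)
with env.begin(write=False) as txn:
    buf = txn.get(str(0).encode())     # key format assumed for illustration
    img = Image.open(io.BytesIO(buf))  # decode the stored image bytes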
<details><summary>MNIST and CIFAR-10</summary>
These datasets are downloaded automatically the first time you run the main NVAE training with train.py.
Use --data=$DATA_DIR/mnist or --data=$DATA_DIR/cifar10 so that the datasets are downloaded to the corresponding directories.
</details>
<details><summary>CelebA 64</summary>
Run the following commands to download the CelebA images and create the LMDB datasets:
cd $CODE_DIR/scripts
python create_celeba64_lmdb.py --split train --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb
python create_celeba64_lmdb.py --split valid --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb
python create_celeba64_lmdb.py --split test --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb
Above, the images are downloaded to $DATA_DIR/celeba_org automatically, and the LMDB datasets are then created
at $DATA_DIR/celeba64_lmdb.
</details>
<details><summary>ImageNet 32x32</summary>
Run the following commands to download the TFRecord files from GLOW and convert them to LMDB datasets:
mkdir -p $DATA_DIR/imagenet-oord
cd $DATA_DIR/imagenet-oord
wget https://storage.googleapis.com/glow-demo/data/imagenet-oord-tfr.tar
tar -xvf imagenet-oord-tfr.tar
cd $CODE_DIR/scripts
python convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR/imagenet-oord/mnt/host/imagenet-oord-tfr --lmdb_path=$DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --split=train
python convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR/imagenet-oord/mnt/host/imagenet-oord-tfr --lmdb_path=$DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --split=validation
</details>
<details><summary>CelebA HQ 256</summary>
Run the following commands to download the TFRecord files from GLOW and convert them to LMDB datasets:
mkdir -p $DATA_DIR/celeba
cd $DATA_DIR/celeba
wget https://storage.googleapis.com/glow-demo/data/celeba-tfr.tar
tar -xvf celeba-tfr.tar
cd $CODE_DIR/scripts
python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=train
python convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR/celeba/celeba-tfr --lmdb_path=$DATA_DIR/celeba/celeba-lmdb --split=validation
</details>
<details><summary>FFHQ 256</summary>
Visit the FFHQ Google Drive location and download
images1024x1024.zip. Run the following commands to unzip the images and store them in LMDB datasets:
mkdir -p $DATA_DIR/ffhq
unzip images1024x1024.zip -d $DATA_DIR/ffhq/
cd $CODE_DIR/scripts
python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/images1024x1024/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=train
python create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR/ffhq/images1024x1024/ --ffhq_lmdb_path=$DATA_DIR/ffhq/ffhq-lmdb --split=validation
</details>
<details><summary>LSUN</summary>
We use LSUN datasets in our follow-up works. Visit LSUN for instructions on how to download this dataset. Since the LSUN scene datasets come in the LMDB format, they are ready to be loaded using torchvision data loaders.
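As a minimal sketch of such a loader, assuming the LSUN LMDB files live under a hypothetical /path/to/data/lsun and using the bedroom category as a placeholder, the standard torchvision pattern is:
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])
# 'bedroom_train' is one example category; pick the scene you downloaded.
dataset = datasets.LSUN(root='/path/to/data/lsun', classes=['bedroom_train'],
                        transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
images, _ = next(iter(loader))  # images: [16, 3, 256, 256]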
</details>
Running the main NVAE training and evaluation scripts
We use the following commands for training NVAE on each dataset reported in
Table 1 of the paper. Normalizing flows are enabled on all datasets except MNIST.
Check Table 6 in the paper for more information on training
details. Note that for multinode training (experiments with more than 8 GPUs), we use the mpirun
command to run the training scripts on multiple nodes. Please adjust the commands below according to your setup.
Below, IP_ADDR is the IP address of the machine that will host the rank-0 process
(see here), and NODE_RANK is the index of each node among all the nodes running the job.
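For reference, these flags map onto the standard PyTorch distributed setup along the following lines. This is an illustrative sketch, not the repo's exact code; the function name and the port number are assumptions.
import os
import torch.distributed as dist

def init_worker(node_rank, local_rank, num_proc_node, num_process_per_node, ip_addr):
    # Each node launches num_process_per_node workers; every worker needs a
    # unique global rank in [0, world_size).
    global_rank = node_rank * num_process_per_node + local_rank
    world_size = num_proc_node * num_process_per_node
    os.environ['MASTER_ADDR'] = ip_addr  # the rank-0 host, i.e. $IP_ADDR
    os.environ['MASTER_PORT'] = '6020'   # assumed free port, shared by all nodes
    dist.init_process_group('nccl', rank=global_rank, world_size=world_size)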
<details><summary>MNIST</summary>
Two 16-GB V100 GPUs are used for training NVAE on dynamically binarized MNIST. Training takes about 21 hours.
export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/mnist --root $CHECKPOINT_DIR --save $EXPR_ID --dataset mnist --batch_size 200 \
--epochs 400 --num_latent_scales 2 --num_groups_per_scale 10 --num_postprocess_cells 3 --num_preprocess_cells 3 \
--num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 --num_latent_per_group 20 --num_preprocess_blocks 2 \
--num_postprocess_blocks 2 --weight_decay_norm 1e-2 --num_channels_enc 32 --num_channels_dec 32 --num_nf 0 \
--ada_groups --num_process_per_node 2 --use_se --res_dist --fast_adamax
</details>
<details><summary>CIFAR-10</summary>
Eight 16-GB V100 GPUs are used for training NVAE on CIFAR-10. Training takes about 55 hours.
export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/cifar10 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset cifar10 \
--num_channels_enc 128 --num_channels_dec 128 --epochs 400 --num_postprocess_cells 2 --num_preprocess_cells 2 \
--num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
--num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 30 --batch_size 32 \
--weight_decay_norm 1e-2 --num_nf 1 --num_process_per_node 8 --use_se --res_dist --fast_adamax
</details>
<details><summary>CelebA 64</summary>
Eight 16-GB V100 GPUs are used for training NVAE on CelebA 64. Training takes about 92 hours.
export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
cd $CODE_DIR
python train.py --data $DATA_DIR/celeba64_lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_64 \
--num_channels_enc 64 --num_channels_dec 64 --epochs 90 --num_postprocess_cells 2 --num_preprocess_cells 2 \
--num_latent_scales 3 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
--num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1 --num_groups_per_scale 20 \
--batch_size 16 --num_nf 1 --ada_groups --num_process_per_node 8 --use_se --res_dist --fast_adamax
</details>
<details><summary>ImageNet 32x32</summary>
24 16-GB V100 GPUs are used for training NVAE on ImageNet 32x32. Training takes about 70 hours.
export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
export IP_ADDR=IP_ADDRESS
export NODE_RANK=NODE_RANK_BETWEEN_0_TO_2
cd $CODE_DIR
mpirun --allow-run-as-root -np 3 -npernode 1 bash -c \
'python train.py --data $DATA_DIR/imagenet-oord/imagenet-oord-lmdb_32 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset imagenet_32 \
--num_channels_enc 192 --num_channels_dec 192 --epochs 45 --num_postprocess_cells 2 --num_preprocess_cells 2 \
--num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
--num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 28 \
--batch_size 24 --num_nf 1 --warmup_epochs 1 \
--weight_decay_norm 1e-2 --weight_decay_norm_anneal --weight_decay_norm_init 1e0 \
--num_process_per_node 8 --use_se --res_dist \
--fast_adamax --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '
</details>
<details><summary>CelebA HQ 256</summary>
24 32-GB V100 GPUs are used for training NVAE on CelebA HQ 256. Training takes about 94 hours.
export EXPR_ID=UNIQUE_EXPR_ID
export DATA_DIR=PATH_TO_DATA_DIR
export CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR
export CODE_DIR=PATH_TO_CODE_DIR
export IP_ADDR=IP_ADDRESS
export NODE_RANK=NODE_RANK_BETWEEN_0_TO_2
cd $CODE_DIR
mpirun --allow-run-as-root -np 3 -npernode 1 bash -c \
'python train.py --data $DATA_DIR/celeba/celeba-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_256 \
--num_channels_enc 30 --num_channels_dec 30 --epochs 300 --num_postprocess_cells 2 --num_preprocess_cells 2 \
--num_latent_scales 5 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \
--num_preprocess_blocks 1 --num_postprocess_blocks 1 \
