Diffae
Official implementation of Diffusion Autoencoders
A CVPR 2022 (ORAL) paper (paper, site, 5-min video):
@inproceedings{preechakul2021diffusion,
title={Diffusion Autoencoders: Toward a Meaningful and Decodable Representation},
author={Preechakul, Konpat and Chatthee, Nattanat and Wizadwongsa, Suttisak and Suwajanakorn, Supasorn},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022},
}
Usage
Note: Since we expect many changes to the codebase, please fork the repo before using it.
Prerequisites
See requirements.txt
pip install -r requirements.txt
Quick start
Jupyter notebooks are provided:
For unconditional generation: sample.ipynb
For manipulation: manipulate.ipynb
For interpolation: interpolate.ipynb
For autoencoding: autoencoding.ipynb
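interpolate.ipynb blends two encoded images. As we understand the paper, the semantic code z_sem is interpolated linearly while the stochastic code x_T is interpolated with spherical linear interpolation (slerp). The sketch below shows only that blending step on plain Python lists; the function names are hypothetical, and the notebook itself works on PyTorch tensors and decodes each (z_sem, x_T) pair with the DiffAE decoder, which is omitted here.

```python
import math

def lerp(a, b, t):
    """Linear interpolation, used here for the semantic code z_sem."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    """Spherical linear interpolation, used here for the stochastic code x_T."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: slerp degenerates to lerp.
        return lerp(a, b, t)
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

def interpolate(z_sem_a, z_sem_b, x_T_a, x_T_b, steps=5):
    """Return (z_sem, x_T) pairs from image A to image B, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    pairs = []
    for i in range(steps):
        t = i / (steps - 1)
        pairs.append((lerp(z_sem_a, z_sem_b, t), slerp(x_T_a, x_T_b, t)))
    return pairs
```

Slerp is the usual choice for x_T because a linear blend of two Gaussian noise vectors shrinks their norm toward the middle of the path, while slerp keeps it roughly constant.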
Aligning your own images:
- Put images into the imgs directory.
- Run align.py (requires pip install dlib requests).
- Result images will be available in the imgs_align directory.
Checkpoints
We provide checkpoints for the following models:
- DDIM: FFHQ128 (72M, 130M), Bedroom128, Horse128
- DiffAE (autoencoding only): FFHQ256, FFHQ128 (72M, 130M), Bedroom128, Horse128
- DiffAE (with latent DPM, can sample): FFHQ256, FFHQ128, Bedroom128, Horse128
- DiffAE's classifiers (for manipulation): FFHQ256's latent on CelebAHQ, FFHQ128's latent on CelebAHQ
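The classifiers above are used by manipulate.ipynb to edit attributes. To our understanding of the paper, an edit moves the semantic code z_sem along the weight vector of a linear attribute classifier trained on z_sem. The sketch below assumes the classifier operates on z_sem normalized with per-dimension dataset mean and standard deviation and that the step is taken along the unit-normalized weight; the function name manipulate and the parameter scale are hypothetical, and scale is a free step size, not the notebook's value.

```python
import math

def manipulate(z_sem, mean, std, w, scale):
    """Shift z_sem along a linear classifier's weight direction.

    Assumption: the classifier was trained on z_sem normalized with
    per-dimension statistics (mean, std), so the step is taken in that space.
    """
    w_norm = math.sqrt(sum(v * v for v in w))
    direction = [v / w_norm for v in w]  # unit-length attribute direction
    z_n = [(z - m) / s for z, m, s in zip(z_sem, mean, std)]  # normalize
    z_n = [z + scale * d for z, d in zip(z_n, direction)]     # move along w
    return [z * s + m for z, m, s in zip(z_n, mean, std)]     # denormalize
```

A positive scale pushes the image toward the attribute the classifier detects and a negative one pushes it away; the edited z_sem is then decoded together with the original image's x_T.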
Download the checkpoints and put them into a separate directory named checkpoints. It should look like this:
checkpoints/
- bedroom128_autoenc
    - last.ckpt # diffae checkpoint
    - latent.ckpt # predicted z_sem on the dataset
- bedroom128_autoenc_latent
    - last.ckpt # diffae + latent DPM checkpoint
- bedroom128_ddpm
- ...
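A quick way to confirm the layout before opening the notebooks is sketched below. This helper is hypothetical (not part of the repo) and only checks the entries spelled out in the tree; extend EXPECTED with the other runs you downloaded.

```python
from pathlib import Path

# Files per run directory, taken from the tree above. Runs whose contents
# the tree does not spell out (e.g. bedroom128_ddpm) are left out.
EXPECTED = {
    "bedroom128_autoenc": ["last.ckpt", "latent.ckpt"],
    "bedroom128_autoenc_latent": ["last.ckpt"],
}

def missing_checkpoints(root="checkpoints", expected=EXPECTED):
    """Return the paths of expected checkpoint files that do not exist."""
    root = Path(root)
    missing = []
    for run, files in expected.items():
        for name in files:
            path = root / run / name
            if not path.is_file():
                missing.append(str(path))
    return missing

if __name__ == "__main__":
    for p in missing_checkpoints():
        print("missing:", p)
```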
LMDB Datasets
We do not own any of the following datasets. We provide the LMDB ready-to-use dataset for the sake of convenience.
Broken links
Note: I'm trying to recover the following links.
The directory tree should be:
datasets/
- bedroom256.lmdb
- celebahq256.lmdb
- celeba.lmdb
- ffhq256.lmdb
- horse256.lmdb
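With default settings, an LMDB environment is stored as a directory containing a data.mdb file (plus lock.mdb once it has been opened), so each entry in the tree above should be a directory rather than a single file. The hypothetical helper below reports which datasets are present; it only inspects the filesystem and does not open the databases.

```python
from pathlib import Path

DATASETS = [
    "bedroom256.lmdb",
    "celebahq256.lmdb",
    "celeba.lmdb",
    "ffhq256.lmdb",
    "horse256.lmdb",
]

def check_datasets(root="datasets", names=DATASETS):
    """Map each dataset name to 'ok', 'missing', or 'no data.mdb'."""
    status = {}
    for name in names:
        path = Path(root) / name
        if not path.is_dir():
            status[name] = "missing"
        elif not (path / "data.mdb").is_file():
            status[name] = "no data.mdb"  # directory exists but looks empty
        else:
            status[name] = "ok"
    return status

if __name__ == "__main__":
    for name, state in check_datasets().items():
        print(f"{name}: {state}")
```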
You can also download the datasets from the original sources and use our provided code to package them as LMDB files. The original sources for each dataset are as follows:
- FFHQ (https://github.com/NVlabs/ffhq-dataset)
- CelebAHQ (https://github.com/switchablenorms/CelebAMask-HQ)
- CelebA (https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
- LSUN (https://github.com/fyu/lsun)
The conversion scripts are:
data_resize_bedroom.py
data_resize_celebhq.py
data_resize_celeba.py
data_resize_ffhq.py
data_resize_horse.py
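The scripts above resize the original images to a fixed square resolution before packaging. We have not confirmed how each data_resize_*.py script crops, so the sketch below only illustrates the common center-crop-then-resize geometry as an assumption; the function names are hypothetical. The Celeba64 experiments use D2C's crop, which this sketch does not reproduce.

```python
def center_crop_box(width, height):
    """Largest centered square, as a (left, top, right, bottom) box."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

def resize_plan(width, height, resolution):
    """Crop box plus target size for a square output of the given resolution."""
    return center_crop_box(width, height), (resolution, resolution)

# With Pillow, the plan could be applied roughly as:
#   box, size = resize_plan(*img.size, 256)
#   img = img.crop(box).resize(size)
```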
Google drive: https://drive.google.com/drive/folders/1abNP4QKGbNnymjn8607BF0cwxX2L23jh?usp=sharing
Training
We provide scripts for training and evaluating DDIM and DiffAE (including the latent DPM) on the following datasets: FFHQ128, FFHQ256, Bedroom128, Horse128, and Celeba64 (D2C's crop).
The evaluation results (FIDs) will usually be available in the eval directory.
Note: Most experiments require at least 4x V100s to train the DPM models, while training the accompanying latent DPM requires only 1x 2080Ti.
FFHQ128
# diffae
python run_ffhq128.py
# ddim
python run_ffhq128_ddim.py
A classifier (for manipulation) can be trained using:
python run_ffhq128_cls.py
FFHQ256
Due to the high computation cost, we only trained DiffAE on this dataset. This requires 8x V100s.
sbatch run_ffhq256.py
After the task is done, you need to train the latent DPM (requiring only 1x 2080Ti):
python run_ffhq256_latent.py
A classifier (for manipulation) can be trained using:
python run_ffhq256_cls.py
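The FFHQ256 steps have an order dependency: the latent DPM and the manipulation classifier both depend on the trained autoencoder. The hypothetical helper below picks the next stage based on whether the autoencoder checkpoint exists. The path checkpoints/ffhq256_autoenc/last.ckpt is an assumption that follows the naming in the checkpoint tree; adjust it to wherever your run writes its checkpoint.

```python
import subprocess
import sys
from pathlib import Path

# Assumed checkpoint location, based on the checkpoint tree's naming scheme.
AUTOENC_CKPT = Path("checkpoints/ffhq256_autoenc/last.ckpt")

def next_ffhq256_steps(ckpt=AUTOENC_CKPT):
    """Return the commands still to run, in order."""
    if not Path(ckpt).is_file():
        # Stage 1: train DiffAE itself (8x V100s), submitted through Slurm.
        return [["sbatch", "run_ffhq256.py"]]
    # Stage 2: both steps depend on the trained autoencoder.
    return [
        [sys.executable, "run_ffhq256_latent.py"],  # latent DPM, 1x 2080Ti
        [sys.executable, "run_ffhq256_cls.py"],     # manipulation classifier
    ]

if __name__ == "__main__":
    for cmd in next_ffhq256_steps():
        print("running:", " ".join(cmd))
        # sbatch returns once the job is queued, not when training ends,
        # so rerun this script after the Slurm job has finished.
        subprocess.run(cmd, check=True)
```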
Bedroom128
# diffae
python run_bedroom128.py
# ddim
python run_bedroom128_ddim.py
Horse128
# diffae
python run_horse128.py
# ddim
python run_horse128_ddim.py
Celeba64
This experiment can be run on 2080Tis.
# diffae
python run_celeba64.py