GANformer: Generative Adversarial Transformers
<p align="center"> <b><a href="https://cs.stanford.edu/~dorarad/">Drew A. Hudson</a> & <a href="http://larryzitnick.org/">C. Lawrence Zitnick</a></b> </p>

Check out our new PyTorch version and the GANformer2 paper!

<div align="center"> <img src="https://cs.stanford.edu/people/dorarad/image1.png" style="float:left" width="340px"> <img src="https://cs.stanford.edu/people/dorarad/image3.png" style="float:right" width="440px"> </div> <p></p>

Update (Feb 21, 2022): We updated the weight initialization of the PyTorch version to the intended scale, leading to a substantial improvement in the model's learning speed!
This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, and can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network.
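The bipartite structure described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration under assumptions of our own (single head, no learned projections, a `tanh` gate standing in for the multiplicative modulation), not the repository's actual network.py implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_attention(latents, features):
    """One simplified round of bipartite attention: k latents aggregate
    information from n image features, then the features are modulated
    multiplicatively by the updated latents (style-like integration)."""
    d = latents.shape[-1]
    # latents -> features: each latent reads from the whole image grid
    att = softmax(latents @ features.T / np.sqrt(d))        # (k, n)
    latents = latents + att @ features                      # (k, d)
    # features -> latents: each location attends over the k latents only
    att = softmax(features @ latents.T / np.sqrt(d))        # (n, k)
    # multiplicative (region-based) modulation instead of additive update
    gain = 1.0 + np.tanh(att @ latents)                     # (n, d)
    return latents, features * gain

k, n, d = 8, 64, 16  # 8 latents, an 8x8 feature grid, width 16
rng = np.random.default_rng(0)
z, x = rng.normal(size=(k, d)), rng.normal(size=(n, d))
z2, x2 = bipartite_attention(z, x)
print(z2.shape, x2.shape)  # (8, 16) (64, 16)
```

Because attention only ever runs between the k latents and the n grid locations, each round costs O(k·n) rather than the O(n²) of dense self-attention, which is what makes the approach scale linearly with image size.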
<img align="right" src="https://cs.stanford.edu/people/dorarad/img3.png" width="270px">1st Paper: https://arxiv.org/pdf/2103.01209
2nd Paper: https://arxiv.org/abs/2111.08960
Contact: dorarad@cs.stanford.edu
Implementation: network.py (TF / Pytorch)
We now support both PyTorch and TF!
:white_check_mark: Uploading initial code and readme
:white_check_mark: Image sampling and visualization script
:white_check_mark: Code clean-up and refactoring, adding documentation
:white_check_mark: Training and data-preparation instructions
:white_check_mark: Pretrained networks for all datasets
:white_check_mark: Extra visualizations and evaluations
:white_check_mark: Providing models trained for longer
:white_check_mark: Releasing the PyTorch version
:white_check_mark: Releasing pre-trained models for high-resolutions (up to 1024 x 1024)
⬜️ Releasing the GANformer2 model (supporting layout generation and conditional layout2image generation)
If you experience any issues or have suggestions for improvements or extensions, feel free to contact me either through the issues page or at dorarad@stanford.edu.
Bibtex
@article{hudson2021ganformer,
title={Generative Adversarial Transformers},
author={Hudson, Drew A and Zitnick, C. Lawrence},
journal={Proceedings of the 38th International Conference on Machine Learning, {ICML} 2021},
year={2021}
}
@article{hudson2021ganformer2,
title={Compositional Transformers for Scene Generation},
author={Hudson, Drew A and Zitnick, C. Lawrence},
journal={Advances in Neural Information Processing Systems {NeurIPS} 2021},
year={2021}
}
Sample Images
Using the pre-trained models (generated after training for 5-7x fewer steps than StyleGAN2 models! Training our models for longer will further improve image quality):
<div align="center"> <img src="https://cs.stanford.edu/people/dorarad/samples.png" width="700px"> </div>

Requirements

<img align="right" src="https://cs.stanford.edu/people/dorarad/dia.png" width="190px">

- Python 3.6 or 3.7 are supported.
- For the TF version: We recommend TensorFlow 1.14 which was used for development, but TensorFlow 1.15 is also supported.
- For the Pytorch version: We support Pytorch >= 1.8.
- The code was tested with CUDA 10.0 toolkit and cuDNN 7.5.
- We have performed experiments on Titan V GPU. We assume 12GB of GPU memory (more memory can expedite training).
- See `requirements.txt` (TF / Pytorch) for the required python packages, and run `pip install -r requirements.txt` to install them.
Quickstart & Overview
Our repository supports both Tensorflow (at the main directory) and Pytorch (at pytorch_version). The two implementations follow a similar code and file structure, and share the same interface. To switch from TF to Pytorch, simply enter the pytorch_version directory and install the requirements.
Please feel free to open an issue or contact for any questions or suggestions about the new implementation!
A minimal example of using a pre-trained GANformer can be found at generate.py (TF / Pytorch). When executed, the 10-line program downloads a pre-trained model and uses it to generate some images:
python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32
You can use --truncation-psi to control the generated images quality/diversity trade-off.
We recommend trying out different values in the range of 0.6-1.0.
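The trade-off controlled by --truncation-psi can be illustrated with a minimal sketch of the standard truncation trick, assuming StyleGAN-style latents `w` and a running average latent `w_avg` (the names are assumptions for the illustration, not identifiers from this codebase):

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: pull the sampled latent toward the average latent.
    psi=1.0 keeps full diversity; smaller psi trades diversity for more
    typical, higher-quality samples."""
    return w_avg + psi * (w - w_avg)

w_avg = np.zeros(512)          # average latent (e.g. a running mean)
w = np.ones(512)               # a sampled latent
print(truncate(w, w_avg, psi=0.7)[0])  # 0.7
```

With psi = 1.0 the latent is unchanged; as psi shrinks toward 0, all samples collapse toward the average, which is why the 0.6-1.0 range is a sensible place to search.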
Pretrained models and High resolutions
We provide pretrained models for resolution 256×256 for all datasets, as well as 1024×1024 for FFHQ and 1024×2048 for Cityscapes.
To generate images for the high-resolution models, run the following commands: (We reduce their batch-size to 1 so that they can load onto a single GPU)
python generate.py --gpus 0 --model gdrive:ffhq-snapshot-1024.pkl --output-dir ffhq_images --images-num 32 --batch-size 1
python generate.py --gpus 0 --model gdrive:cityscapes-snapshot-2048.pkl --output-dir cityscapes_images --images-num 32 --batch-size 1 --ratio 0.5 # 1024 x 2048 cityscapes currently supported in the TF version only
We can train and evaluate new or pretrained models, both quantitatively and qualitatively, with run_network.py (TF / Pytorch).
The model architecture can be found at network.py (TF / Pytorch). The training procedure is implemented at training_loop.py (TF / Pytorch).
Data preparation
We explored the GANformer model on 4 datasets for images and scenes: CLEVR, LSUN-Bedrooms, Cityscapes and FFHQ. The model can be trained on other datasets as well.
We trained the model on 256x256 resolution. Higher resolutions are supported too. The model will automatically adapt to the resolution of the images in the dataset.
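The resolution adaptation can be pictured as the usual power-of-two stage ladder that progressively-grown GANs use. This helper is a hypothetical illustration of that idea, not code from the repository:

```python
import math

def resolution_stages(resolution):
    """Hypothetical sketch: list the feature-map resolutions a
    StyleGAN-style generator steps through, doubling from 4x4 up to
    the dataset resolution (which must be a power of two)."""
    assert resolution >= 4 and resolution & (resolution - 1) == 0, \
        "resolution must be a power of two"
    return [2 ** i for i in range(2, int(math.log2(resolution)) + 1)]

print(resolution_stages(256))  # [4, 8, 16, 32, 64, 128, 256]
```

Training on a 1024×1024 dataset would simply extend the same ladder by two more stages, which is why no architecture changes are needed when the dataset resolution changes.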
The prepare_data.py script (TF / Pytorch) can either prepare the datasets from our catalog or create new datasets.
Default Datasets
To prepare the datasets from the catalog, run the following command:
python prepare_data.py --ffhq --cityscapes --clevr --bedrooms --max-images 100000
See table below for details about the datasets in the catalog.
Useful options:
- `--data-dir`: the output data directory (default: `datasets`)
- `--shards-num`: select the number of shards for the data (default: adapted to each dataset)
- `--max-images`: store only a subset of the dataset, in order to reduce the size of the stored `tfrecord`/image files (default: max). This can be particularly useful to save space in the case of large datasets, such as LSUN-Bedrooms (which originally contains 3M images).
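The interplay of these two options can be sketched as below. The round-robin assignment is a hypothetical illustration of capping and sharding a dataset, not necessarily how prepare_data.py actually distributes images across shard files:

```python
def shard_assignments(num_images, shards_num, max_images=None):
    """Hypothetical sketch: cap the dataset at max_images, then split the
    remaining image indices round-robin across shards_num shard files."""
    if max_images is not None:
        num_images = min(num_images, max_images)
    shards = [[] for _ in range(shards_num)]
    for i in range(num_images):
        shards[i % shards_num].append(i)
    return shards

shards = shard_assignments(10, 3, max_images=7)
print([len(s) for s in shards])  # [3, 2, 2]
```

With `--max-images 100000` on LSUN-Bedrooms, for example, only the first cap of the 3M images would be stored, keeping the on-disk size far below the full 480GB.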
Custom Datasets
You can also use the script to create new custom datasets. For instance:
python prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5
The script supports several formats: png, jpg, npy, hdf5, tfds and lmdb.
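The `--ratio` option specifies a target height/width ratio for the stored images (e.g. 0.5 for the 1024×2048 Cityscapes images). A hedged sketch of the crop-box arithmetic, where the function name and the center-crop choice are assumptions for illustration:

```python
def crop_to_ratio(h, w, ratio):
    """Hypothetical sketch: compute a centered crop box (top, left,
    crop_h, crop_w) so that crop_h / crop_w == ratio, shrinking
    whichever dimension is too large for the target ratio."""
    crop_h = min(h, int(round(w * ratio)))
    crop_w = min(w, int(round(crop_h / ratio)))
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return top, left, crop_h, crop_w

# A square 1024x1024 source cropped to ratio 0.5 keeps the full width
# and takes the central 512-pixel band of rows:
print(crop_to_ratio(1024, 1024, 0.5))  # (256, 0, 512, 1024)
```

Each source image would then be cropped with this box before being resized and written into the shards.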
Dataset Catalog
| Dataset | # Images | Resolution | Download Size | TFrecords Size | Gamma |
| :---------------: | :-------: | :-----------: | :-----------: | :--------------: | :---: |
| FFHQ | 70,000 | 256×256 | 13GB | 13GB | 10 |
| CLEVR | 100,015 | 256×256 | 18GB | 15.5GB | 40 |
| Cityscapes | 24,998 | 256×256 | 1.8GB | 8GB | 20 |
| LSUN-Bedrooms | 3,033,042 | 256×256 | 42.8GB | Up to 480GB | 100 |
Use --max-images to reduce the size of the tfrecord files.
Training
Models are trained by using the --train option. To fine-tune a pretrained GANformer model:
python run_network.py --train --gpus 0 --ganformer-default --expname clevr-pretrained --dataset clevr \
--pretrained-pkl gdrive:clevr-snapshot.pkl
We provide pretrained models for bedrooms, cityscapes, clevr and ffhq.
To train a GANformer in its default configuration from scratch:
python run_network.py --train --gpus 0 --ganformer-default --exp
