GANformer: Generative Adversarial Transformers
<p align="center"> <b><a href="https://cs.stanford.edu/~dorarad/">Drew A. Hudson</a> & <a href="http://larryzitnick.org/">C. Lawrence Zitnick</a></b> </p>

Check out our new PyTorch version and the GANformer2 paper!

<div align="center"> <img src="https://cs.stanford.edu/people/dorarad/image1.png" style="float:left" width="340px"> <img src="https://cs.stanford.edu/people/dorarad/image3.png" style="float:right" width="440px"> </div> <p></p>

Update (Feb 21, 2022): We updated the weight initialization of the PyTorch version to the intended scale, leading to a substantial improvement in the model's learning speed!
This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, and can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network.
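The bipartite structure described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration under assumptions of our own (single head, no learned projections, a `tanh` gate standing in for the multiplicative modulation), not the repository's actual network.py implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_attention(latents, features):
    """One simplified round of bipartite attention: k latents aggregate
    information from n image features, then the features are modulated
    multiplicatively by the updated latents (style-like integration)."""
    d = latents.shape[-1]
    # latents -> features: each latent reads from the whole image grid
    att = softmax(latents @ features.T / np.sqrt(d))        # (k, n)
    latents = latents + att @ features                      # (k, d)
    # features -> latents: each location attends over the k latents only
    att = softmax(features @ latents.T / np.sqrt(d))        # (n, k)
    # multiplicative (region-based) modulation instead of additive update
    gain = 1.0 + np.tanh(att @ latents)                     # (n, d)
    return latents, features * gain

k, n, d = 8, 64, 16  # 8 latents, an 8x8 feature grid, width 16
rng = np.random.default_rng(0)
z, x = rng.normal(size=(k, d)), rng.normal(size=(n, d))
z2, x2 = bipartite_attention(z, x)
print(z2.shape, x2.shape)  # (8, 16) (64, 16)
```

Because attention only ever runs between the k latents and the n grid locations, each round costs O(k·n) rather than the O(n²) of dense self-attention, which is what makes the approach scale linearly with image size.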
<img align="right" src="https://cs.stanford.edu/people/dorarad/img3.png" width="270px">1st Paper: https://arxiv.org/pdf/2103.01209
2nd Paper: https://arxiv.org/abs/2111.08960
Contact: dorarad@cs.stanford.edu
Implementation: network.py (TF / Pytorch)
We now support both PyTorch and TF!
:white_check_mark: Uploading initial code and readme
:white_check_mark: Image sampling and visualization script
:white_check_mark: Code clean-up and refactoring, adding documentation
:white_check_mark: Training and data-preparation instructions
:white_check_mark: Pretrained networks for all datasets
:white_check_mark: Extra visualizations and evaluations
:white_check_mark: Providing models trained for longer
:white_check_mark: Releasing the PyTorch version
:white_check_mark: Releasing pre-trained models for high-resolutions (up to 1024 x 1024)
⬜️ Releasing the GANformer2 model (supporting layout generation and conditional layout2image generation)
If you experience any issues or have suggestions for improvements or extensions, feel free to contact me either through the issues page or at dorarad@stanford.edu.
Bibtex
@article{hudson2021ganformer,
title={Generative Adversarial Transformers},
author={Hudson, Drew A and Zitnick, C. Lawrence},
journal={Proceedings of the 38th International Conference on Machine Learning, {ICML} 2021},
year={2021}
}
@article{hudson2021ganformer2,
title={Compositional Transformers for Scene Generation},
author={Hudson, Drew A and Zitnick, C. Lawrence},
journal={Advances in Neural Information Processing Systems {NeurIPS} 2021},
year={2021}
}
Sample Images
Using the pre-trained models (generated after training for 5-7x fewer steps than StyleGAN2 models! Training our models for longer will further improve image quality):
<div align="center"> <img src="https://cs.stanford.edu/people/dorarad/samples.png" width="700px"> </div>

Requirements

<img align="right" src="https://cs.stanford.edu/people/dorarad/dia.png" width="190px">

- Python 3.6 or 3.7 are supported.
- For the TF version: We recommend TensorFlow 1.14 which was used for development, but TensorFlow 1.15 is also supported.
- For the Pytorch version: We support Pytorch >= 1.8.
- The code was tested with CUDA 10.0 toolkit and cuDNN 7.5.
- We have performed experiments on Titan V GPU. We assume 12GB of GPU memory (more memory can expedite training).
- See `requirements.txt` (TF / Pytorch) for the required python packages, and run `pip install -r requirements.txt` to install them.
Quickstart & Overview
Our repository supports both Tensorflow (at the main directory) and Pytorch (at pytorch_version). The two implementations follow a similar code and file structure, and share the same interface. To switch from TF to Pytorch, simply enter the pytorch_version directory and install the requirements.
Please feel free to open an issue or contact for any questions or suggestions about the new implementation!
A minimal example of using a pre-trained GANformer can be found at generate.py (TF / Pytorch). When executed, the 10-line program downloads a pre-trained model and uses it to generate some images:
python generate.py --gpus 0 --model gdrive:bedrooms-snapshot.pkl --output-dir images --images-num 32
You can use --truncation-psi to control the generated images quality/diversity trade-off.
We recommend trying out different values in the range of 0.6-1.0.
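The trade-off controlled by --truncation-psi can be illustrated with a minimal sketch of the standard truncation trick, assuming StyleGAN-style latents `w` and a running average latent `w_avg` (the names are assumptions for the illustration, not identifiers from this codebase):

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: pull the sampled latent toward the average latent.
    psi=1.0 keeps full diversity; smaller psi trades diversity for more
    typical, higher-quality samples."""
    return w_avg + psi * (w - w_avg)

w_avg = np.zeros(512)          # average latent (e.g. a running mean)
w = np.ones(512)               # a sampled latent
print(truncate(w, w_avg, psi=0.7)[0])  # 0.7
```

With psi = 1.0 the latent is unchanged; as psi shrinks toward 0, all samples collapse toward the average, which is why the 0.6-1.0 range is a sensible place to search.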
Pretrained models and High resolutions
We provide pretrained models for resolution 256×256 for all datasets, as well as 1024×1024 for FFHQ and 1024×2048 for Cityscapes.
To generate images for the high-resolution models, run the following commands: (We reduce their batch-size to 1 so that they can load onto a single GPU)
python generate.py --gpus 0 --model gdrive:ffhq-snapshot-1024.pkl --output-dir ffhq_images --images-num 32 --batch-size 1
python generate.py --gpus 0 --model gdrive:cityscapes-snapshot-2048.pkl --output-dir cityscapes_images --images-num 32 --batch-size 1 --ratio 0.5 # 1024 x 2048 cityscapes currently supported in the TF version only
We can train and evaluate new or pretrained models, both quantitatively and qualitatively, with run_network.py (TF / Pytorch).
The model architecture can be found at network.py (TF / Pytorch). The training procedure is implemented at training_loop.py (TF / Pytorch).
Data preparation
We explored the GANformer model on 4 datasets for images and scenes: CLEVR, LSUN-Bedrooms, Cityscapes and FFHQ. The model can be trained on other datasets as well.
We trained the model on 256x256 resolution. Higher resolutions are supported too. The model will automatically adapt to the resolution of the images in the dataset.
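The resolution adaptation can be pictured as the usual power-of-two stage ladder that progressively-grown GANs use. This helper is a hypothetical illustration of that idea, not code from the repository:

```python
import math

def resolution_stages(resolution):
    """Hypothetical sketch: list the feature-map resolutions a
    StyleGAN-style generator steps through, doubling from 4x4 up to
    the dataset resolution (which must be a power of two)."""
    assert resolution >= 4 and resolution & (resolution - 1) == 0, \
        "resolution must be a power of two"
    return [2 ** i for i in range(2, int(math.log2(resolution)) + 1)]

print(resolution_stages(256))  # [4, 8, 16, 32, 64, 128, 256]
```

Training on a 1024×1024 dataset would simply extend the same ladder by two more stages, which is why no architecture changes are needed when the dataset resolution changes.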
The prepare_data.py script (TF / Pytorch) can either prepare the datasets from our catalog or create new datasets.
Default Datasets
To prepare the datasets from the catalog, run the following command:
python prepare_data.py --ffhq --cityscapes --clevr --bedrooms --max-images 100000
See table below for details about the datasets in the catalog.
Useful options:
- `--data-dir`: the output data directory (default: `datasets`)
- `--shards-num`: select the number of shards for the data (default: adapted to each dataset)
- `--max-images`: store only a subset of the dataset, in order to reduce the size of the stored `tfrecord`/image files (default: max). This can be particularly useful to save space in the case of large datasets, such as LSUN-Bedrooms (which originally contains 3M images).
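The interplay of these two options can be sketched as below. The round-robin assignment is a hypothetical illustration of capping and sharding a dataset, not necessarily how prepare_data.py actually distributes images across shard files:

```python
def shard_assignments(num_images, shards_num, max_images=None):
    """Hypothetical sketch: cap the dataset at max_images, then split the
    remaining image indices round-robin across shards_num shard files."""
    if max_images is not None:
        num_images = min(num_images, max_images)
    shards = [[] for _ in range(shards_num)]
    for i in range(num_images):
        shards[i % shards_num].append(i)
    return shards

shards = shard_assignments(10, 3, max_images=7)
print([len(s) for s in shards])  # [3, 2, 2]
```

With `--max-images 100000` on LSUN-Bedrooms, for example, only the first cap of the 3M images would be stored, keeping the on-disk size far below the full 480GB.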
Custom Datasets
You can also use the script to create new custom datasets. For instance:
python prepare_data.py --task <dataset-name> --images-dir <source-dir> --format png --ratio 0.7 --shards-num 5
The script supports several formats: png, jpg, npy, hdf5, tfds and lmdb.
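The `--ratio` option specifies a target height/width ratio for the stored images (e.g. 0.5 for the 1024×2048 Cityscapes images). A hedged sketch of the crop-box arithmetic, where the function name and the center-crop choice are assumptions for illustration:

```python
def crop_to_ratio(h, w, ratio):
    """Hypothetical sketch: compute a centered crop box (top, left,
    crop_h, crop_w) so that crop_h / crop_w == ratio, shrinking
    whichever dimension is too large for the target ratio."""
    crop_h = min(h, int(round(w * ratio)))
    crop_w = min(w, int(round(crop_h / ratio)))
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return top, left, crop_h, crop_w

# A square 1024x1024 source cropped to ratio 0.5 keeps the full width
# and takes the central 512-pixel band of rows:
print(crop_to_ratio(1024, 1024, 0.5))  # (256, 0, 512, 1024)
```

Each source image would then be cropped with this box before being resized and written into the shards.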
Dataset Catalog
| Dataset | # Images | Resolution | Download Size | TFrecords Size | Gamma |
| :---------------: | :-------: | :-----------: | :-----------: | :--------------: | :---: |
| FFHQ | 70,000 | 256×256 | 13GB | 13GB | 10 |
| CLEVR | 100,015 | 256×256 | 18GB | 15.5GB | 40 |
| Cityscapes | 24,998 | 256×256 | 1.8GB | 8GB | 20 |
| LSUN-Bedrooms | 3,033,042 | 256×256 | 42.8GB | Up to 480GB | 100 |
Use --max-images to reduce the size of the tfrecord files.
Training
Models are trained by using the --train option. To fine-tune a pretrained GANformer model:
python run_network.py --train --gpus 0 --ganformer-default --expname clevr-pretrained --dataset clevr \
--pretrained-pkl gdrive:clevr-snapshot.pkl
We provide pretrained models for bedrooms, cityscapes, clevr and ffhq.
To train a GANformer in its default configuration from scratch:
python run_network.py --train --gpus 0 --ganformer-default --exp
