
StackGAN

TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.

Text to Photo-Realistic Image Synthesis


Dependencies

tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0

Downloads

  • To install all the dependencies, run
pip install -r requirements.txt
  • To download the CUB 200 dataset, run the data_download.py script
python data_download.py
  • Download the char-CNN-RNN text embeddings [download link] and unzip the archive in place.
unzip birds.zip
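
On systems without the `unzip` utility, the same extraction step can be done from Python with the standard-library `zipfile` module. The following is a minimal sketch, assuming the archive is named `birds.zip` (as in the command above) and sits in the current directory. The helper name `extract_in_place` is hypothetical and is not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_in_place(archive_path):
    """Extract a zip archive into the directory that contains it.

    Roughly equivalent to running `unzip <archive>` from that directory.
    Returns the list of member names found in the archive.
    """
    archive = Path(archive_path)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
        return zf.namelist()


if __name__ == "__main__":
    # Assumption: birds.zip was downloaded into the current directory.
    if Path("birds.zip").exists():
        members = extract_in_place("birds.zip")
        print(f"Extracted {len(members)} entries")
    else:
        print("birds.zip not found; download it first")
```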

Training

  • The model.py file contains the minimum code needed to run the Stage 1 and Stage 2 architectures. It automatically saves the weights once the specified (or default) number of epochs has completed. Note that the weights are saved in the same directory as model.py.
python model.py
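
The save behaviour described above can be pictured with a small sketch. It reads the description as a single save once the configured number of epochs has finished, written next to the training script. The function names, the callback arguments, and the file-name pattern are all illustrative assumptions, not the actual code in model.py.

```python
from pathlib import Path


def weights_path(script_file, stage, network):
    """Build a weights file path in the same directory as the script.

    The file-name pattern is a hypothetical example, not the
    repository's actual naming scheme.
    """
    script_dir = Path(script_file).resolve().parent
    return script_dir / f"stage{stage}_{network}.h5"


def train(epochs, train_one_epoch, save_weights, script_file):
    """Run `epochs` epochs, then save once after the last one."""
    for epoch in range(epochs):
        train_one_epoch(epoch)
    target = weights_path(script_file, stage=1, network="generator")
    save_weights(target)
    return target
```

With a Keras model, the `save_weights` callback could be something like `lambda p: model.save_weights(str(p))`, and `script_file` would typically be `__file__`.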

Architecture

<img src="./assets/stackgan_framework.jpg" width="850px" height="370px"/>
  • Stage 1
    • Text Encoder Network
      • Encodes a text description into a 1024-dimensional text embedding
      • Based on Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]
    • Conditioning Augmentation Network
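      • Adds randomness by sampling the conditioning vector from a Gaussian whose mean and variance are computed from the text embedding
      • Yields more training pairs from a limited set of image-text pairs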
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 64x64 image

  • Stage 2
    • Text Encoder Network
    • Conditioning Augmentation Network
    • Generator Network
    • Discriminator Network
    • Embedding Compressor Network
    • Outputs a 256x256 image
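
To make the Conditioning Augmentation and Embedding Compressor steps listed above more concrete, here is a minimal NumPy sketch that follows the description in the StackGAN paper rather than the code in this repository. The sizes used (128-dimensional conditioning vector, 128-dimensional compressed embedding, 4x4 spatial grid, 512 image feature channels) are assumptions for illustration, and so are the log-variance parametrization and all function names.

```python
import numpy as np


def conditioning_augmentation(embedding, w, b, rng):
    """Sample a conditioning vector from N(mu, diag(sigma^2)).

    embedding: (batch, embed_dim) text embeddings.
    w, b: one dense layer producing [mu, log_var] side by side.
    Returns (c, mu, log_var), each of shape (batch, cond_dim).
    """
    h = embedding @ w + b
    mu, log_var = np.split(h, 2, axis=1)
    eps = rng.standard_normal(mu.shape)       # source of the added randomness
    c = mu + np.exp(0.5 * log_var) * eps      # reparameterization trick
    return c, mu, log_var


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return kl.mean()


def compress_and_replicate(embedding, w, b, spatial):
    """Compress embeddings with a dense layer, then tile over a grid.

    Returns shape (batch, spatial, spatial, compressed_dim), ready to be
    concatenated with image features along the channel axis.
    """
    compressed = embedding @ w + b
    return np.tile(compressed[:, None, None, :], (1, spatial, spatial, 1))


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    batch, embed_dim, cond_dim = 4, 1024, 128
    emb = rng.standard_normal((batch, embed_dim))

    w_ca = rng.standard_normal((embed_dim, 2 * cond_dim)) * 0.01
    c, mu, log_var = conditioning_augmentation(
        emb, w_ca, np.zeros(2 * cond_dim), rng)
    print(c.shape)  # (4, 128)
    print(kl_to_standard_normal(mu, log_var))

    w_cmp = rng.standard_normal((embed_dim, 128)) * 0.01
    tiled = compress_and_replicate(emb, w_cmp, np.zeros(128), spatial=4)
    image_features = rng.standard_normal((batch, 4, 4, 512))
    joint = np.concatenate([image_features, tiled], axis=-1)
    print(joint.shape)  # (4, 4, 4, 640)
```

Sampling `c` rather than using the embedding directly means the same caption yields slightly different conditioning vectors on each pass, and the KL term keeps that distribution close to a standard normal.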

Reference Papers

  1. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Arxiv Link]
  2. Improved Techniques for Training GANs [Arxiv Link]
  3. Generative Adversarial Text to Image Synthesis [Arxiv Link]
  4. Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]

Note

This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license are for "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.
