StackGAN
TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.
Text to Photo-Realistic Image Synthesis
Dependencies
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
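A quick way to confirm that an environment matches these pins is sketched below. This is not part of the repository: the helper names `parse_pins` and `check_pins` are hypothetical, and the sketch assumes Python 3.8+ so that `importlib.metadata` is available. Distribution-name lookup (for example `absl_py` versus `absl-py`) may behave differently across Python versions, so treat a "not found" result as a hint rather than proof.

```python
from importlib import metadata

# Pins copied from the Dependencies list above.
PINS = """
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
"""

def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins

def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches

# Example use: an empty dict means every pin is satisfied.
# print(check_pins(parse_pins(PINS)))
```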
Downloads
- To download all the dependencies, simply execute
pip install -r requirements.txt
- To download the CUB 200 dataset, simply execute the data_download.py file
python data_download.py
- Download the char-CNN-RNN embeddings (birds.zip) from the download link and unzip the archive in place.
unzip birds.zip
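On systems without the `unzip` command, the same step can be sketched with Python's standard `zipfile` module. The function name `extract_in_place` is hypothetical; only the archive name `birds.zip` comes from the step above. Note one assumption: this version extracts next to the archive, whereas `unzip birds.zip` extracts into the current working directory, so the two match only when run from the directory that holds the archive.

```python
import zipfile
from pathlib import Path

def extract_in_place(archive="birds.zip"):
    """Extract the archive into the directory that contains it.

    Returns the list of member names that were extracted.
    """
    archive = Path(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
        return zf.namelist()

# Example use, run from the directory holding birds.zip:
# for name in extract_in_place():
#     print(name)
```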
Training
- The model.py file contains the bare minimum code to run the stage 1 and stage 2 architectures. It automatically stores the weights after the specified or default number of epochs has completed. Note that the weights are stored at the same directory level as model.py.
python model.py
Architecture
<img src="./assets/stackgan_framework.jpg" width="850px" height="370px"/>
- Stage 1
  - Text Encoder Network
    - Text description to a 1024 dimensional text embedding
    - Learning Deep Representations of Fine-Grained Visual Descriptions Arxiv Link
  - Conditioning Augmentation Network
    - Adds randomness to the network
    - Produces more image-text pairs
  - Generator Network
  - Discriminator Network
  - Embedding Compressor Network
  - Outputs a 64x64 image
- Stage 2
  - Text Encoder Network
  - Conditioning Augmentation Network
  - Generator Network
  - Discriminator Network
  - Embedding Compressor Network
  - Outputs a 256x256 image
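The Conditioning Augmentation step listed above can be illustrated with a small NumPy sketch. It follows the idea from the StackGAN paper of sampling the conditioning vector from a Gaussian whose mean and variance are computed from the text embedding, with a KL term pulling that Gaussian toward a standard normal. It is not the code in model.py: the function name `conditioning_augmentation`, the single linear layer, the random stand-in weights, and the 128-dimensional output size are all assumptions made for illustration. `np.random.RandomState` is used so the sketch also runs on the pinned numpy==1.16.4.

```python
import numpy as np

EMBED_DIM = 1024  # text embedding size, from the Stage 1 list above
COND_DIM = 128    # conditioning vector size (assumed for this sketch)

def conditioning_augmentation(embedding, w, b, rng):
    """Sample c = mu + sigma * eps and return (c, kl).

    embedding: (batch, EMBED_DIM) text embeddings
    w, b:      weights of one linear layer producing [mu, log_var]
    rng:       np.random.RandomState used to draw the noise
    """
    stats = embedding @ w + b                 # (batch, 2 * COND_DIM)
    mu, log_var = np.split(stats, 2, axis=1)
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)       # fresh noise on every call
    c = mu + sigma * eps
    # KL(N(mu, sigma^2) || N(0, I)), summed over dimensions, averaged over the batch
    kl = 0.5 * np.mean(np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=1))
    return c, kl

rng = np.random.RandomState(0)
w = rng.normal(scale=0.02, size=(EMBED_DIM, 2 * COND_DIM))  # untrained stand-in weights
b = np.zeros(2 * COND_DIM)
embedding = rng.standard_normal((1, EMBED_DIM))             # stand-in for a real text embedding

# The same embedding yields a different conditioning vector on each call.
c1, kl = conditioning_augmentation(embedding, w, b, rng)
c2, _ = conditioning_augmentation(embedding, w, b, rng)
print(c1.shape, float(kl), np.allclose(c1, c2))
```

Because the noise is redrawn on every call, one caption embedding maps to many nearby conditioning vectors, which is the sense in which the step "adds randomness" and "produces more image-text pairs".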
Reference Papers
- StackGAN: Text to photo-realistic image synthesis [Arxiv Link]
- Improved Techniques for Training GANs [Arxiv Link]
- Generative Adversarial Text to Image Synthesis [Arxiv Link]
- Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link]
Note
This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license name "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.
