CREPS
Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis (CVPR 2023)
Table of contents
- Requirements
- Getting Started
- Using networks from Python
- Preparing datasets
- Training
- Quality Metrics
- Contacts
<a href="https://thuanz123.github.io/creps"><img src="https://img.shields.io/badge/WEBSITE-Visit%20project%20page-blue?style=for-the-badge"></a> <a href="https://arxiv.org/abs/2303.14157"><img src="https://img.shields.io/badge/arxiv-2303.14157-red?style=for-the-badge"></a>
Thuan Hoang Nguyen, Thanh Van Le, Anh Tran<br> VinAI Research, Vietnam
Abstract: Any-scale image synthesis offers an efficient and scalable solution to synthesize photo-realistic images at any scale, even going beyond 2K resolution. However, existing GAN-based solutions depend excessively on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when scaling the output resolution. From another perspective, INR-based generators are scale-equivariant by design, but their huge memory footprint and slow inference hinder these networks from being adopted in large-scale or real-time systems. In this work, we propose Column-Row Entangled Pixel Synthesis (CREPS), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To save memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at any arbitrary resolution with proper training and inference speed.

Details of the model architecture and experimental results can be found in our paper:
@inproceedings{thuan2023creps,
title={Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis},
author={Thuan Hoang Nguyen and Thanh Van Le and Anh Tran},
year={2023},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
Please CITE our paper whenever our model implementation is used to help produce published results or incorporated into other software.
Additional material
- CREPS pre-trained models
- StyleGAN3 pre-trained models compatible with this codebase
<sub>Access individual networks via https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/<MODEL>, where <MODEL> is one of:</sub><br>
<sub>stylegan3-t-ffhq-1024x1024.pkl, stylegan3-t-ffhqu-1024x1024.pkl, stylegan3-t-ffhqu-256x256.pkl</sub><br>
<sub>stylegan3-r-ffhq-1024x1024.pkl, stylegan3-r-ffhqu-1024x1024.pkl, stylegan3-r-ffhqu-256x256.pkl</sub><br>
<sub>stylegan3-t-metfaces-1024x1024.pkl, stylegan3-t-metfacesu-1024x1024.pkl</sub><br>
<sub>stylegan3-r-metfaces-1024x1024.pkl, stylegan3-r-metfacesu-1024x1024.pkl</sub><br>
<sub>stylegan3-t-afhqv2-512x512.pkl</sub><br>
<sub>stylegan3-r-afhqv2-512x512.pkl</sub><br>
- StyleGAN2 pre-trained models compatible with this codebase
<sub>Access individual networks via https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/<MODEL>, where <MODEL> is one of:</sub><br>
<sub>stylegan2-ffhq-1024x1024.pkl, stylegan2-ffhq-512x512.pkl, stylegan2-ffhq-256x256.pkl</sub><br>
<sub>stylegan2-ffhqu-1024x1024.pkl, stylegan2-ffhqu-256x256.pkl</sub><br>
<sub>stylegan2-metfaces-1024x1024.pkl, stylegan2-metfacesu-1024x1024.pkl</sub><br>
<sub>stylegan2-afhqv2-512x512.pkl</sub><br>
<sub>stylegan2-afhqcat-512x512.pkl, stylegan2-afhqdog-512x512.pkl, stylegan2-afhqwild-512x512.pkl</sub><br>
<sub>stylegan2-brecahad-512x512.pkl, stylegan2-cifar10-32x32.pkl</sub><br>
<sub>stylegan2-celebahq-256x256.pkl, stylegan2-lsundog-256x256.pkl</sub><br>
Requirements
- Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
- 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using Tesla V100 and A100 GPUs.
- 64-bit Python 3.8 and PyTorch 1.9.0 (or later). See https://pytorch.org for PyTorch install instructions.
- CUDA toolkit 11.1 or later. (Why is a separate CUDA toolkit installation required? See Troubleshooting).
- GCC 7 or later (Linux) or Visual Studio (Windows) compilers. The recommended GCC version depends on the CUDA version; see, for example, the CUDA 11.4 system requirements.
- Python libraries: see environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your CREPS Python environment:
conda env create -f environment.yml
conda activate creps
- Docker users:
- Ensure you have correctly installed the NVIDIA container runtime.
- Use the provided Dockerfile to build an image with the required library dependencies.
The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat".
See Troubleshooting for help on common installation and run-time problems.
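Before building anything, it can help to confirm that the basics listed above are in place. The following is a minimal sketch, not part of the CREPS codebase: `check_environment` is a hypothetical helper that checks the Python version, looks for `nvcc` on PATH, and reports the PyTorch version and CUDA availability if PyTorch can be imported.

```python
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Collect a few facts relevant to the requirements listed above."""
    try:
        import torch  # optional here, so the check still runs without PyTorch
    except ImportError:
        torch = None

    return {
        "python_ok": sys.version_info >= min_python,
        # NVCC must be discoverable for the on-the-fly extension builds.
        "nvcc_path": shutil.which("nvcc"),
        "torch_installed": torch is not None,
        "torch_version": torch.__version__ if torch is not None else None,
        "cuda_available": torch.cuda.is_available() if torch is not None else None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `nvcc_path` of `None` or a `cuda_available` of `False` suggests the CUDA toolkit or driver setup needs attention before the custom extensions can compile.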
Getting started
Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:
# Generate an image using pre-trained FFHQ model.
python gen_images.py --outdir=out --trunc=1 --seeds=2 \
--network=creps-ffhq-512x512.pkl --scale=2
# Render a 4x2 grid of interpolations for seeds 0 through 31.
python gen_video.py --output=lerp.mp4 --trunc=1 --seeds=0-31 --grid=4x2 \
--network=creps-ffhq-512x512.pkl
Images from gen_images.py are placed under out/*.png, controlled by --outdir; gen_video.py writes to the file given by --output (lerp.mp4 above). Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.
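The cache locations described above can be sketched as a small lookup. This is an illustration of the documented defaults and overrides only; `resolve_cache_dirs` is a hypothetical helper, and the actual resolution logic inside dnnlib and PyTorch may consider further fallbacks.

```python
import os


def resolve_cache_dirs(env=None):
    """Return the cache locations described above for a given environment mapping."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    default_root = os.path.join(home, ".cache")
    return {
        # Pre-trained network pickles downloaded from URLs.
        "dnnlib": env.get("DNNLIB_CACHE_DIR", os.path.join(default_root, "dnnlib")),
        # Build products of the custom PyTorch extensions.
        "torch_extensions": env.get(
            "TORCH_EXTENSIONS_DIR", os.path.join(default_root, "torch_extensions")
        ),
    }


if __name__ == "__main__":
    for name, path in resolve_cache_dirs().items():
        print(f"{name}: {path}")
```

Running it with and without the two environment variables set shows which directories to clear or to place on fast storage.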
Docker: You can run the image generation example above using Docker as follows:
# Build the creps:latest image
docker build --tag creps .
# Run the gen_images.py script using Docker:
docker run --gpus all -it --rm --user $(id -u):$(id -g) \
-v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
creps \
python gen_images.py --outdir=out --trunc=1 --seeds=2 \
--network=creps-ffhq-512x512.pkl
Note: The Docker image requires NVIDIA driver release r470 or later.
The docker run invocation may look daunting, so let's unpack its contents here:
- --gpus all -it --rm --user $(id -u):$(id -g): with all GPUs enabled, run an interactive session with the current user's UID/GID to avoid Docker writing files as root.
- -v `pwd`:/scratch --workdir /scratch: mount the current running dir (e.g., the top of this git repo on your host machine) to /scratch in the container and use that as the current working dir.
- -e HOME=/scratch: let PyTorch and CREPS code know where to cache temporary files such as pre-trained models and custom PyTorch extension build results. Note: if you want more fine-grained control, you can instead set TORCH_EXTENSIONS_DIR (for the custom extensions build dir) and DNNLIB_CACHE_DIR (for the pre-trained model download cache). You want these cache dirs to reside on persistent volumes so that their contents are retained across multiple docker run invocations.
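The fine-grained variant mentioned above, where the two cache variables are set instead of HOME, might look like the following sketch. `docker_run_args` is a hypothetical helper, and the `/scratch/.cache` location is an assumption chosen so that both caches land inside the mounted directory and therefore persist across runs.

```python
import os
import shlex


def docker_run_args(workdir, uid, gid, image="creps", cache_root="/scratch/.cache"):
    """Assemble a `docker run` argv that keeps both caches under the mounted dir."""
    return [
        "docker", "run", "--gpus", "all", "-it", "--rm",
        "--user", f"{uid}:{gid}",
        "-v", f"{workdir}:/scratch", "--workdir", "/scratch",
        # Assumed cache paths; any directory on a persistent volume would do.
        "-e", f"TORCH_EXTENSIONS_DIR={cache_root}/torch_extensions",
        "-e", f"DNNLIB_CACHE_DIR={cache_root}/dnnlib",
        image,
        "python", "gen_images.py", "--outdir=out", "--trunc=1", "--seeds=2",
        "--network=creps-ffhq-512x512.pkl",
    ]


if __name__ == "__main__":
    # os.getuid/os.getgid are POSIX-only, matching the $(id -u):$(id -g) usage above.
    print(shlex.join(docker_run_args(os.getcwd(), os.getuid(), os.getgid())))
```

The script only prints the command, so it can be reviewed before being pasted into a shell.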
Using networks from Python
You can use pre-trained networks in your own Python code as follows:
import pickle
import torch

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda() # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda() # latent codes
c = None # class labels (not used in this example)
img = G(z, c) # NCHW, float32, dynamic range [-1, +1], no truncation
The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.
The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.
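To make the relationship between 'G' and 'G_ema' concrete, here is a dependency-free toy sketch of a moving average over weights. It assumes an exponential moving average with a hypothetical decay `beta`; the text above only says "moving average", so the exact form and the value used in training are assumptions, and the scalar "weights" stand in for real tensors.

```python
def ema_update(ema_params, params, beta):
    """Blend current weights into the running average: ema = beta*ema + (1-beta)*p."""
    return {
        name: beta * ema_params[name] + (1.0 - beta) * params[name]
        for name in ema_params
    }


# Toy "weights": one scalar per parameter name.
g_ema = {"w": 0.0}
for step_value in [1.0, 1.0, 1.0]:
    g = {"w": step_value}                   # instantaneous snapshot, like 'G'
    g_ema = ema_update(g_ema, g, beta=0.5)  # smoothed copy, like 'G_ema'

print(g_ema)
```

The averaged copy lags behind the instantaneous weights and smooths out step-to-step fluctuations.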
The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:
w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)
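As a rough illustration of what `truncation_psi` and `truncation_cutoff` usually mean in StyleGAN-style generators, the sketch below interpolates the first few per-layer latents toward an average latent. It is written with plain Python lists to stay dependency-free, and `truncate` is a hypothetical function; whether `G.mapping` in this codebase applies exactly this rule is an assumption based on the argument names.

```python
def truncate(w, w_avg, truncation_psi=1.0, truncation_cutoff=None):
    """Pull the first `truncation_cutoff` layers of `w` toward `w_avg`.

    `w` is a list of per-layer latent vectors; `w_avg` is one average vector.
    """
    cutoff = len(w) if truncation_cutoff is None else truncation_cutoff
    out = []
    for layer_index, layer_w in enumerate(w):
        if layer_index < cutoff:
            # Linear interpolation: psi=1 keeps w, psi=0 collapses to w_avg.
            layer_w = [a + truncation_psi * (x - a) for x, a in zip(layer_w, w_avg)]
        out.append(list(layer_w))
    return out


w = [[2.0, 4.0], [2.0, 4.0], [2.0, 4.0]]  # three layers, two latent dimensions
w_avg = [0.0, 0.0]
print(truncate(w, w_avg, truncation_psi=0.5, truncation_cutoff=2))
```

Lower `truncation_psi` values trade sample diversity for samples closer to the average latent, and the cutoff limits that effect to the earliest layers.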
