Pixel2style2pixel
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
<a href="https://arxiv.org/abs/2008.00951"><img src="https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg" height=22.5></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" height=22.5></a>
<a href="https://www.youtube.com/watch?v=bfvSwhqsTgM"><img src="https://img.shields.io/static/v1?label=CVPR 2021&message=5 Minute Video&color=red" height=22.5></a>
<a href="https://replicate.ai/eladrich/pixel2style2pixel"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=22.5></a>
<a href="http://colab.research.google.com/github/eladrich/pixel2style2pixel/blob/master/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>
<p align="center"> <img src="docs/teaser.png" width="800px"/> <br> The proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution. </p>

We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain.
Description
Official Implementation of our pSp paper for both training and evaluation. The pSp method extends the StyleGAN model to allow solving different image-to-image translation problems using its encoder.
Table of Contents
- Description
- Table of Contents
- Recent Updates
- Applications
- Getting Started
- Training
- Testing
- Additional Applications
- Repository structure
- TODOs
- Credits
- Inspired by pSp
- pSp in the Media
- Citation
Recent Updates
2020.10.04: Initial code release
2020.10.06: Added pSp toonify model (Thanks to the great work from Doron Adler and Justin Pinkney)!
2021.04.23: Added several new features:
- Added support for StyleGANs of different resolutions (e.g., 256, 512, 1024). This can be set using the flag `--output_size`, which is set to 1024 by default.
- Added support for the MoCo-based similarity loss introduced in encoder4editing (Tov et al. 2021). More details are provided below.
2021.07.06: Added support for training with Weights & Biases. See below for details.
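As a rough illustration of the `--output_size` flag mentioned above, a training run against a 512x512 StyleGAN might look like the following. This is a sketch: the paths, the `ffhq_encode` dataset type, and the remaining flags are placeholders to be adapted to your setup, as described in the Training section.

```shell
# Sketch: training pSp with a 512x512 StyleGAN generator instead of the
# default 1024x1024. All paths below are placeholders.
python scripts/train.py \
  --dataset_type=ffhq_encode \
  --exp_dir=/path/to/experiment \
  --stylegan_weights=/path/to/stylegan_512.pt \
  --output_size=512
```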
Applications
StyleGAN Encoding
Here, we use pSp to find the latent code of real images in the latent domain of a pretrained StyleGAN generator.
<p align="center"> <img src="docs/encoding_inputs.jpg" width="800px"/> <img src="docs/encoding_outputs.jpg" width="800px"/> </p>

Face Frontalization
In this application we want to generate a front-facing face from a given input image.
<p align="center"> <img src="docs/frontalization_inputs.jpg" width="800px"/> <img src="docs/frontalization_outputs.jpg" width="800px"/> </p>

Conditional Image Synthesis
Here we wish to generate photo-realistic face images from ambiguous sketch images or segmentation maps. Using style-mixing, we inherently support multi-modal synthesis for a single input.
<p align="center"> <img src="docs/seg2image.png" width="800px"/> <img src="docs/sketch2image.png" width="800px"/> </p>

Super Resolution
Given a low-resolution input image, we generate a corresponding high-resolution image. As this too is an ambiguous task, we can use style-mixing to produce several plausible results.
<p align="center"> <img src="docs/super_res_32.jpg" width="800px"/> <img src="docs/super_res_style_mixing.jpg" width="800px"/> </p>

Getting Started
Prerequisites
- Linux or macOS
- NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
- Python 3
Installation
- Clone this repo:
```
git clone https://github.com/eladrich/pixel2style2pixel.git
cd pixel2style2pixel
```
- Dependencies:
We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in `environment/psp_env.yaml`.
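With Anaconda installed, the environment can be created from that file. A minimal sketch (the environment name `psp_env` is an assumption based on the file name; the actual name is set by the `name:` field inside the yaml):

```shell
# Create the conda environment from the provided yaml and activate it.
# (Environment name "psp_env" is assumed from the file name.)
conda env create -f environment/psp_env.yaml
conda activate psp_env
```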
Inference Notebook
To help visualize the pSp framework on multiple tasks and to help you get started, we provide a Jupyter notebook found in notebooks/inference_playground.ipynb that allows one to visualize the various applications of pSp.
The notebook will download the necessary pretrained models and run inference on the images found in notebooks/images.
For the tasks of conditional image synthesis and super resolution, the notebook also demonstrates pSp's ability to perform multi-modal synthesis using style-mixing.
Pretrained Models
Please download the pre-trained models from the following links. Each pSp model contains the entire pSp architecture, including the encoder and decoder weights.

| Path | Description |
| :--- | :---------- |
| StyleGAN Inversion | pSp trained with the FFHQ dataset for StyleGAN inversion. |
| Face Frontalization | pSp trained with the FFHQ dataset for face frontalization. |
| Sketch to Image | pSp trained with the CelebA-HQ dataset for image synthesis from sketches. |
| Segmentation to Image | pSp trained with the CelebAMask-HQ dataset for image synthesis from segmentation maps. |
| Super Resolution | pSp trained with the CelebA-HQ dataset for super resolution (up to x32 down-sampling). |
| Toonify | pSp trained with the FFHQ dataset for toonification using the StyleGAN generator from Doron Adler and Justin Pinkney. |
If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.
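For example, running inference with a downloaded checkpoint might look like the following sketch. All paths are placeholders, and the exact set of flags accepted by the inference script is documented in the Testing section:

```shell
# Sketch: inference with a downloaded pSp checkpoint.
# Replace the paths below with your own checkpoint and data locations.
python scripts/inference.py \
  --exp_dir=/path/to/experiment \
  --checkpoint_path=/path/to/downloaded_model.pt \
  --data_path=/path/to/test_images \
  --test_batch_size=4
```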
In addition, we provide various auxiliary models needed for training your own pSp model from scratch as well as pretrained models needed for computing the ID metrics reported in the paper.

| Path | Description |
| :--- | :---------- |
| FFHQ StyleGAN | StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution. |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during pSp training. |
| MoCo ResNet-50 | Pretrained ResNet-50 model trained using MOCOv2 for computing the MoCo-based similarity loss on non-facial domains. The model is taken from the [official implementation](https://github.com/facebookresearch/moco). |