
Pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework


Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

<a href="https://arxiv.org/abs/2008.00951"><img src="https://img.shields.io/badge/arXiv-2008.00951-b31b1b.svg" height=22.5></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" height=22.5></a>

<a href="https://www.youtube.com/watch?v=bfvSwhqsTgM"><img src="https://img.shields.io/static/v1?label=CVPR 2021&message=5 Minute Video&color=red" height=22.5></a>
<a href="https://replicate.ai/eladrich/pixel2style2pixel"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=22.5></a>

<a href="http://colab.research.google.com/github/eladrich/pixel2style2pixel/blob/master/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>

We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain.

<p align="center"> <img src="docs/teaser.png" width="800px"/> <br> The proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution. </p>

Description

Official Implementation of our pSp paper for both training and evaluation. The pSp method extends the StyleGAN model to allow solving different image-to-image translation problems using its encoder.
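At a glance, pSp replaces per-image latent optimization with a single feed-forward encoder pass: the encoder maps an image to one style vector per generator layer (the W+ code), and a frozen, pretrained StyleGAN decodes that code into an image. The following is a shape-level sketch with toy stand-ins; the names and random maps are illustrative only and are not the repo's actual modules.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STYLES, DIM = 18, 512  # 18 style vectors of 512 dims = the W+ code of a 1024px StyleGAN

# Toy stand-ins for shape intuition only: the real pSp encoder is a ResNet-based
# feature-pyramid network ("map2style" heads), and the generator is a frozen,
# pretrained StyleGAN.
def toy_encoder(image):
    feat = image.mean(axis=(1, 2))                       # crude global feature
    W = rng.standard_normal((N_STYLES * DIM, feat.size))
    return (W @ feat).reshape(N_STYLES, DIM)             # one style vector per generator layer

def toy_generator(w_plus):
    assert w_plus.shape == (N_STYLES, DIM)               # generator consumes a full W+ code
    return rng.standard_normal((3, 1024, 1024))          # pretend RGB output

image = rng.standard_normal((3, 256, 256))
w_plus = toy_encoder(image)    # single forward pass: no per-image optimization
output = toy_generator(w_plus)
```

The key point is the interface: whatever the input domain (photo, sketch, segmentation map), the encoder's job is to land in W+, and everything downstream is the unchanged pretrained generator.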


Recent Updates

2020.10.04: Initial code release
2020.10.06: Add pSp toonify model (Thanks to the great work from Doron Adler and Justin Pinkney)!
2021.04.23: Added several new features:

  • Added support for StyleGANs of different resolutions (e.g., 256, 512, 1024). This can be set using the flag --output_size, which is set to 1024 by default.
  • Added support for the MoCo-Based similarity loss introduced in encoder4editing (Tov et al. 2021). More details are provided below.

2021.07.06: Added support for training with Weights & Biases. See below for details.
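The --output_size flag matters because the number of style vectors the encoder must produce is tied to the generator's resolution: in the standard StyleGAN layout there are two style inputs per resolution block, giving 2·log2(S) − 2 vectors for output size S. A quick sanity check (generic StyleGAN arithmetic, not code from this repo):

```python
import math

def n_styles(output_size):
    # Two style vectors per resolution block, from 4x4 up to output_size.
    return 2 * int(math.log2(output_size)) - 2

for size in (256, 512, 1024):
    print(size, n_styles(size))  # 256 -> 14, 512 -> 16, 1024 -> 18
```

So a 256px generator needs a (14, 512) W+ code while the default 1024px generator needs (18, 512).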

Applications

StyleGAN Encoding

Here, we use pSp to find the latent code of real images in the latent domain of a pretrained StyleGAN generator.

<p align="center"> <img src="docs/encoding_inputs.jpg" width="800px"/> <img src="docs/encoding_outputs.jpg" width="800px"/> </p>

Face Frontalization

In this application we want to generate a front-facing face from a given input image.

<p align="center"> <img src="docs/frontalization_inputs.jpg" width="800px"/> <img src="docs/frontalization_outputs.jpg" width="800px"/> </p>

Conditional Image Synthesis

Here we wish to generate photo-realistic face images from ambiguous sketch images or segmentation maps. Using style-mixing, we inherently support multi-modal synthesis for a single input.
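Style-mixing gives multi-modal outputs by keeping the coarse layers of the encoded W+ code (which carry pose and structure) and swapping the fine layers for freshly sampled ones (which carry texture and color). A minimal sketch of that mixing step, assuming a (18, 512) W+ code and an illustrative layer cutoff (the function and cutoff here are not the repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)
N_STYLES, DIM = 18, 512  # W+ code for a 1024px StyleGAN

def style_mix(w_input, w_random, coarse_layers=7):
    """Keep the coarse layers (structure) from the encoded input and take the
    remaining fine layers (texture, color) from a randomly sampled code."""
    mixed = w_random.copy()
    mixed[:coarse_layers] = w_input[:coarse_layers]
    return mixed

w_input = rng.standard_normal((N_STYLES, DIM))   # stand-in for the encoder's output
variants = [style_mix(w_input, rng.standard_normal((N_STYLES, DIM)))
            for _ in range(3)]
# Each variant shares the input's coarse structure but differs in fine style.
```

Feeding each mixed code through the generator yields several plausible, structurally consistent outputs for one sketch or segmentation map.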

<p align="center"> <img src="docs/seg2image.png" width="800px"/> <img src="docs/sketch2image.png" width="800px"/> </p>

Super Resolution

Given a low-resolution input image, we generate a corresponding high-resolution image. As this too is an ambiguous task, we can use style-mixing to produce several plausible results.

<p align="center"> <img src="docs/super_res_32.jpg" width="800px"/> <img src="docs/super_res_style_mixing.jpg" width="800px"/> </p>

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Clone this repo:

    ```
    git clone https://github.com/eladrich/pixel2style2pixel.git
    cd pixel2style2pixel
    ```

  • Dependencies:
    We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in `environment/psp_env.yaml`.

Inference Notebook

To help visualize the pSp framework on multiple tasks and to help you get started, we provide a Jupyter notebook found in notebooks/inference_playground.ipynb that allows one to visualize the various applications of pSp.
The notebook will download the necessary pretrained models and run inference on the images found in notebooks/images.
For the tasks of conditional image synthesis and super resolution, the notebook also demonstrates pSp's ability to perform multi-modal synthesis using style-mixing.

Pretrained Models

Please download the pre-trained models from the following links. Each pSp model contains the entire pSp architecture, including the encoder and decoder weights.

| Path | Description |
| :--- | :---------- |
| StyleGAN Inversion | pSp trained with the FFHQ dataset for StyleGAN inversion. |
| Face Frontalization | pSp trained with the FFHQ dataset for face frontalization. |
| Sketch to Image | pSp trained with the CelebA-HQ dataset for image synthesis from sketches. |
| Segmentation to Image | pSp trained with the CelebAMask-HQ dataset for image synthesis from segmentation maps. |
| Super Resolution | pSp trained with the CelebA-HQ dataset for super resolution (up to x32 down-sampling). |
| Toonify | pSp trained with the FFHQ dataset for toonification using the StyleGAN generator from Doron Adler and Justin Pinkney. |

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.

In addition, we provide various auxiliary models needed for training your own pSp model from scratch as well as pretrained models needed for computing the ID metrics reported in the paper.

| Path | Description |
| :--- | :---------- |
| FFHQ StyleGAN | StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution. |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during pSp training. |
| MoCo ResNet-50 | Pretrained ResNet-50 model trained using MOCOv2 for computing the MoCo-based similarity loss on non-facial domains, taken from the [official implementation](https://github.com/facebookresearch/moco). |
