Hyperstyle
Official Implementation for "HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing" (CVPR 2022) https://arxiv.org/abs/2111.15666
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing (CVPR 2022)
Yuval Alaluf*, Omer Tov*, Ron Mokady, Rinon Gal, Amit H. Bermano
*Denotes equal contribution

Abstract: The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this trade-off by fine-tuning the generator to add the target image to well-behaved, editable regions of the latent space. While promising, this fine-tuning scheme is impractical for prevalent use as it requires a lengthy training phase for each new image. In this work, we introduce this approach into the realm of encoder-based inversion. We propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space. A naive modulation approach would require training a hypernetwork with over three billion parameters. Through careful network design, we reduce this to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders. Lastly, we demonstrate HyperStyle's effectiveness on several applications beyond the inversion task, including the editing of out-of-domain images which were never seen during training.
<a href="https://arxiv.org/abs/2111.15666"><img src="https://img.shields.io/badge/arXiv-2111.15666-b31b1b.svg" height=22.5></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" height=22.5></a>
<a href="https://youtu.be/_sbXmLY2jMw"><img src="https://img.shields.io/static/v1?label=CVPR 2022&message=5 Minute Video&color=red" height=22.5></a>
Inference Notebook: <a href="http://colab.research.google.com/github/yuval-alaluf/hyperstyle/blob/master/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=20></a>
Animation Notebook: <a href="http://colab.research.google.com/github/yuval-alaluf/hyperstyle/blob/master/notebooks/animations_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=20></a>
Domain Adaptation Notebook: <a href="http://colab.research.google.com/github/yuval-alaluf/hyperstyle/blob/master/notebooks/domain_adaptation_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=20></a>
Description
Official Implementation of our HyperStyle paper for both training and evaluation. HyperStyle introduces a new approach for learning to efficiently modify a pretrained StyleGAN generator based on a given target image through the use of hypernetworks.
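To make the idea concrete, the core mechanism can be sketched as a toy example. This is illustrative only and not the repository's code: the function name, tensor shapes, and the per-channel update rule `w' = w * (1 + delta)` are assumptions about how a hypernetwork's predicted offsets might modulate a pretrained convolution weight.

```python
import numpy as np

def modulate_weights(weight, deltas):
    """Apply predicted per-output-channel offsets to a conv weight.

    weight: (out_ch, in_ch, k, k) pretrained generator weight
    deltas: (out_ch,) offsets predicted by the hypernetwork for this layer
    Returns the modulated weight w * (1 + delta), broadcast over each
    output channel's spatial and input dimensions.
    """
    return weight * (1.0 + deltas)[:, None, None, None]

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))

# Predicting zero offsets leaves the pretrained generator unchanged,
# which is the natural starting point for refinement.
assert np.allclose(modulate_weights(w, np.zeros(8)), w)
```

Because the hypernetwork only predicts one offset per output channel rather than a value per weight, the number of predicted parameters stays far below the generator's full weight count.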
Table of Contents
- Getting Started
- Pretrained HyperStyle Models
- Training
- Inference
- Domain Adaptation
- Repository structure
- Related Works
- Credits
- Acknowledgments
- Citation
Getting Started
Prerequisites
- Linux or macOS
- NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
- Python 3
Installation
- Dependencies: We recommend running this repository using Anaconda.
All dependencies for defining the environment are provided in environment/hyperstyle_env.yaml.
Pretrained HyperStyle Models
In this repository, we provide pretrained HyperStyle models for various domains.
All models make use of a modified, pretrained e4e encoder for obtaining an initial inversion into the W latent space.
Please download the pretrained models from the following links.
| Path | Description |
| :--- | :---------- |
| Human Faces | HyperStyle trained on the FFHQ dataset. |
| Cars | HyperStyle trained on the Stanford Cars dataset. |
| Wild | HyperStyle trained on the AFHQ Wild dataset. |
Auxiliary Models
In addition, we provide various auxiliary models needed for training your own HyperStyle models from scratch.
These include the pretrained e4e encoders into W, pretrained StyleGAN2 generators, and models used for loss computation.
Pretrained W-Encoders
| Path | Description |
| :--- | :---------- |
| Faces W-Encoder | Pretrained e4e encoder trained on FFHQ into the W latent space. |
| Cars W-Encoder | Pretrained e4e encoder trained on Stanford Cars into the W latent space. |
| Wild W-Encoder | Pretrained e4e encoder trained on AFHQ Wild into the W latent space. |
StyleGAN2 Generators

| Path | Description |
| :--- | :---------- |
| FFHQ StyleGAN | StyleGAN2 model trained on FFHQ with 1024x1024 output resolution. |
| LSUN Car StyleGAN | StyleGAN2 model trained on LSUN Car with 512x384 output resolution. |
| AFHQ Wild StyleGAN | StyleGAN-ADA model trained on AFHQ Wild with 512x512 output resolution. |
| Toonify | Toonify generator from Doron Adler and Justin Pinkney converted to PyTorch using rosinality's conversion script, used in domain adaptation. |
| Pixar | Pixar generator from StyleGAN-NADA used in domain adaptation. |
Note: all StyleGAN models are converted from the official TensorFlow models to PyTorch using the conversion script from rosinality.
Other Utility Models

| Path | Description |
| :--- | :---------- |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss and encoder backbone on the human facial domain. |
| ResNet-34 Model | ResNet-34 model trained on ImageNet taken from torchvision for initializing our encoder backbone. |
| MoCov2 Model | Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based loss on non-facial domains. The model is taken from the official implementation. |
| CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation. |
| MTCNN | Weights for MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.) |
By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models.
However, you may use your own paths by changing the necessary values in configs/paths_config.py.
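As an illustration of what such an override might look like, the sketch below resolves a checkpoint path, falling back to the default pretrained_models directory when no custom path is set. The key names, file names, and fallback naming convention are hypothetical placeholders, not the repository's actual entries; use the keys actually defined in the config file.

```python
import os

# Hypothetical path overrides; empty string means "use the default location".
model_paths = {
    'ir_se50': '',
    'stylegan_ffhq': '/my/models/stylegan2-ffhq.pt',
}

def resolve_model_path(name, default_dir='pretrained_models'):
    """Return the configured path for a model, or a default-directory fallback."""
    custom = model_paths.get(name, '')
    if custom:
        return custom
    # Assumed fallback convention: <default_dir>/<name>.pt
    return os.path.join(default_dir, name + '.pt')
```

A fallback of this kind keeps the default workflow (download everything into pretrained_models) working while still allowing per-model overrides.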
Training
Preparing your Data
In order to train HyperStyle on your own data, you should perform the following steps:
- Update configs/paths_config.py with the necessary data paths and model paths for training and inference.
dataset_paths = {
    'train_data': '/path/to/train/data',
'test_data': '/path/t
