PTI: Pivotal Tuning for Latent-based editing of Real Images (ACM TOG 2022)

<!-- > Recently, a surge of advanced facial editing techniques have been proposed that leverage the generative power of a pre-trained StyleGAN. To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator’s domain. As it turns out, however, StyleGAN’s latent space induces an inherent tradeoff between distortion and editability, i.e. between maintaining the original appearance and convincingly altering some of its attributes. Practically, this means it is still challenging to apply ID-preserving facial latent-space editing to faces which are out of the generator’s domain. In this paper, we present an approach to bridge this gap. Our technique slightly alters the generator, so that an out-of-domain image is faithfully mapped into an in-domain latent code. The key idea is pivotal tuning — a brief training process that preserves the editing quality of an in-domain latent region, while changing its portrayed identity and appearance. In Pivotal Tuning Inversion (PTI), an initial inverted latent code serves as a pivot, around which the generator is fine-tuned. At the same time, a regularization term keeps nearby identities intact, to locally contain the effect. This surgical training process ends up altering appearance features that represent mostly identity, without affecting editing capabilities. To supplement this, we further show that pivotal tuning can also adjust the generator to accommodate a multitude of faces, while introducing negligible distortion on the rest of the domain. We validate our technique through inversion and editing metrics, and show preferable scores to state-of-the-art methods. We further qualitatively demonstrate our technique by applying advanced edits (such as pose, age, or expression) to numerous images of well-known and recognizable identities. Finally, we demonstrate resilience to harder cases, including heavy make-up, elaborate hairstyles and/or headwear, which otherwise could not have been successfully inverted and edited by state-of-the-art methods. -->

<a href="https://arxiv.org/abs/2106.05744"><img src="https://img.shields.io/badge/arXiv-2106.05744-b31b1b.svg"></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg"></a>
Inference Notebook: <a href="https://colab.research.google.com/github/danielroich/PTI/blob/main/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=20></a>

<p align="center"> <img src="docs/teaser.jpg"/> <br> Pivotal Tuning Inversion (PTI) enables employing off-the-shelf latent based semantic editing techniques on real images using StyleGAN. PTI excels in identity preserving edits, portrayed through recognizable figures — Serena Williams and Robert Downey Jr. (top), and in handling faces which are clearly out-of-domain, e.g., due to heavy makeup (bottom). </br> </p>

Description

Official Implementation of our PTI paper, together with code for the evaluation metrics. PTI introduces an optimization mechanism for solving the StyleGAN inversion task, providing near-perfect reconstruction results while maintaining the high editing ability of the native StyleGAN latent space W. For more details, see <a href="https://arxiv.org/abs/2106.05744"><img src="https://img.shields.io/badge/arXiv-2106.05744-b31b1b.svg"></a>
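As a rough illustration of the mechanism, here is a minimal, self-contained PyTorch sketch of the two phases (pivot inversion, then pivotal tuning); the generator, target image, loss, and step counts below are simplified stand-ins, not the repository's actual training code.

```python
# Conceptual sketch of PTI's two phases; G, target, the loss, and the step
# counts are placeholders, not the repo's actual models or hyperparameters.
import torch

G = torch.nn.Linear(512, 3 * 64 * 64)   # stand-in for the StyleGAN generator
target = torch.randn(3 * 64 * 64)       # stand-in for the aligned real image
loss_fn = torch.nn.MSELoss()            # the paper also uses an LPIPS term

# Phase 1: optimize a pivot latent code w_p with the generator frozen.
w_p = torch.zeros(512, requires_grad=True)
opt_w = torch.optim.Adam([w_p], lr=5e-3)
for _ in range(100):
    opt_w.zero_grad()
    loss_fn(G(w_p), target).backward()
    opt_w.step()

# Phase 2: freeze the pivot and fine-tune the generator weights around it,
# pulling the out-of-domain image into an editable in-domain latent region.
w_p = w_p.detach()
opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
for _ in range(100):
    opt_g.zero_grad()
    loss_fn(G(w_p), target).backward()
    opt_g.step()
```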

Recent Updates

2021.07.01: Fixed the file-download phase in the inference notebook, which might have caused the notebook not to run smoothly.

2021.06.29: Added support for CPU. In order to run PTI on CPU, please change the device parameter in configs/global_config.py from "cuda" to "cpu".
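For reference, the change amounts to a one-line edit; the snippet below assumes the parameter is a module-level variable named device, as described above.

```python
# configs/global_config.py -- sketch of the one-line change described above
device = 'cpu'  # default is 'cuda'; set to 'cpu' to run PTI without a GPU
```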

2021.06.25: Added a mohawk edit using StyleCLIP+PTI to the inference notebook. Updated the documentation in the inference notebook because the Google Drive rate limit was reached; currently, Google Drive does not allow the pretrained models to be downloaded automatically from Colab, so manual intervention might be needed.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (not mandatory but recommended)
  • Python 3

Installation

  • Dependencies:
    1. lpips
    2. wandb
    3. pytorch
    4. torchvision
    5. matplotlib
    6. dlib
  • All dependencies can be installed with pip install followed by the package name (note that PyTorch's pip package is named torch)

Pretrained Models

Please download the pretrained models from the following links.

Auxiliary Models

We provide various auxiliary models needed for the PTI inversion task. This includes the StyleGAN generator and pre-trained models used for loss computation.

| Path | Description |
| :--- | :---------- |
| FFHQ StyleGAN | StyleGAN2-ada model trained on FFHQ with 1024x1024 output resolution. |
| Dlib alignment | Dlib alignment model used for image preprocessing. |
| FFHQ e4e encoder | Pretrained e4e encoder, used for StyleCLIP editing. |

Note: The StyleGAN model is used directly from the official stylegan2-ada-pytorch implementation. For StyleCLIP pretrained mappers, please see the official StyleCLIP repository.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.
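For illustration, the overrides might look like the sketch below; the variable names are assumptions, so match them to the keys actually defined in configs/paths_config.py.

```python
# configs/paths_config.py -- hypothetical variable names; adjust to the
# keys actually defined in the file.
stylegan2_ada_ffhq = 'pretrained_models/ffhq.pkl'    # FFHQ StyleGAN2-ada generator
dlib = 'pretrained_models/align.dat'                 # dlib model for face alignment
e4e = 'pretrained_models/e4e_ffhq_encode.pt'         # e4e encoder for StyleCLIP editing
```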

Inversion

Preparing your Data

In order to invert a real image and edit it, you should first align and crop it to the correct size. To do so, perform one of the following steps (a usage sketch for the script route follows the list):

  1. Run notebooks/align_data.ipynb and change the "images_path" variable to the raw images path
  2. Run utils/align_data.py and change the "images_path" variable to the raw images path
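A minimal usage sketch for the script route; the pre_process_images entry point is an assumption, so check utils/align_data.py for the actual function name and the "images_path" variable.

```python
# Hypothetical usage of the alignment utility; verify the entry point's
# real name and signature in utils/align_data.py before running.
from utils.align_data import pre_process_images  # assumed entry point

pre_process_images('/path/to/raw_images')  # aligns and crops faces for inversion
```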

Weights And Biases

The project supports the Weights & Biases framework for experiment tracking. For the inversion task, it enables visualization of the loss progression and of the generator's intermediate results during both the initial inversion and the Pivotal Tuning (PT) procedure.

The log frequency can be adjusted using the parameters defined in configs/global_config.py under the "Logs" subsection.

No account is needed to run PTI. However, in order to use the features provided by Weights & Biases, you first have to register on their site.
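After registering, a one-time login stores your API key locally; wandb.login() is the standard Weights & Biases call for this.

```python
# One-time Weights & Biases setup (prompts for the API key from your account).
import wandb

wandb.login()
```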

Running PTI

The main training script is scripts/run_pti.py. The script receives aligned and cropped images from the paths configured in the "Input info" subsection of configs/paths_config.py. Results, including the inversion latent codes and the tuned generators, are saved to the directories listed under "Dirs for output files" in configs/paths_config.py. The hyperparameters for the inversion task can be found in configs/hyperparameters.py; they are initialized to the default values used in the paper.
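For orientation, invoking the training from Python might look like the sketch below; the run_PTI entry point and its keyword arguments are assumptions modeled on the inference notebook, so verify them against scripts/run_pti.py.

```python
# Assumed entry point and keyword names -- verify against scripts/run_pti.py.
from scripts.run_pti import run_PTI

# Reads aligned images from the "Input info" paths in configs/paths_config.py
# and writes latent pivots plus the tuned generator to the configured dirs.
model_id = run_PTI(run_name='my_run', use_wandb=False, use_multi_id_training=False)
```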

Editing

By default, we assume that all auxiliary edit directions are downloaded and saved to the directory editings. However, you may use your own paths by changing the necessary values in configs/paths_config.py under the "Edit directions" subsection.
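Conceptually, an edit direction is just a tensor added to the pivot latent code; the sketch below uses random stand-ins for both, since the exact shapes and file layout depend on your setup.

```python
# Generic latent-edit sketch with stand-in tensors; in practice the pivot
# comes from the PTI inversion step and the direction from a file in editings.
import torch

w_pivot = torch.randn(1, 18, 512)     # stand-in for the PTI pivot latent (W+)
direction = torch.randn(1, 18, 512)   # stand-in for a loaded edit direction
alpha = 2.0                           # edit strength (negative flips the edit)
w_edited = w_pivot + alpha * direction  # shift the pivot along the direction
```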

An example of editing code can be found in scripts/latent_editor_wrapper.py.
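Its usage might look like the sketch below; the class and method names are assumptions modeled on the inference notebook, so verify them in scripts/latent_editor_wrapper.py.

```python
# Assumed API, modeled on the inference notebook -- verify the class and
# method names in scripts/latent_editor_wrapper.py.
import torch
from scripts.latent_editor_wrapper import LatentEditorWrapper

w_pivot = torch.randn(1, 18, 512)  # stand-in; in practice, load the pivot saved by run_pti
editor = LatentEditorWrapper()
# Returns edited latents for InterfaceGAN directions at the given strengths.
edits = editor.get_single_interface_gan_edits(w_pivot, [-2, 2])
```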

Inference Notebooks

To help visualize the results of PTI we provide a Jupyter notebook found in notebooks/inference_playground.ipynb.
The notebook will download the pretrained models and run inference on a sample image found online or on images of your choosing. It is recommended to run this in Google Colab.

The notebook demonstrates how to:

  • Invert an image using PTI
  • Visualise the inversion and use the PTI output
  • Edit the image after PTI using InterfaceGAN and StyleCLIP
  • Compare to other inversion methods

Evaluation

Currently, the repository supports qualitative evaluation of reconstruction for PTI, SG2 (W space), e4e, and SG2Plus (W+ space), as well as editing with InterfaceGAN and GANSpace for the same inversion methods. To run the evaluation, see evaluation/qualitative_edit_comparison.py. Example outputs of the evaluation scripts:

<p align="center"> <img src="docs/model_rec.jpg"/> <br> Reconstruction comparison between different methods. The image order is: Original image, W+ inversion, e4e inversion, W inversion, PTI inversion </br> </p> <p align="center"> <img src="docs/stern_rotation.jpg"/> <br> InterfaceGAN pose edit comparison between different methods. The image order is: Original, W+, e4e, W, PTI </br> </p> <p align="center"> <img src="docs/tyron_original.jpg" width="220" height="220"/> <img src="docs/tyron_edit.jpg" width="220" height="220"/> <br> Image per edit or several edits without comparison </br> </p>

Coming Soon - Quantitative evaluation and StyleCLIP qualitative evaluation

Repository structure

| Path | Description |
| :--- | :--- |
| configs | Folder containing configs defining hyperparameters, paths and logging |
| criteria | Folder containing various loss and regularization criteria for the optimization |
| dnnlib | Folder containing internal utils for StyleGAN2-ada |
| docs | Folder containing images displayed in the README |
| editings | Folder containing the latent space edit directions |
| environment | Folder containing the Anaconda environment used in our experiments |
| licenses | Folder containing licenses of the open source projects used in this repository |
| models | Folder containing models used in different editing techniques and first-phase inversion |
| notebooks | Folder containing the data-alignment and inference notebooks |
| scripts | Folder containing the main training script (run_pti.py) and editing examples |
| utils | Folder containing utilities such as the data alignment script |
| evaluation | Folder containing the qualitative evaluation scripts |
