VToonify
[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
VToonify - Official PyTorch Implementation
https://user-images.githubusercontent.com/18130694/189483939-0fc4a358-fb34-43cc-811a-b22adb820d57.mp4
This repository provides the official PyTorch implementation for the following paper:
VToonify: Controllable High-Resolution Portrait Video Style Transfer<br> Shuai Yang, Liming Jiang, Ziwei Liu and Chen Change Loy<br> In ACM TOG (Proceedings of SIGGRAPH Asia), 2022.<br> Project Page | Paper | Supplementary Video | Input Data and Video Results <br>
<a href="http://colab.research.google.com/github/williamyang1991/VToonify/blob/master/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>
Abstract: Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.
Features:<br> High-Resolution Video (>1024, support unaligned faces) | Data-Friendly (no real training data) | Style Control

Updates
- [02/2023] Integrated into Deque Notebook.
- [10/2022] Integrated the Gradio interface into the Colab notebook. Enjoy the web demo!
- [10/2022] Integrated into 🤗 Hugging Face. Enjoy the web demo!
- [09/2022] Input videos and video results are released.
- [09/2022] Paper is released.
- [09/2022] Code is released.
- [09/2022] This website is created.
Web Demo
Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo.
Installation
Clone this repo:
```bash
git clone https://github.com/williamyang1991/VToonify.git
cd VToonify
```
Dependencies:
We have tested on:
- CUDA 10.1
- PyTorch 1.7.0
- Pillow 8.3.1; Matplotlib 3.3.4; opencv-python 4.5.3; Faiss 1.7.1; tqdm 4.61.2; Ninja 1.10.2
All dependencies for defining the environment are provided in environment/vtoonify_env.yaml.
We recommend running this repository with Anaconda. You may need to modify vtoonify_env.yaml so that it installs a PyTorch build matching your own CUDA version (see https://pytorch.org/). Then create the environment:
```bash
conda env create -f ./environment/vtoonify_env.yaml
```
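After the environment is created, it can help to see how far your installed packages drift from the tested versions listed above. The sketch below uses only the Python standard library (`importlib.metadata`, Python 3.8+) and is not part of this repository. The PyPI distribution names are our assumptions: Faiss in particular may be installed as `faiss-cpu`, `faiss-gpu`, or through conda without pip metadata, in which case it is simply reported as not found.

```python
from importlib.metadata import PackageNotFoundError, version

# Tested versions from this README; keys are ASSUMED PyPI distribution names.
TESTED_VERSIONS = {
    "torch": "1.7.0",
    "Pillow": "8.3.1",
    "matplotlib": "3.3.4",
    "opencv-python": "4.5.3",
    "faiss-cpu": "1.7.1",  # assumption: may be "faiss-gpu" or absent under conda
    "tqdm": "4.61.2",
    "ninja": "1.10.2",
}


def installed_versions(packages=TESTED_VERSIONS):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def matches_tested(installed, tested):
    """True if the installed release starts with the tested release components.

    Local tags such as "+cu101" are dropped, so "1.7.0+cu101" matches "1.7.0"
    and "4.5.3.56" matches "4.5.3", but "1.10.20" does not match "1.10.2".
    """
    release = installed.split("+")[0].split(".")
    wanted = tested.split(".")
    return release[: len(wanted)] == wanted


def report(packages=TESTED_VERSIONS):
    """Print the tested and installed version of each package."""
    for name, installed in installed_versions(packages).items():
        tested = packages[name]
        if installed is None:
            note = "not found"
        elif matches_tested(installed, tested):
            note = "matches tested version"
        else:
            note = "differs from tested version"
        print(f"{name:14} tested {tested:8} installed {installed or '-':12} {note}")


if __name__ == "__main__":
    report()
```

A mismatch is not necessarily a problem; it only tells you where to look first if something fails.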
☞ Install on Windows: https://github.com/williamyang1991/VToonify/issues/50#issuecomment-1443061101 and https://github.com/williamyang1991/VToonify/issues/38#issuecomment-1442146800
☞ If you have problems with the C++ extensions (fused and upfirdn2d), or if no GPU is available, you may refer to the CPU-compatible version.
(1) Inference for Image/Video Toonification
Inference Notebook
<a href="http://colab.research.google.com/github/williamyang1991/VToonify/blob/master/notebooks/inference_playground.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>
To help users get started, we provide a Jupyter notebook, ./notebooks/inference_playground.ipynb, for visualizing the results of VToonify.
The notebook will download the necessary pretrained models and run inference on the images found in ./data/.
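To check which inputs are sitting in ./data/ before running inference, a small sketch like the following splits the files in that folder into images and videos. The function name `list_inputs` and the extension sets are hypothetical choices for illustration, not taken from the notebook.

```python
from pathlib import Path

# ASSUMED extension sets; adjust them to the inputs you actually use.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}


def list_inputs(data_dir="./data"):
    """Split the files directly inside data_dir into (images, videos), sorted by path.

    Subdirectories and files with other extensions are ignored.
    """
    images, videos = [], []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lower()  # case-insensitive match, so ".JPG" counts
        if ext in IMAGE_EXTS:
            images.append(path)
        elif ext in VIDEO_EXTS:
            videos.append(path)
    return images, videos
```

For example, `images, videos = list_inputs("./data")` returns two lists of `Path` objects; note that `Path.iterdir` raises `FileNotFoundError` if the folder does not exist.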
Pre-trained Models
Pre-trained models can be downloaded from Google Drive, Baidu Cloud (access code: sigg) or Hugging Face:
<table>
<tr> <th>Backbone</th><th>Model</th><th>Description</th> </tr>
<tr> <td rowspan="6">DualStyleGAN</td><td><a href="https://drive.google.com/drive/folders/1DuZfXt6b_xhTAQSN0D8m7N1np0Web0Ky">cartoon</a></td><td>pre-trained VToonify-D models and 317 cartoon style codes</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/12TzTQqwBedsYX3kE_420mdTbWl9lwv4Y">caricature</a></td><td>pre-trained VToonify-D models and 199 caricature style codes</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1MpEqS26Q1ngTPeex_4MN9qOJxfXKH-k-">arcane</a></td><td>pre-trained VToonify-D models and 100 arcane style codes</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/15mxb7DxTzEBrKtx5aJ_I5WGDjSWBmcUi">comic</a></td><td>pre-trained VToonify-D models and 101 comic style codes</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1Hld7OeZqYBrg6r35IA_x4sNtt1abHUMU">pixar</a></td><td>pre-trained VToonify-D models and 122 pixar style codes</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1LQGNMDEHM70nOhm3-xY228YpJNlPnf_s">illustration</a></td><td>pre-trained VToonify-D models and 156 illustration style codes</td> </tr>
<tr> <td rowspan="5">Toonify</td><td><a href="https://drive.google.com/drive/folders/1FFtTVgiDKZ_InnwUJLDuA1wfghZp41nX">cartoon</a></td><td>pre-trained VToonify-T model</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1ReRxttV-macgV3epC61qg4TQ3FGAhGqG">caricature</a></td><td>pre-trained VToonify-T model</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1OXU95BOCCT0f6pGbwQ4yQ1EHb2LPd2yb">arcane</a></td><td>pre-trained VToonify-T model</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/1KvawsOXzKgwDM3Z27sagO_KGE_Kc5GZS">comic</a></td><td>pre-trained VToonify-T model</td> </tr>
<tr> <td><a href="https://drive.google.com/drive/folders/19N4ddcTXhXbTEayTbrFc533EktbhOXMz">pixar</a></td><td>pre-trained VToonify-T model</td> </tr>
<tr> <th colspan="2">Supporting model</th><th> </th> </tr>
<tr> <td colspan="2"><a href="https://drive.google.com/file/d/1NgI4mPkboYvYw3MWcdUaQhkr0OWgs9ej/view?usp=sharing">encoder.pt</a></td><td>Pixel2style2pixel encoder to map real faces into the Z+ space of StyleGAN</td> </tr>
<tr> <td colspan="2"><a href="https://drive.google.com/file/d/1jY0mTjVB8njDh6e0LP_2UxuRK3MnjoIR/view">faceparsing.pth</a></td><td>BiSeNet for face parsing from <a href="https://github.com/zllrunning/face-parsing.PyTorch">face-parsing.PyTorch</a></td> </tr>
</table>

We suggest arranging the downloaded models in this folder structure.
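For scripting, the pre-trained model table can be summarized as a lookup structure, which makes it easy to reject a style that has no checkpoint for a given backbone (for example, illustration exists only for DualStyleGAN). The sketch below is hypothetical: `check_style` is not part of this repository, and it assumes style codes are indexed from 0 to count - 1.

```python
# Style-code counts per style, copied from the pre-trained model table.
DUALSTYLEGAN_STYLE_CODES = {
    "cartoon": 317,
    "caricature": 199,
    "arcane": 100,
    "comic": 101,
    "pixar": 122,
    "illustration": 156,
}
# Styles that have a pre-trained VToonify-T model (Toonify backbone, no style codes).
TOONIFY_STYLES = {"cartoon", "caricature", "arcane", "comic", "pixar"}


def check_style(backbone, style, style_id=None):
    """Raise ValueError if (backbone, style, style_id) is not covered by the table.

    ASSUMPTION: style codes are indexed from 0 to count - 1.
    """
    if backbone == "DualStyleGAN":
        if style not in DUALSTYLEGAN_STYLE_CODES:
            raise ValueError(f"no VToonify-D model for style {style!r}")
        if style_id is not None:
            count = DUALSTYLEGAN_STYLE_CODES[style]
            if not 0 <= style_id < count:
                raise ValueError(f"{style} has {count} style codes; got index {style_id}")
    elif backbone == "Toonify":
        if style not in TOONIFY_STYLES:
            raise ValueError(f"no VToonify-T model for style {style!r}")
        if style_id is not None:
            raise ValueError("VToonify-T models have no style codes")
    else:
        raise ValueError(f"unknown backbone {backbone!r}")
```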
The VToonify-D models are named with suffixes that indicate their settings:
- `_sXXX`: supports only one fixed style, where `XXX` is the index of this style. `_s` without `XXX` means the model supports exemplar-based style transfer.
- `_dXXX`: supports only a fixed style degree of `XXX`. `_d` without `XXX` means the model supports style degrees ranging from 0 to 1.
- `_c`: supports color transfer.
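The naming convention can be decoded mechanically. The helper below is a hypothetical sketch, not code from this repository; it assumes a checkpoint filename is a prefix followed by underscore-separated suffixes, such as `vtoonify_s026_d0.5.pt` or `vtoonify_s_d_c.pt` (illustrative names; check the files you actually downloaded).

```python
from pathlib import Path


def parse_vtoonify_d_name(filename):
    """Decode the _s / _d / _c suffixes of a VToonify-D checkpoint filename.

    ASSUMPTION: the name looks like "<prefix>_s..._d..._c.pt"; the first
    underscore-separated token is treated as the prefix and skipped.
    """
    settings = {
        "style_index": None,       # int for a fixed style (_sXXX)
        "exemplar_based": False,   # True for _s without an index
        "style_degree": None,      # float for a fixed degree (_dXXX)
        "adjustable_degree": False,  # True for _d without a value (degree in [0, 1])
        "color_transfer": False,   # True if the _c suffix is present
    }
    # Path.stem drops only the final extension, so "d0.5" stays intact.
    for token in Path(filename).stem.split("_")[1:]:
        if token.startswith("s"):
            if token[1:]:
                settings["style_index"] = int(token[1:])
            else:
                settings["exemplar_based"] = True
        elif token.startswith("d"):
            if token[1:]:
                settings["style_degree"] = float(token[1:])
            else:
                settings["adjustable_degree"] = True
        elif token == "c":
            settings["color_transfer"] = True
    return settings
```

Keeping the "fixed" and "adjustable" cases as separate fields avoids overloading `None`, which would otherwise mean both "suffix absent" and "suffix present without a value".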
Style Transfer with VToonify-D
✔ A quick start HERE
Transfer a default cartoon style onto a default face image ./data/077436.jpg:
```bash
python style_transfer.py --scale_ima
```