# AudioDVP
This is the official implementation of *Photorealistic Audio-driven Video Portraits*.
## Major Requirements

- Ubuntu >= 18.04
- PyTorch >= 1.2
- GCC >= 7.5
- NVCC >= 10.1
- FFmpeg (with H.264 support)

FYI, a detailed environment specification is in `enviroment.yml`. (You don't have to install everything in it; just install whatever you need when you hit an import error.)
## Major implementation differences from the original paper

- The geometry and texture parameters of the 3DMM are now initialized from zero and shared among all samples during fitting, since this is more reasonable.
- OpenCV is used instead of PIL for image-editing operations.
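The shared-parameter fitting above can be sketched as follows. This is a minimal NumPy illustration with made-up basis sizes, not the repository's actual fitting code: identity (and texture) coefficients are shared across all frames, while expression coefficients are per-frame.

```python
import numpy as np

# Illustrative 3DMM reconstruction (hypothetical shapes, not the repo's API).
rng = np.random.default_rng(0)

n_vertices = 100
mean_shape = rng.standard_normal(3 * n_vertices)
id_basis = rng.standard_normal((3 * n_vertices, 80))   # identity basis
exp_basis = rng.standard_normal((3 * n_vertices, 64))  # expression basis

# Shared parameters: initialized from zero and optimized jointly over all samples.
alpha = np.zeros(80)            # geometry (identity) coefficients, shared
# Per-frame parameters: one expression vector per video frame.
n_frames = 5
deltas = np.zeros((n_frames, 64))

def reconstruct(frame_idx):
    """Vertex positions for one frame under the shared-identity model."""
    return mean_shape + id_basis @ alpha + exp_basis @ deltas[frame_idx]

# With everything zero-initialized, every frame starts at the mean shape.
assert np.allclose(reconstruct(0), mean_shape)
```

Sharing `alpha` across frames reflects the fact that the subject's identity does not change within one video, so only the expression vectors need to vary over time.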
## Usage

### 1. Download face model data

- Download the Basel Face Model 2009. (Register and get `01_MorphableModel.mat`.)
- Download the expression basis from 3DFace. (There is an `Exp_Pca.bin` in CoarseData.)
- Download the auxiliary files from Deep3DFaceReconstruction.
- Put the data in `renderer/data`, following the structure below.

  ```
  renderer/data
  ├── 01_MorphableModel.mat
  ├── Exp_Pca.bin
  ├── BFM_front_idx.mat
  ├── BFM_exp_idx.mat
  ├── facemodel_info.mat
  ├── select_vertex_id.mat
  ├── std_exp.txt
  └── data.mat  (generated by step 2 below)
  ```
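To catch a misplaced download early, a small sanity check like the following can be run before building. This helper is hypothetical (not part of the repo) and simply verifies that every expected input file is present in `renderer/data`:

```python
from pathlib import Path

# Input files the build step expects (data.mat is an output, so not listed).
REQUIRED = [
    "01_MorphableModel.mat",
    "Exp_Pca.bin",
    "BFM_front_idx.mat",
    "BFM_exp_idx.mat",
    "facemodel_info.mat",
    "select_vertex_id.mat",
    "std_exp.txt",
]

def missing_files(data_dir):
    """Return the names of required files not present in data_dir."""
    d = Path(data_dir)
    return [name for name in REQUIRED if not (d / name).is_file()]

if __name__ == "__main__":
    gone = missing_files("renderer/data")
    if gone:
        print("Missing:", ", ".join(gone))
```

Running this before `build_data.py` turns a cryptic loader error into an explicit list of missing files.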
### 2. Build data

```shell
cd renderer/
python build_data.py
```
### 3. Download the pretrained ATnet model

- The link is here.
- Put `atnet_lstm_18.pth` in `vendor/ATVGnet/model`.
### 4. Download the pretrained ResNet on VGGFace2

- The link is here.
- Put `resnet50_ft_weight.pkl` in `weights`.
### 5. Download the Trump speech video

- The link is here. (Video courtesy of The White House.)
- Put it in `data/video`.
### 6. Compile the CUDA rasterizer kernel

```shell
cd renderer/kernels
python setup.py build_ext --inplace
```
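For reference, a CUDA kernel like this is typically compiled through PyTorch's C++/CUDA extension machinery. The sketch below is a hypothetical `setup.py` showing what `python setup.py build_ext --inplace` builds; the extension and source file names are placeholders, not necessarily those used in `renderer/kernels`.

```python
# Hypothetical setup.py for a PyTorch CUDA extension; the names below are
# illustrative placeholders, not the repo's actual file names.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="rasterize_cuda",
    ext_modules=[
        CUDAExtension(
            name="rasterize_cuda",
            sources=["rasterize_cuda.cpp", "rasterize_cuda_kernel.cu"],
        )
    ],
    # BuildExtension dispatches .cu files to nvcc with the right flags.
    cmdclass={"build_ext": BuildExtension},
)
```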
### 7. Run the demo script

```shell
# An explanation of every step is provided in the script.
./scripts/demo.sh
```
Since we provide both training and inference code, we do not upload a pretrained model at present.

An expected result, generated from the synthesized audio in `data/test_audio`, is provided in `data/sample_result.mp4`.
## Acknowledgment

This work is built upon many great open-source projects and datasets.

- Many implementation details are learned from Deep3DFaceReconstruction.
- ATVGnet in the `vendor` directory is directly borrowed from ATVGnet under the MIT License.
- neural-face-renderer in the `vendor` directory is heavily borrowed from CycleGAN and pix2pix in PyTorch under the BSD License.
- The pretrained ResNet model on the VGGFace2 dataset is from VGGFace2-pytorch under the MIT License.
- The Basel 2009 3D face dataset is from here.
- The expression basis of the 3DMM is from 3DFace under the GPL License.
- Our renderer is heavily borrowed from tf_mesh_renderer and inspired by pytorch_mesh_renderer.
## Notification

- Our method is built upon Deep Video Portraits.
- Our method adopts a person-specific Audio2Expression module, which is not as robust as a universal one trained on a large dataset such as Lip Reading Sentences in the Wild. A universal module is encouraged! Fortunately, our method works quite well on WaveNet-synthesized audio like that provided in `data/test_audio`.
- The code IS NOT fully tested on another clean machine.
- There is a known bug in the rasterizer: in some corner cases, several pixels of the rendered face are black (not assigned any color) due to floating-point error, which I can't fix.
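If the stray black pixels are a problem for your use case, one cheap post-processing workaround (not part of the repo, just a hypothetical mitigation) is to fill each all-black pixel with the mean of its non-black 3x3 neighbors:

```python
import numpy as np

def fill_black_pixels(img):
    """img: H x W x 3 uint8 image; returns a copy with isolated black holes
    filled from surrounding non-black pixels. A crude fix for rasterizer
    pixels left unassigned by floating-point error."""
    out = img.copy()
    ys, xs = np.where((img == 0).all(axis=2))  # locate pure-black pixels
    h, w = img.shape[:2]
    for y, x in zip(ys, xs):
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        patch = img[y0:y1, x0:x1].reshape(-1, 3)
        valid = patch[(patch != 0).any(axis=1)]  # drop black neighbors
        if len(valid):
            out[y, x] = valid.mean(axis=0).astype(np.uint8)
    return out

# A white image with one black hole in the middle gets filled back to white.
demo = np.full((3, 3, 3), 255, dtype=np.uint8)
demo[1, 1] = 0
assert (fill_black_pixels(demo)[1, 1] == 255).all()
```

Note this only patches isolated holes; legitimately black regions larger than the neighborhood are left untouched because their neighbors are also black.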
## Disclaimer

We made this code publicly available to benefit the graphics and vision community. Please DO NOT abuse the code for malicious purposes.
## Citation

```bibtex
@article{wen2020audiodvp,
  author={Xin Wen and Miao Wang and Christian Richardt and Ze-Yin Chen and Shi-Min Hu},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  title={Photorealistic Audio-driven Video Portraits},
  year={2020},
  volume={26},
  number={12},
  pages={3457-3466},
  doi={10.1109/TVCG.2020.3023573}
}
```
## License

BSD