StyleTalk

The official repository of the AAAI 2023 paper "StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles".

<a href="https://arxiv.org/abs/2301.01081">Paper</a> | <a href="https://drive.google.com/file/d/19WRhBHYVWRIH8_zo332l00fLXfUE96-k/view?usp=share_link">Supp. Materials</a> | <a href="https://youtu.be/mO2Tjcwr4u8">Video</a> <img src='media/first_page.png' width='700'/>

The proposed StyleTalk can generate talking head videos with speaking styles specified by arbitrary style reference videos.

News

April 14th, 2023. The code is available.

Get Started

Installation

Clone this repo, install conda and run:

conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg

The code has been tested with CUDA 11.1 on an RTX 3090 GPU.
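To confirm that the environment matches the versions installed above, a small check like the following may help. It is only a sketch: the function name `describe_environment` is hypothetical and not part of this repository, and it reports `None` for the PyTorch fields if `torch` is not importable.

```python
import importlib.util
import sys


def describe_environment():
    """Collect interpreter and, if installed, PyTorch/CUDA details."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda_available": None,
        "cuda_version": None,
    }
    # Only import torch when it is actually installed in this environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        info["cuda_version"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the commands above, one would expect a PyTorch 1.8.0 build reporting CUDA 11.1; other combinations are untested according to this README.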

Data Preprocessing

Our method takes 3DMM parameters (*.mat) and phoneme labels (*_seq.json) as input. Follow PIRenderer to extract the 3DMM parameters and AVCT to extract the phoneme labels. Some preprocessed data can be found in the samples folder.
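Before running inference on your own data, it can be useful to check that every 3DMM file has a matching phoneme file. The sketch below assumes the naming used by the bundled samples (for example `reagan_clip1.mat` alongside `reagan_clip1_seq.json`); the helper `pair_inputs` is hypothetical and not part of this repository.

```python
from pathlib import Path


def pair_inputs(mat_dir, phoneme_dir):
    """Map clip name -> (3DMM .mat path, phoneme _seq.json path or None)."""
    mat_dir, phoneme_dir = Path(mat_dir), Path(phoneme_dir)
    pairs = {}
    for mat_path in sorted(mat_dir.glob("*.mat")):
        # Assumed convention: <clip>.mat pairs with <clip>_seq.json.
        json_path = phoneme_dir / f"{mat_path.stem}_seq.json"
        pairs[mat_path.stem] = (
            mat_path,
            json_path if json_path.is_file() else None,
        )
    return pairs


if __name__ == "__main__":
    found = pair_inputs("samples/source_video/3DMM", "samples/source_video/phoneme")
    for clip, (mat_path, json_path) in found.items():
        status = "ok" if json_path is not None else "missing phoneme labels"
        print(f"{clip}: {status}")
```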

Inference

Download checkpoints for StyleTalk and Renderer and put them into ./checkpoints.

Run the demo:

python inference_for_demo.py \
--audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
--style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
--pose_path samples/source_video/3DMM/reagan_clip1.mat \
--src_img_path samples/source_video/image/andrew_clip_1.png \
--wav_path samples/source_video/wav/reagan_clip1.wav \
--output_path demo.mp4

Change audio_path, style_clip_path, pose_path, src_img_path, wav_path, output_path to generate more results.
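As an illustration of varying these arguments, the following sketch renders the same source clip once per style clip found in `samples/style_clips/3DMM`. It only wraps the demo command shown above; the helper `build_demo_command` and the `demo_<style>.mp4` output naming are assumptions made for this example, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(style_clip_path, output_path, clip="reagan_clip1",
                       src_img="samples/source_video/image/andrew_clip_1.png",
                       root="samples/source_video"):
    """Build the argument list for inference_for_demo.py for one style clip."""
    return [
        sys.executable, "inference_for_demo.py",
        "--audio_path", f"{root}/phoneme/{clip}_seq.json",
        "--style_clip_path", str(style_clip_path),
        "--pose_path", f"{root}/3DMM/{clip}.mat",
        "--src_img_path", src_img,
        "--wav_path", f"{root}/wav/{clip}.wav",
        "--output_path", str(output_path),
    ]


if __name__ == "__main__":
    # Run from the repository root; does nothing if no style clips are found.
    for style in sorted(Path("samples/style_clips/3DMM").glob("*.mat")):
        output = f"demo_{style.stem}.mp4"  # hypothetical naming scheme
        subprocess.run(build_demo_command(style, output), check=True)
```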

Acknowledgement

Some code is borrowed from the following projects:

Thanks for their contributions!
