StyleTalk
The official repository of the AAAI 2023 paper "StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles".
<p align='center'> <b> <a href="https://arxiv.org/abs/2301.01081">Paper</a> | <a href="https://drive.google.com/file/d/19WRhBHYVWRIH8_zo332l00fLXfUE96-k/view?usp=share_link">Supp. Materials</a> | <a href="https://youtu.be/mO2Tjcwr4u8">Video</a> </b> </p>
<p align='center'> <img src='media/first_page.png' width='700'/> </p>

The proposed StyleTalk can generate talking head videos with speaking styles specified by arbitrary style reference videos.
News
- April 14th, 2023. The code is available.
Get Started
Installation
Clone this repo, install conda and run:
conda create -n styletalk python=3.7.0
conda activate styletalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg
The code has been tested with CUDA 11.1 on an RTX 3090 GPU.
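To confirm that the environment matches the versions installed above, a small check like the following may help. It is only a sketch: the function name `describe_environment` is hypothetical and not part of this repository, and it reports whatever it finds rather than enforcing any version.

```python
import importlib


def describe_environment():
    """Return a dict describing the installed PyTorch packages, if any."""
    info = {
        "torch": None,
        "torchvision": None,
        "torchaudio": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    for name in ("torch", "torchvision", "torchaudio"):
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken install: leave the entry as None.
            continue
        info[name] = getattr(module, "__version__", "unknown")
    if info["torch"] is not None:
        import torch

        # torch.version.cuda is None for CPU-only builds.
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the install commands above, the expected versions are torch 1.8.0, torchvision 0.9.0 and torchaudio 0.8.0, built against CUDA 11.1.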
Data Preprocessing
Our method takes 3DMM parameters (*.mat) and phoneme labels (*_seq.json) as input. Follow PIRenderer to extract the 3DMM parameters, and follow AVCT to extract the phoneme labels. Some preprocessed data can be found in the samples folder.
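Since each clip needs both a 3DMM file and a phoneme file, it can be useful to check that preprocessing produced matching pairs before running inference. The sketch below assumes the naming seen in the sample paths of this README (`reagan_clip1.mat` alongside `reagan_clip1_seq.json`, in separate folders); the helper `find_unpaired_clips` is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_unpaired_clips(mat_dir, phoneme_dir):
    """Compare clip names across the 3DMM and phoneme folders.

    Returns (missing_phonemes, missing_3dmm): clip names that have a .mat file
    but no *_seq.json file, and clip names for which the reverse holds.
    """
    mat_names = {p.stem for p in Path(mat_dir).glob("*.mat")}
    # Strip the "_seq" suffix so "reagan_clip1_seq.json" maps to "reagan_clip1".
    phoneme_names = {
        p.stem[: -len("_seq")] for p in Path(phoneme_dir).glob("*_seq.json")
    }
    missing_phonemes = sorted(mat_names - phoneme_names)
    missing_3dmm = sorted(phoneme_names - mat_names)
    return missing_phonemes, missing_3dmm
```

For the bundled samples, this could be called as `find_unpaired_clips("samples/source_video/3DMM", "samples/source_video/phoneme")`; two empty lists mean every clip has both inputs.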
Inference
Download checkpoints for StyleTalk and Renderer and put them into ./checkpoints.
Run the demo:
python inference_for_demo.py \
--audio_path samples/source_video/phoneme/reagan_clip1_seq.json \
--style_clip_path samples/style_clips/3DMM/happyenglish_clip1.mat \
--pose_path samples/source_video/3DMM/reagan_clip1.mat \
--src_img_path samples/source_video/image/andrew_clip_1.png \
--wav_path samples/source_video/wav/reagan_clip1.wav \
--output_path demo.mp4
Change audio_path, style_clip_path, pose_path, src_img_path, wav_path, output_path to generate more results.
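To render the same source clip with several style references, the demo command can be wrapped in a short script. This is a sketch rather than part of the repository: `build_command` and `render_styles` are hypothetical helpers, and the only things taken from the demo are the script name `inference_for_demo.py` and its six flags. It assumes it is run from the repository root inside the `styletalk` environment.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_path, style_clip_path, pose_path,
                  src_img_path, wav_path, output_path):
    """Build the argument list for one run of inference_for_demo.py."""
    return [
        sys.executable, "inference_for_demo.py",
        "--audio_path", str(audio_path),
        "--style_clip_path", str(style_clip_path),
        "--pose_path", str(pose_path),
        "--src_img_path", str(src_img_path),
        "--wav_path", str(wav_path),
        "--output_path", str(output_path),
    ]


def render_styles(style_clips, output_dir, **fixed_inputs):
    """Run the demo once per style clip, keeping all other inputs fixed.

    Each output video is named after its style clip, e.g. <output_dir>/<clip>.mp4.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for clip in style_clips:
        output_path = output_dir / f"{Path(clip).stem}.mp4"
        command = build_command(style_clip_path=clip,
                                output_path=output_path,
                                **fixed_inputs)
        # check=True stops the batch as soon as one run fails.
        subprocess.run(command, check=True)
```

A call would pass the remaining demo inputs as keyword arguments, for example `render_styles(sorted(Path("samples/style_clips/3DMM").glob("*.mat")), "results", audio_path=..., pose_path=..., src_img_path=..., wav_path=...)`, with the paths filled in from the demo command above.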
Acknowledgement
Some code is borrowed from the following projects:
Thanks for their contributions!
