
SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Install / Use

/learn @OpenTalker/SadTalker

README

<div align="center"> <img src='https://user-images.githubusercontent.com/4397546/229094115-862c747e-7397-4b54-ba4a-bd368bfe2e0f.png' width='500px'/> <!--<h2> 😭 SadTalker: <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->

<a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a>   <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a>   Open In Colab   Hugging Face Spaces   sd webui-colab   <br> Replicate Discord

<div> <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp; <a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</a>&emsp; <a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp; <a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp; <a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp; </br> <a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo<sup>1</sup> </a>&emsp; <a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup> </a>&emsp; <a target='_blank'>Fei Wang <sup>1</sup> </a>&emsp; </div> <br> <div> <sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp; </div> <br> <i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i> <br> <br>

sadtalker

<b>TL;DR:       single portrait image 🙎‍♂️      +       audio 🎤       =       talking head video 🎞.</b>

<br> </div>

Highlights

  • The license has been updated to Apache 2.0, and we've removed the non-commercial restriction

  • SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join: Discord

  • We've published a stable-diffusion-webui extension. Check out more details here. Demo Video

  • Full image mode is now available! More details...

| still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815 |
|:--------------------:|:--------------------:|:----:|
| <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'> |

  • Several new modes (Still, reference, and resize modes) are now available!

  • We're happy to see more community demos on bilibili, YouTube and X (#sadtalker).

Changelog

The previous changelog can be found here.

  • [2023.06.12]: Added more new features in WebUI extension, see the discussion here.

  • [2023.06.05]: Released a new 512x512px (beta) face model. Fixed some bugs and improved performance.

  • [2023.04.15]: Added a WebUI Colab notebook by @camenduru: sd webui-colab

  • [2023.04.12]: Added a more detailed WebUI installation document and fixed a problem when reinstalling.

  • [2023.04.12]: Fixed WebUI security issues caused by third-party packages, and optimized the output path in sd-webui-extension.

  • [2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has since been removed in a later release.

  • [2023.04.08]: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.

To-Do

We're tracking new updates in issue #280.

Troubleshooting

If you have any problems, please read our FAQs before opening an issue.

1. Installation.

Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).

Linux/Unix

  1. Install Anaconda, Python and git.

  2. Create the environment and install the requirements:

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### Coqui TTS is optional for gradio demo. 
### pip install TTS
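Once the environment is set up and the checkpoints are downloaded (see the Download Models section below), a typical run looks like the following. This is a sketch, not the canonical invocation: the flags shown (`--driven_audio`, `--source_image`, `--result_dir`, `--enhancer`) reflect the repository's `inference.py` at the time of writing; run `python inference.py --help` to confirm them for your version.

```shell
# Animate a single portrait image with a driving audio clip.
# The example paths are illustrative; substitute your own files.
python inference.py \
  --driven_audio examples/driven_audio/bus_chinese.wav \
  --source_image examples/source_image/full_body_2.png \
  --result_dir ./results \
  --enhancer gfpgan      # optional: GFPGAN face enhancement
```

The resulting talking-head video is written under `--result_dir`.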

Windows

A video tutorial in Chinese is available here. Alternatively, follow these instructions:

  1. Install Python 3.8 and check "Add Python to PATH".
  2. Install git manually or using Scoop: scoop install git.
  3. Install ffmpeg, following this tutorial or using scoop: scoop install ffmpeg.
  4. Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
  5. Download the checkpoints and gfpgan models in the downloads section.
  6. Run start.bat from Windows Explorer as a normal, non-administrator user, and a Gradio-powered WebUI demo will start.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

bash scripts/download_models.sh
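After the script finishes, you can sanity-check that the expected files landed in place before running inference. The sketch below assumes the new-version checkpoint names from the model table in this README; adjust the list to match your release.

```python
from pathlib import Path

# Checkpoint files the download script is expected to fetch (new-version layout,
# taken from the model table below). Adjust for your SadTalker version.
EXPECTED = [
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
]

def missing_checkpoints(root: str, expected=EXPECTED) -> list[str]:
    """Return the expected checkpoint paths that are absent under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]

if __name__ == "__main__":
    gaps = missing_checkpoints(".")
    if gaps:
        print("Missing files:", *gaps, sep="\n  ")
    else:
        print("All checkpoints present.")
```

Run it from the repository root; an empty result means the download completed.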

We also provide an offline patch (gfpgan/), so no models need to be downloaded at generation time.

Pre-Trained Models

<!-- TODO add Hugging Face links -->

GFPGAN Offline Patch

<!-- TODO add Hugging Face links --> <details><summary>Model Details</summary>

Model descriptions:

New version

| Model | Description |
| :--- | :---------- |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints (old version, 256 face render). |
| checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints (old version, 512 face render). |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

Old version

| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reappearance of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2Lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model. |
