SadTalker
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
<a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<br>

<b>TL;DR: single portrait image 🙎♂️ + audio 🎤 = talking head video 🎞.</b>
<br> </div>

Highlights
- The license has been updated to Apache 2.0, and we've removed the non-commercial restriction.
- SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join:
- We've published a stable-diffusion-webui extension. Check out more details here. Demo Video
- Full image mode is now available! More details...

| still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815 |
|:--------------------: |:--------------------: | :----: |
| <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'> |

- Several new modes (still, reference, and resize modes) are now available!
- We're happy to see more community demos on bilibili, YouTube and X (#sadtalker).
Changelog
The previous changelog can be found here.
- [2023.06.12]: Added more new features to the WebUI extension; see the discussion here.
- [2023.06.05]: Released a new 512x512px (beta) face model. Fixed some bugs and improved performance.
- [2023.04.15]: Added a WebUI Colab notebook by @camenduru:
- [2023.04.12]: Added a more detailed WebUI installation document and fixed a problem when reinstalling.
- [2023.04.12]: Fixed the WebUI safety issues caused by third-party packages, and optimized the output path in `sd-webui-extension`.
- [2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has since been removed in a later release.
- [2023.04.08]: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.
To-Do
We're tracking new updates in issue #280.
Troubleshooting
If you have any problems, please read our FAQs before opening an issue.
1. Installation.
Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).
Linux/Unix
- Install Anaconda, Python and `git`.
- Create the environment and install the requirements.
git clone https://github.com/OpenTalker/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
### Coqui TTS is optional for gradio demo.
### pip install TTS
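
After running the steps above, a quick sanity check can confirm that the tools are reachable from the activated `sadtalker` environment. The following is only a minimal sketch and not part of the SadTalker repository: the helper name `check_cmd` is hypothetical, and the script assumes that `python` on your PATH is the interpreter from the conda environment.

```shell
# Sanity-check sketch (assumption: run inside the activated conda environment).

# Hypothetical helper: print "ok" if a command is on PATH, otherwise "missing".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

echo "git:    $(check_cmd git)"
echo "ffmpeg: $(check_cmd ffmpeg)"
echo "python: $(check_cmd python)"

# If python is available, report the torch version and whether CUDA is visible.
if command -v python >/dev/null 2>&1; then
  python -c 'import torch; print("torch", torch.__version__, "cuda:", torch.cuda.is_available())' \
    || echo "torch: not importable in this environment"
fi
```

If `torch` is reported as not importable, the `pip install torch==1.12.1+cu113 ...` step most likely ran in a different environment than the one currently active.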
Windows
A video tutorial in Chinese is available here. Alternatively, follow these instructions:
- Install Python 3.8 and check "Add Python to PATH".
- Install git manually or using Scoop: `scoop install git`.
- Install `ffmpeg`, following this tutorial or using Scoop: `scoop install ffmpeg`.
- Download the SadTalker repository by running `git clone https://github.com/Winfredy/SadTalker.git`.
- Download the checkpoints and gfpgan models in the downloads section.
- Run `start.bat` from Windows Explorer as a normal, non-administrator user, and a Gradio-powered WebUI demo will be started.
macOS
A tutorial on installing SadTalker on macOS can be found here.
Docker, WSL, etc
Please check out additional tutorials here.
2. Download Models
You can run the following script on Linux/macOS to automatically download all the models:
bash scripts/download_models.sh
We also provide an offline patch (gfpgan/), so no model will be downloaded when generating.
Pre-Trained Models
- Google Drive
- GitHub Releases
- Baidu (百度云盘) (Password: `sadt`)
GFPGAN Offline Patch
- Google Drive
- GitHub Releases
- Baidu (百度云盘) (Password: `sadt`)
Model details:
New version
| Model | Description |
| :--- | :---------- |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render). |
| checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render). |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |
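
To confirm that the new-version files listed in the table are in place, a small check like the one below can be run from the repository root. This is a hedged sketch, not a script shipped with SadTalker: the function name `list_missing` is hypothetical, and the list of paths is copied from the table above. Whether every file is needed for a given mode is not stated here, so trim the list to match your setup.

```shell
# Sketch: report which of the new-version model paths are absent.
# Assumption: run from the root of the SadTalker checkout.

# Hypothetical helper: echo each path (relative to the given root) that does not exist.
list_missing() {
  root="$1"
  shift
  for path in "$@"; do
    [ -e "$root/$path" ] || echo "$path"
  done
}

# Paths copied from the "New version" table.
missing="$(list_missing . \
  checkpoints/mapping_00229-model.pth.tar \
  checkpoints/mapping_00109-model.pth.tar \
  checkpoints/SadTalker_V0.0.2_256.safetensors \
  checkpoints/SadTalker_V0.0.2_512.safetensors \
  gfpgan/weights)"

if [ -n "$missing" ]; then
  echo "Missing model files:"
  echo "$missing"
else
  echo "All listed model files are present."
fi
```

If anything is reported missing, rerunning `bash scripts/download_models.sh` or fetching the files manually from one of the download links is the likely fix.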
Old version
| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reappearance of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Fa
