
AniTalker

[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"


<div align="center">

<a href="https://trendshift.io/repositories/10102" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10102" alt="X-LANCE%2FAniTalker | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>

AniTalker

Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

<div align="center"> <!-- <a href='LICENSE' target="_blank"><img src='https://img.shields.io/badge/license-MIT-yellow'></a> --> <a href='https://arxiv.org/abs/2405.03121' target="_blank"><img src='https://img.shields.io/badge/arXiv-AniTalker-red'></a> <a href='https://x-lance.github.io/AniTalker/' target="_blank"><img src='https://img.shields.io/badge/Project-AniTalker-green'></a> <a href='https://huggingface.co/spaces/Delik/Anitalker'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href="https://colab.research.google.com/github/yuhanxu01/AniTalker/blob/main/AniTalker_demo.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://github.com/X-LANCE/AniTalker" target="_blank"><img src="https://img.shields.io/github/stars/X-LANCE/AniTalker"></a> </div> <br>

An updated version of the paper will be uploaded later

Overall Pipeline

</div>

Updates

Environment Installation

```bash
conda create -n anitalker python==3.9.0
conda activate anitalker
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt
```
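Version mismatches between the pinned torch/torchvision/torchaudio builds are a common source of install trouble. A minimal sketch (not part of the repo) that reports whether the pinned packages are installed at the expected versions:

```python
# Sketch: check the AniTalker-pinned dependencies against what is installed.
# The version pins come from the conda command above; adjust if yours differ.
from importlib import metadata

PINNED = {"torch": "1.8.0", "torchvision": "0.9.0", "torchaudio": "0.8.0"}

def check_pins(pins):
    """Return {package: 'ok' | 'mismatch (<installed>)' | 'missing'}."""
    report = {}
    for pkg, want in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "missing"
            continue
        report[pkg] = "ok" if have == want else f"mismatch ({have})"
    return report

if __name__ == "__main__":
    for pkg, status in check_pins(PINNED).items():
        print(f"{pkg}: {status}")
```

Running this inside the `anitalker` environment should report `ok` for all three packages.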

Windows Tutorial (Contributed by newgenai79)

MacOS Tutorial (Contributed by airwzz999)

Model Zoo

Please download the checkpoints from URL and place them into the folder ckpts

For Chinese users (中文用户), we recommend visiting here to download.

```
ckpts/
├── chinese-hubert-large
│   ├── config.json
│   ├── preprocessor_config.json
│   └── pytorch_model.bin
├── stage1.ckpt
├── stage2_pose_only_mfcc.ckpt
├── stage2_full_control_mfcc.ckpt
├── stage2_audio_only_hubert.ckpt
├── stage2_pose_only_hubert.ckpt
└── stage2_full_control_hubert.ckpt
```
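A quick way to catch a partial download is to verify the tree above before running inference. A small sketch (hypothetical helper, not part of the repo) that lists any missing files:

```python
# Sketch: sanity-check that the expected AniTalker checkpoint files exist
# under ckpts/. The relative paths mirror the directory tree above.
from pathlib import Path

EXPECTED = [
    "chinese-hubert-large/config.json",
    "chinese-hubert-large/preprocessor_config.json",
    "chinese-hubert-large/pytorch_model.bin",
    "stage1.ckpt",
    "stage2_pose_only_mfcc.ckpt",
    "stage2_full_control_mfcc.ckpt",
    "stage2_audio_only_hubert.ckpt",
    "stage2_pose_only_hubert.ckpt",
    "stage2_full_control_hubert.ckpt",
]

def missing_checkpoints(ckpt_dir="ckpts"):
    """Return the expected files that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("All checkpoints present.")
```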

Model Description:

| Stage | Model Name | Audio-only Inference | Additional Control Signal |
| --- | --- | --- | --- |
| First stage | stage1.ckpt | - | Motion Encoder & Image Renderer |
| Second stage (Hubert) | stage2_audio_only_hubert.ckpt | yes | - |
| Second stage (Hubert) | stage2_pose_only_hubert.ckpt | yes | Head Pose |
| Second stage (Hubert) | stage2_full_control_hubert.ckpt | yes | Head Pose/Location/Scale |
| Second stage (MFCC) | stage2_pose_only_mfcc.ckpt | yes | Head Pose |
| Second stage (MFCC) | stage2_full_control_mfcc.ckpt | yes | Head Pose/Location/Scale |

  • stage1.ckpt is trained on a video dataset, aiming to learn the transfer of motion. After training, its Motion Encoder (which extracts identity-independent motion) and Image Renderer are used.
  • The models whose names start with stage2 are trained on a video dataset with audio and, unless otherwise specified, are trained from scratch.
  • stage2_audio_only_hubert.ckpt takes Hubert audio features as input, with no additional control signals. It is suited to scenes where the face is oriented forward, and compared to the controllable models it requires less parameter tuning to achieve satisfactory results. [We recommend starting with this model.]
  • stage2_pose_only_hubert.ckpt is similar to stage2_pose_only_mfcc.ckpt, except that the audio features are Hubert. Compared to the audio-only model, it adds pose control signals.
  • stage2_full_control_hubert.ckpt is similar to stage2_full_control_mfcc.ckpt, but uses Hubert audio features.
  • stage2_pose_only_mfcc.ckpt takes MFCC audio features as input and includes pose control signals (yaw, pitch, and roll angles). [The MFCC models perform poorly and are not recommended.]
  • stage2_full_control_mfcc.ckpt takes MFCC audio features as input and, in addition to pose, adds control signals for face location and face scale.
  • chinese-hubert-large is used for extracting audio features.

Quick Guide:

  • Considering usability and model performance, we recommend using stage2_audio_only_hubert.ckpt.
  • If you need more control, use a model with the full_control suffix. Controllable models are often more expressive but require more parameter tuning.
  • All stage2 models can also generate from audio alone when the control flag is disabled.
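The checkpoint-to-use decision above can be captured in a small lookup helper. This is a sketch, not part of the repo: the `'hubert_audio_only'` and `'mfcc_pose_only'` infer_type strings appear in the README's demo commands, but the other infer_type names below are assumptions and may differ in the actual code.

```python
# Sketch: map a desired control level and audio-feature type to a stage-2
# checkpoint and an --infer_type value. Checkpoint names match the Model Zoo;
# infer_type strings other than 'hubert_audio_only'/'mfcc_pose_only' are guesses.
def pick_model(control="none", features="hubert"):
    table = {
        ("hubert", "none"): ("hubert_audio_only", "stage2_audio_only_hubert.ckpt"),
        ("hubert", "pose"): ("hubert_pose_only", "stage2_pose_only_hubert.ckpt"),
        ("hubert", "full"): ("hubert_full_control", "stage2_full_control_hubert.ckpt"),
        ("mfcc", "pose"): ("mfcc_pose_only", "stage2_pose_only_mfcc.ckpt"),
        ("mfcc", "full"): ("mfcc_full_control", "stage2_full_control_mfcc.ckpt"),
    }
    try:
        return table[(features, control)]
    except KeyError:
        raise ValueError(f"no model for features={features!r}, control={control!r}")
```

For example, `pick_model()` returns the recommended default, `("hubert_audio_only", "stage2_audio_only_hubert.ckpt")`.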

Run the demo

Explanation of Parameters for demo.py

Main Inference Scripts (Hubert, Better Result 💪) - Recommended

```bash
python ./code/demo.py \
    --infer_type 'hubert_audio_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_audio_only_hubert.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/monalisa.wav' \
    --test_hubert_path 'test_demos/audios_hubert/monalisa.npy' \
    --result_path 'outputs/monalisa_hubert/'
```
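If you want to run the demo over many portrait/audio pairs, the command above can be assembled programmatically and handed to `subprocess.run`. A minimal sketch (the helper name is ours; paths and flags mirror the README command):

```python
# Sketch: build the demo.py argv list for batch runs. Flags and defaults
# mirror the hubert_audio_only command shown above.
import shlex

def build_demo_cmd(image, audio, hubert_feats, out_dir,
                   stage2="ckpts/stage2_audio_only_hubert.ckpt",
                   infer_type="hubert_audio_only"):
    return [
        "python", "./code/demo.py",
        "--infer_type", infer_type,
        "--stage1_checkpoint_path", "ckpts/stage1.ckpt",
        "--stage2_checkpoint_path", stage2,
        "--test_image_path", image,
        "--test_audio_path", audio,
        "--test_hubert_path", hubert_feats,
        "--result_path", out_dir,
    ]

cmd = build_demo_cmd("test_demos/portraits/monalisa.jpg",
                     "test_demos/audios/monalisa.wav",
                     "test_demos/audios_hubert/monalisa.npy",
                     "outputs/monalisa_hubert/")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd) to execute
```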

See More Hubert Cases

| One Portrait | Result |
|------------|--------------------------|
| <img src="test_demos/portraits/monalisa.jpg" width="200" ></img> | <img src="assets/monalisa-monalisa.gif" width="200" ></img> |

Generated Raw Video (256 * 256)

User-submitted Gallery

<table class="center"> <tr> <th width=30% style="border: none">Portrait</th> <th width=30% style="border: none">Result (256*256)</th> <th width=25% style="border: none">Result (512*512)</th> <th width=5% style="border: none">Scripts</th> </tr> <tr> <td width=30% style="border: none"> <img src="test_demos/portraits/aiface2.png" width="200" ></img> </td> <td width=30% style="border: none"> <video controls loop src="https://github.com/user-attachments/assets/1b84abb3-c553-4c5b-a969-36843b186dbe" muted="false"></video> </td> <td width=25% style="border: none"> <video controls loop src="https://github.com/user-attachments/assets/3776d05a-b23e-482c-b466-cfc12feea9eb" muted="false"></video> </td> <td width=5% style="border: none"> <a href="https://github.com/X-LANCE/AniTalker/issues/20"> Link</a> </td> </tr> </table>

You can submit your demo via an issue.

Main Inference Scripts (MFCC, Faster 🚀) - Not Recommended

[Note] The Hubert model is our default. For environment convenience we also provided an MFCC version, but we found that few people used the Hubert model while many still used MFCC, which gives poorer results. Since this goes against our original intention, we have deprecated the MFCC model. We recommend starting your tests with the hubert_audio_only model. Thanks.

[Upgrade for Early Users] Re-download the checkpoints, including the Hubert model, into the ckpts directory and additionally run pip install transformers==4.19.2. When the code does not detect a Hubert feature path, it will extract the features automatically and print instructions on how to resolve any errors encountered.

<details><summary>Still Show Original MFCC Scripts</summary>

```bash
python ./code/demo.py \
    --infer_type 'mfcc_pose_only' \
    --stage1_checkpoint_path 'ckpts/stage1.ckpt' \
    --stage2_checkpoint_path 'ckpts/stage2_pose_only_mfcc.ckpt' \
    --test_image_path 'test_demos/portraits/monalisa.jpg' \
    --test_audio_path 'test_demos/audios/monalisa.wav' \
    --result_path 'outputs/monalisa_mfcc/' \
    --control_flag \
    --seed 0 \
    --pose_yaw 0.25 \
    --pose_pitch 0 \
    --pose_roll 0
```

</details>

Face Super-resolution (Optional)

The purpose is to upscale the resolution from 256 to 512 and address the issue of blurry rendering.

Please install the additional dependencies:

```bash
pip install facexlib
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install gfpgan

# Ignore the following warning:
# espnet 202301 requires importlib-metadata<5.0, but you have importlib-metadata 7.1.0 which is incompatible.
```

Then enable the option --face_sr in your scripts. The first run will automatically download the weights of the face super-resolution model (GFPGAN).
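When scripting many runs, the super-resolution step can be toggled by appending the flag to an existing argv list. A tiny sketch (hypothetical helper, not part of the repo):

```python
# Sketch: append the optional --face_sr flag to a demo.py argv list,
# avoiding duplicates if it is already present.
def with_face_sr(cmd):
    """Return a copy of the argv list with --face_sr appended once."""
    return list(cmd) if "--face_sr" in cmd else [*cmd, "--face_sr"]
```

For example, `with_face_sr(["python", "./code/demo.py"])` yields `["python", "./code/demo.py", "--face_sr"]`.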
