Hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation


<h1 align='center'>Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation</h1> <div align='center'> <a href='https://github.com/xumingw' target='_blank'>Mingwang Xu</a><sup>1*</sup>&emsp; <a href='https://github.com/crystallee-ai' target='_blank'>Hui Li</a><sup>1*</sup>&emsp; <a href='https://github.com/subazinga' target='_blank'>Qingkun Su</a><sup>1*</sup>&emsp; <a href='https://github.com/NinoNeumann' target='_blank'>Hanlin Shang</a><sup>1</sup>&emsp; <a href='https://github.com/AricGamma' target='_blank'>Liwei Zhang</a><sup>1</sup>&emsp; <a href='https://github.com/cnexah' target='_blank'>Ce Liu</a><sup>3</sup>&emsp; </div> <div align='center'> <a href='https://jingdongwang2017.github.io/' target='_blank'>Jingdong Wang</a><sup>2</sup>&emsp; <a href='https://yoyo000.github.io/' target='_blank'>Yao Yao</a><sup>4</sup>&emsp; <a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>1</sup>&emsp; </div> <div align='center'> <sup>1</sup>Fudan University&emsp; <sup>2</sup>Baidu Inc&emsp; <sup>3</sup>ETH Zurich&emsp; <sup>4</sup>Nanjing University </div> <br> <div align='center'> <a href='https://github.com/fudan-generative-vision/hallo'><img src='https://img.shields.io/github/stars/fudan-generative-vision/hallo?style=social'></a> <a href='https://fudan-generative-vision.github.io/hallo/#/'><img src='https://img.shields.io/badge/Project-HomePage-Green'></a> <a href='https://arxiv.org/pdf/2406.08801'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/fudan-generative-ai/hallo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> <a href='https://huggingface.co/spaces/fffiloni/tts-hallo-talking-portrait'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Demo-yellow'></a> <a href='https://www.modelscope.cn/models/fudan-generative-vision/Hallo/summary'><img src='https://img.shields.io/badge/Modelscope-Model-purple'></a> <a 
href='assets/wechat.jpeg'><img src='https://badges.aleen42.com/src/wechat.svg'></a> </div> <br>

📸 Showcase

https://github.com/fudan-generative-vision/hallo/assets/17402682/9d1a0de4-3470-4d38-9e4f-412f517f834c

🎬 Honoring Classic Films

<table class="center"> <tr> <td style="text-align: center"><b>Devil Wears Prada</b></td> <td style="text-align: center"><b>Green Book</b></td> <td style="text-align: center"><b>Infernal Affairs</b></td> </tr> <tr> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Devil_Wears_Prada-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Devil_Wears_Prada_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Green_Book-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Green_Book_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/无间道-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Infernal_Affairs_GIF.gif"></a></td> </tr> <tr> <td style="text-align: center"><b>Patch Adams</b></td> <td style="text-align: center"><b>Tough Love</b></td> <td style="text-align: center"><b>Shawshank Redemption</b></td> </tr> <tr> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Patch_Adams-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Patch_Adams_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Tough_Love-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Tough_Love_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Shawshank-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Shawshank_GIF.gif"></a></td> </tr> </table>

Explore more examples.

📰 News

2024/06/28: 🎉🎉🎉 We are proud to announce the release of our model training code. Try it with your own training data. Here is the tutorial.
2024/06/21: 🚀🚀🚀 Cloned a Gradio demo on 🤗Huggingface space.
2024/06/20: 🌟🌟🌟 Received numerous contributions from the community, including a Windows version, ComfyUI, WebUI, and Docker template.
2024/06/15: ✨✨✨ Released some images and audios for inference testing on 🤗Huggingface.
2024/06/15: 🎉🎉🎉 Launched the first version on 🫡GitHub.

🤝 Community Resources

Explore the resources developed by our community to enhance your experience with Hallo:

TTS x Hallo Talking Portrait Generator - Check out this awesome Gradio demo by @Sylvain Filoni! With this tool, you can conveniently prepare a portrait image and audio for Hallo.
Demo on Huggingface - Check out this easy-to-use Gradio demo by @multimodalart.
hallo-webui - Explore the WebUI created by @daswer123.
hallo-for-windows - Utilize Hallo on Windows with the guide by @sdbds.
ComfyUI-Hallo - Integrate Hallo with the ComfyUI tool by @AIFSH.
hallo-docker - Docker image for Hallo by @ashleykleynhans.
RunPod Template - Deploy Hallo to RunPod by @ashleykleynhans.
JoyHallo - JoyHallo extends the capabilities of Hallo, enabling it to support Mandarin.

Thanks to all of them.

Join our community and explore these resources to get the most out of Hallo and elevate your creative projects!

🔧️ Framework

[Figures: abstract; framework]

⚙️ Installation

System requirements: Ubuntu 20.04 or Ubuntu 22.04, CUDA 12.1
Tested GPUs: A100

Create conda environment:

  conda create -n hallo python=3.10
  conda activate hallo

Install packages with pip:

  pip install -r requirements.txt
  pip install .

ffmpeg is also required:

  apt-get install ffmpeg
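
As a quick sanity check after installation, the following minimal sketch reports whether the active interpreter matches the Python version used in the conda command and whether ffmpeg is reachable on PATH. The function name `check_environment` and its parameters are hypothetical and not part of the Hallo codebase; it uses only the Python standard library.

```python
import shutil
import sys


def check_environment(required_python=(3, 10), tools=("ffmpeg",)):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration only, not part of Hallo.
    """
    problems = []

    # Compare only major.minor, e.g. (3, 10), against the running interpreter.
    current = tuple(sys.version_info[:2])
    if current != tuple(required_python):
        problems.append(
            "Python {}.{} is active, expected {}.{}".format(
                current[0], current[1], required_python[0], required_python[1]
            )
        )

    # shutil.which returns None when an executable is not found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated `hallo` environment; no output means both checks passed.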

🗝️️ Usage

The entry point for inference is scripts/inference.py. Testing your own cases takes three steps, the first two of which are preparation:

Download all required pretrained models.
Prepare source image and driving audio pairs.
Run inference.

📥 Download Pretrained Models

All pretrained models required for inference are available from our HuggingFace repo.

Clone the pretrained models into the ${PROJECT_ROOT}/pretrained_models directory with the commands below:

git lfs install
git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models

Alternatively, download them separately from their source repositories:

hallo: Our checkpoints consist of denoising UNet, face locator, image & audio proj.
audio_separator: Kim_Vocal_2 MDX-Net vocal removal model. (Thanks to KimberleyJensen)
insightface: 2D and 3D Face Analysis placed into pretrained_models/face_analysis/models/. (Thanks to deepinsight)
face landmarker: Face detection & mesh model from mediapipe placed into pretrained_models/face_analysis/models.
motion module: motion module from AnimateDiff. (Thanks to guoyww).
sd-vae-ft-mse: Weights are intended to be used with the diffusers library. (Thanks to stabilityai)
StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
wav2vec: wav audio to vector model from Facebook.

Finally, these pretrained models should be organized as follows:

./pretrained_models/
|-- audio_separator/
|   |-- download_checks.json
|   |-- mdx_model_data.json
|   |-- vr_model_data.json
|   `-- Kim_Vocal_2.onnx
|-- face_analysis/
|   `-- models/
|       |-- face_landmarker_v2_with_blendshapes.task  # face landmarker model from mediapipe
|       |-- 1k3d68.onnx
|       |-- 2d106det.onnx
|       |-- genderage.onnx
|       |-- glintr100.onnx
|       `-- scrfd_10g_bnkps.onnx
|-- motion_module/
|   `-- mm_sd_v15_v2.ckpt
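
Before running inference, it can help to confirm that the files are where the layout expects them. The sketch below is one possible check, limited to the entries shown in the listing above. The names `EXPECTED_FILES` and `missing_model_files` are hypothetical and not part of Hallo, and the default root assumes the script is run from the project root.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
# Only entries shown there are included; nothing else is assumed.
EXPECTED_FILES = [
    "audio_separator/download_checks.json",
    "audio_separator/mdx_model_data.json",
    "audio_separator/vr_model_data.json",
    "audio_separator/Kim_Vocal_2.onnx",
    "face_analysis/models/face_landmarker_v2_with_blendshapes.task",
    "face_analysis/models/1k3d68.onnx",
    "face_analysis/models/2d106det.onnx",
    "face_analysis/models/genderage.onnx",
    "face_analysis/models/glintr100.onnx",
    "face_analysis/models/scrfd_10g_bnkps.onnx",
    "motion_module/mm_sd_v15_v2.ckpt",
]


def missing_model_files(root="./pretrained_models", expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed model files are present.")
```

A common cause of missing or tiny files is cloning without Git LFS enabled, in which case rerunning `git lfs install` and cloning again is worth trying.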
