Hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
📸 Showcase
https://github.com/fudan-generative-vision/hallo/assets/17402682/9d1a0de4-3470-4d38-9e4f-412f517f834c
🎬 Honoring Classic Films
<table class="center"> <tr> <td style="text-align: center"><b>Devil Wears Prada</b></td> <td style="text-align: center"><b>Green Book</b></td> <td style="text-align: center"><b>Infernal Affairs</b></td> </tr> <tr> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Devil_Wears_Prada-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Devil_Wears_Prada_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Green_Book-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Green_Book_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/无间道-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Infernal_Affairs_GIF.gif"></a></td> </tr> <tr> <td style="text-align: center"><b>Patch Adams</b></td> <td style="text-align: center"><b>Tough Love</b></td> <td style="text-align: center"><b>Shawshank Redemption</b></td> </tr> <tr> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Patch_Adams-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Patch_Adams_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Tough_Love-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Tough_Love_GIF.gif"></a></td> <td style="text-align: center"><a target="_blank" href="https://cdn.aondata.work/video/short_movie/Shawshank-480p.mp4"><img src="https://cdn.aondata.work/img/short_movie/Shawshank_GIF.gif"></a></td> </tr> </table>
Explore more examples.
📰 News
- 2024/06/28: 🎉🎉🎉 We are proud to announce the release of our model training code. Try training on your own data. Here is the tutorial.
- 2024/06/21: 🚀🚀🚀 Cloned a Gradio demo on 🤗Huggingface space.
- 2024/06/20: 🌟🌟🌟 Received numerous contributions from the community, including a Windows version, ComfyUI, WebUI, and a Docker template.
- 2024/06/15: ✨✨✨ Released sample images and audio clips for inference testing on 🤗Huggingface.
- 2024/06/15: 🎉🎉🎉 Launched the first version on 🫡GitHub.
🤝 Community Resources
Explore the resources developed by our community to enhance your experience with Hallo:
- TTS x Hallo Talking Portrait Generator - Check out this awesome Gradio demo by @Sylvain Filoni! With this tool, you can conveniently prepare a portrait image and audio for Hallo.
- Demo on Huggingface - Check out this easy-to-use Gradio demo by @multimodalart.
- hallo-webui - Explore the WebUI created by @daswer123.
- hallo-for-windows - Utilize Hallo on Windows with the guide by @sdbds.
- ComfyUI-Hallo - Integrate Hallo with the ComfyUI tool by @AIFSH.
- hallo-docker - Docker image for Hallo by @ashleykleynhans.
- RunPod Template - Deploy Hallo to RunPod by @ashleykleynhans.
- JoyHallo - JoyHallo extends the capabilities of Hallo, enabling it to support Mandarin.
Thanks to all of them.
Join our community and explore these amazing resources to make the most of Hallo. Enjoy, and elevate your creative projects!
🔧️ Framework

⚙️ Installation
- System requirements: Ubuntu 20.04/Ubuntu 22.04, CUDA 12.1
- Tested GPUs: A100
Create a conda environment:
conda create -n hallo python=3.10
conda activate hallo
Install the packages with pip:
pip install -r requirements.txt
pip install .
In addition, ffmpeg is required:
apt-get install ffmpeg
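A quick sanity check of the environment can save a failed run later. The following Python sketch is a hypothetical helper, not part of the Hallo repository: it checks only two of the requirements listed above, that the interpreter is Python 3.10 (matching the conda environment) and that an `ffmpeg` binary is on `PATH`. It does not verify the CUDA version or the GPU.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence


def check_prerequisites(
    version: Sequence[int] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: only the Python version and the presence of
    ffmpeg on PATH are checked.
    """
    problems = []
    # The conda environment above is created with python=3.10.
    if tuple(version[:2]) != (3, 10):
        problems.append(f"expected Python 3.10, found {version[0]}.{version[1]}")
    # ffmpeg is installed separately via apt-get, so look for it on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and `PATH`; the injectable `version` and `which` parameters exist only to make the helper easy to test.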
🗝️️ Usage
The entry point for inference is scripts/inference.py. Before testing your cases, complete the following steps:
- Download all required pretrained models.
- Prepare source image and driving audio pairs.
- Run inference.
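The preparation step above amounts to matching each source image with its driving audio and then invoking scripts/inference.py once per pair. The sketch below assumes a hypothetical naming convention (an image and its audio share a file stem, e.g. `alice.jpg` and `alice.wav`) and that driving audio is stored as `.wav` files. The `--source_image` and `--driving_audio` flag names are assumptions about the inference script's command line; confirm them with `python scripts/inference.py --help` before relying on them.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical convention: an image and its driving audio share a file stem.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_by_stem(images: Iterable[Path], audios: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Return (image, audio) pairs whose stems match, sorted by image path."""
    # Assumption: driving audio is stored as .wav; other audio files are ignored.
    wavs = {p.stem: p for p in audios if p.suffix.lower() == ".wav"}
    pairs = []
    for image in sorted(images):
        if image.suffix.lower() in IMAGE_SUFFIXES and image.stem in wavs:
            pairs.append((image, wavs[image.stem]))
    return pairs


def inference_command(image: Path, audio: Path) -> List[str]:
    """Build an argv list for scripts/inference.py.

    The flag names are assumptions; check `python scripts/inference.py --help`.
    """
    return [
        "python", "scripts/inference.py",
        "--source_image", str(image),
        "--driving_audio", str(audio),
    ]
```

Each returned command list can be passed to `subprocess.run(cmd, check=True)` from the project root, for example with pairs built from `Path("my_images").iterdir()` and `Path("my_audio").iterdir()` (hypothetical directory names).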
📥 Download Pretrained Models
All pretrained models required for inference are available from our HuggingFace repo.
Clone them into the ${PROJECT_ROOT}/pretrained_models directory with the commands below:
git lfs install
git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models
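If Git LFS is not set up before cloning, the large weight files are replaced by small text pointer files, and loading them later fails. A Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`, so a simple prefix check can flag files that were never fetched. The helper names below are hypothetical and not part of the Hallo repository.

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the leading bytes of a file match a Git LFS pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def find_lfs_pointers(root: Path) -> List[Path]:
    """Return files under root (ignoring .git/) that are still LFS pointers."""
    pointers = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as f:
            # Only the first line matters, so read just enough bytes.
            if looks_like_lfs_pointer(f.read(len(LFS_POINTER_PREFIX))):
                pointers.append(path)
    return pointers
```

Running `find_lfs_pointers(Path("pretrained_models"))` after the clone should return an empty list; if it does not, running `git lfs pull` inside pretrained_models fetches the missing objects.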
Alternatively, you can download each model separately from its source repo:
- hallo: Our checkpoints consist of the denoising UNet, face locator, and image and audio projection modules.
- audio_separator: Kim_Vocal_2 MDX-Net vocal removal model. (Thanks to KimberleyJensen)
- insightface: 2D and 3D face analysis models, placed into pretrained_models/face_analysis/models/. (Thanks to deepinsight)
- face landmarker: Face detection & mesh model from mediapipe, placed into pretrained_models/face_analysis/models. 
- motion module: Motion module from AnimateDiff. (Thanks to guoyww)
- sd-vae-ft-mse: Weights intended for use with the diffusers library. (Thanks to stabilityai)
- StableDiffusion V1.5: Initialized and fine-tuned from Stable-Diffusion-v1-2. (Thanks to runwayml)
- wav2vec: WAV audio-to-vector model from Facebook.
Finally, these pretrained models should be organized as follows:
./pretrained_models/
|-- audio_separator/
|   |-- download_checks.json
|   |-- mdx_model_data.json
|   |-- vr_model_data.json
|   `-- Kim_Vocal_2.onnx
|-- face_analysis/
|   `-- models/
|       |-- face_landmarker_v2_with_blendshapes.task  # face landmarker model from mediapipe
|       |-- 1k3d68.onnx
|       |-- 2d106det.onnx
|       |-- genderage.onnx
|       |-- glintr100.onnx
|       `-- scrfd_10g_bnkps.onnx
|-- motion_module/
|   `-- mm_sd_v15_v2.ckpt
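Once the models are in place, the layout above can be checked programmatically. The sketch below is a hypothetical helper that lists which of the files shown in the tree are missing; because the tree in this section is only partial, the `EXPECTED_FILES` tuple covers just those entries and should be extended for the remaining models in your checkout.

```python
from pathlib import Path
from typing import Callable, List, Sequence

# Files listed in the (partial) layout above, relative to pretrained_models/.
EXPECTED_FILES = (
    "audio_separator/download_checks.json",
    "audio_separator/mdx_model_data.json",
    "audio_separator/vr_model_data.json",
    "audio_separator/Kim_Vocal_2.onnx",
    "face_analysis/models/face_landmarker_v2_with_blendshapes.task",
    "face_analysis/models/1k3d68.onnx",
    "face_analysis/models/2d106det.onnx",
    "face_analysis/models/genderage.onnx",
    "face_analysis/models/glintr100.onnx",
    "face_analysis/models/scrfd_10g_bnkps.onnx",
    "motion_module/mm_sd_v15_v2.ckpt",
)


def missing_files(
    root: Path = Path("pretrained_models"),
    expected: Sequence[str] = EXPECTED_FILES,
    exists: Callable[[Path], bool] = Path.is_file,
) -> List[str]:
    """Return the expected relative paths that are absent under root."""
    return [rel for rel in expected if not exists(root / rel)]
```

An empty result from `missing_files()` (run from the project root) means every listed file is present; the `exists` parameter is only there so the check can be tested without touching the filesystem.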
