Sonic

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation, CVPR 2025.

<a href='https://jixiaozhong.github.io/Sonic/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href="http://demo.sonic.jixiaozhong.online/" style="margin: 0 2px;"> <img src='https://img.shields.io/badge/Demo-Gradio-gold?style=flat&logo=Gradio&logoColor=red' alt='Demo'> </a> <a href='https://openaccess.thecvf.com/content/CVPR2025/papers/Ji_Sonic_Shifting_Focus_to_Global_Audio_Perception_in_Portrait_Animation_CVPR_2025_paper.pdf'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href="https://huggingface.co/spaces/xiaozhongji/Sonic" style="margin: 0 2px;"> <img src='https://img.shields.io/badge/Space-ZeroGPU-orange?style=flat&logo=Gradio&logoColor=red' alt='Demo'> </a> <a href="https://raw.githubusercontent.com/jixiaozhong/Sonic/refs/heads/main/LICENSE" style="margin: 0 2px;"> <img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'> </a>

<p align="center"> 👋 Join our <a href="examples/image/QQ2.jpg" target="_blank">QQ Chat Group</a> </p>

🔥🔥🔥 NEWS

2025/05/06: We have open-sourced DICE-Talk, a portrait-driven system with emotional expression. Welcome to try it out!

2025/03/14: Super stoked to share that Sonic has been accepted to CVPR 2025! See you in Nashville!

2025/02/08: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated! ComfyUI version of Sonic

2025/02/06: Commercialization: Note that our license is non-commercial. If commercialization is required, please use Tencent Cloud Video Creation Large Model: Introduction / API documentation

2025/01/17: Our online Hugging Face demo is released.

2025/01/17: Thank you to NewGenAI for promoting our Sonic and creating a Windows-based tutorial on YouTube.

2024/12/16: Our online demo is released.

🎥 Demo

| Input | Output | Input | Output |
|----------------------|-----------------------|----------------------|-----------------------|
|<img src="examples/image/anime1.png" width="360">|<video src="https://github.com/user-attachments/assets/636c3ff5-210e-44b8-b901-acf828071133" width="360"> </video>|<img src="examples/image/female_diaosu.png" width="360">|<video src="https://github.com/user-attachments/assets/e8207300-2569-47d1-9ad4-4b4c9b0f0bd4" width="360"> </video>|
|<img src="examples/image/hair.png" width="360">|<video src="https://github.com/user-attachments/assets/dcb755c1-de01-4afe-8b4f-0e0b2c2439c1" width="360"> </video>|<img src="examples/image/leonnado.jpg" width="360">|<video src="https://github.com/user-attachments/assets/b50e61bb-62d4-469d-b402-b37cda3fbd27" width="360"> </video>|

For more visual demos, please visit our project page.

🧩 Community Contributions

If you develop or use Sonic in your projects, please let us know.

📑 Updates

2025/01/14: Our inference code and weights are released. Stay tuned, we will continue to polish the model.

📜 Requirements

  • An NVIDIA GPU with CUDA support is required.
    • The model has been tested on a single GPU with 32 GB of memory.
  • Tested operating system: Linux
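
The requirements above can be checked before running inference. The following is a minimal sketch, not part of the Sonic repository: `check_environment` is a hypothetical helper that reports the operating system and, if PyTorch is already installed, whether CUDA is visible and how much memory the first GPU has. The 32 GB figure is only the configuration the authors tested on, so the sketch reports the memory size rather than enforcing a threshold.

```python
import platform


def check_environment():
    """Collect basic facts about the host (hypothetical helper, not part of Sonic)."""
    report = {
        "os": platform.system(),      # the README lists Linux as the tested OS
        "torch_installed": False,
        "cuda_available": False,
        "gpu_memory_gb": None,
    }
    try:
        import torch                  # optional: only present after installation
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Total memory of GPU 0, converted from bytes to GiB.
        props = torch.cuda.get_device_properties(0)
        report["gpu_memory_gb"] = round(props.total_memory / 1024**3, 1)
    return report
```

Calling `print(check_environment())` on the target machine shows at a glance whether it resembles the tested setup.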

🔑 Inference

Installation

  • Install PyTorch, then install the remaining dependencies:
  pip3 install -r requirements.txt
  • All models are stored in checkpoints/ by default, with the following file structure:
Sonic
  ├──checkpoints
  │  ├──Sonic
  │  │  ├──audio2bucket.pth
  │  │  ├──audio2token.pth
  │  │  ├──unet.pth
  │  ├──stable-video-diffusion-img2vid-xt
  │  │  ├──...
  │  ├──whisper-tiny
  │  │  ├──...
  │  ├──RIFE
  │  │  ├──flownet.pkl
  │  ├──yoloface_v5m.pt
  ├──...
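
One way to confirm the layout above before running inference is to check for the files the tree names explicitly. This is a minimal sketch, assuming the default `checkpoints` directory; `missing_checkpoints` is a hypothetical helper, not part of the Sonic repository. Directories whose contents are elided in the tree are only checked for existence.

```python
from pathlib import Path

# Files named explicitly in the tree above.
EXPECTED_FILES = [
    "Sonic/audio2bucket.pth",
    "Sonic/audio2token.pth",
    "Sonic/unet.pth",
    "RIFE/flownet.pkl",
    "yoloface_v5m.pt",
]
# Directories whose contents the tree elides ("..."); only existence is checked.
EXPECTED_DIRS = [
    "stable-video-diffusion-img2vid-xt",
    "whisper-tiny",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under root (hypothetical helper)."""
    root = Path(root)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing
```

An empty list from `missing_checkpoints()` means every entry named in the tree is present. It does not validate the contents of the files.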

Download the weights with huggingface-cli as follows:

  python3 -m pip install "huggingface_hub[cli]"
  huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
  huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
  huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny

Alternatively, manually download the pretrained model, svd-xt, and whisper-tiny to checkpoints/.
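
The huggingface-cli commands above can also be expressed in Python through the `huggingface_hub` package that the same pip command installs. This is a sketch under that assumption, not a script shipped with Sonic: `download_checkpoints` and `CHECKPOINT_REPOS` are hypothetical names, while the repository IDs and target directories are copied from the commands above.

```python
# Repository ID -> local directory, taken from the huggingface-cli commands above.
CHECKPOINT_REPOS = {
    "LeonJoe13/Sonic": "checkpoints",
    "stabilityai/stable-video-diffusion-img2vid-xt": "checkpoints/stable-video-diffusion-img2vid-xt",
    "openai/whisper-tiny": "checkpoints/whisper-tiny",
}


def download_checkpoints(repos=CHECKPOINT_REPOS):
    """Download each repository into its target directory (hypothetical helper)."""
    # Imported lazily so the mapping can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in repos.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Running `download_checkpoints()` from the repository root should produce the same directory layout as the three CLI commands.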

Run demo

  python3 demo.py \
  '/path/to/input_image' \
  '/path/to/input_audio' \
  '/path/to/output_video'
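
To animate several portraits with the same audio clip, the command above can be wrapped in a small loop. This is a minimal sketch, assuming `demo.py` takes exactly the three positional arguments shown (image, audio, output) and is run from the repository root. `build_demo_command` and `animate_folder` are hypothetical helpers, and the `.mp4` output extension is an assumption rather than something the README specifies.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(image, audio, output, script="demo.py"):
    """Return the argument list for one demo.py run (image, audio, output order)."""
    return [sys.executable, script, str(image), str(audio), str(output)]


def animate_folder(image_dir, audio, output_dir):
    """Run demo.py once per .png/.jpg image in image_dir, reusing one audio file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in {".png", ".jpg"}:
            continue
        # Assumption: outputs are written as .mp4 files named after the input image.
        out = output_dir / (image.stem + ".mp4")
        subprocess.run(build_demo_command(image, audio, out), check=True)
```

For example, `animate_folder("examples/image", "/path/to/input_audio", "outputs")` would process each example portrait in turn and stop at the first failing run because of `check=True`.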

🔗 Citation

If you find our work helpful for your research, please consider citing our work.

@inproceedings{ji2025sonic,
  title={Sonic: Shifting focus to global audio perception in portrait animation},
  author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={193--203},
  year={2025}
}

@article{ji2024realtalk,
  title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
  author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
  journal={arXiv preprint arXiv:2406.18284},
  year={2024}
}

@article{tan2025dicetalk,
  title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation}, 
  author={Tan, Weipeng and Lin, Chuming and Xu, Chengming and Xu, FeiFan and Hu, Xiaobin and Ji, Xiaozhong and Zhu, Junwei and Wang, Chengjie and Fu, Yanwei},
  journal={arXiv preprint arXiv:2504.18087},
  year={2025}
}

📜 Related Works

Explore our related research:
