AnimateLCM
[SIGGRAPH ASIA 2024 TCS] AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
[Paper] [Project Page ✨] [Demo in 🤗Hugging Face] [Pre-trained Models] [Civitai]
by Fu-Yun Wang, Zhaoyang Huang📮, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li📮
If you use any components of our work, please cite our paper:
@article{wang2024animatelcm,
title={AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning},
author={Wang, Fu-Yun and Huang, Zhaoyang and Shi, Xiaoyu and Bian, Weikang and Song, Guanglu and Liu, Yu and Li, Hongsheng},
journal={arXiv preprint arXiv:2402.00769},
year={2024}
}
News
- [2024.05]: 🔥🔥🔥 We release the training script for accelerating Stable Video Diffusion.
- [2024.03]: 😆😆😆 We release the AnimateLCM-I2V and AnimateLCM-SVD for fast image animation.
- [2024.02]: 🤗🤗🤗 We release the pretrained model weights and the Hugging Face demo.
- [2024.02]: 💡💡💡 The technical report is available on arXiv.
Here is a screen recording of usage. Prompt: "river reflecting mountain".
Introduction
AnimateLCM is a pioneering, exploratory work on fast animation generation following consistency models; it can generate good-quality animations in as few as 4 inference steps.
It relies on a decoupled learning paradigm: it first learns the image generation prior and then learns the temporal generation prior for fast sampling, which greatly boosts training efficiency.
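The decoupled paradigm can be pictured as two sequential training stages that never update the same weights. The following is a deliberately toy Python sketch (all names and the "gradient" are illustrative stand-ins, not the actual training code), showing only the decoupling itself: stage 1 updates the spatial (image) weights, stage 2 freezes them and updates the temporal weights.

```python
# Toy sketch of decoupled consistency learning (illustrative only).
# Stage 1 distills the image (spatial) prior; stage 2 freezes it and
# distills the temporal prior, so the two are never trained jointly.

def train_stage(params, trainable_keys, steps=3, lr=0.1):
    """Pretend-update only the trainable subset of parameters."""
    for _ in range(steps):
        for k in trainable_keys:
            params[k] -= lr * 1.0  # stand-in for a consistency-loss gradient
    return params

params = {"spatial_lora": 1.0, "motion_module": 1.0}

# Stage 1: learn the image-generation prior (spatial LoRA only).
train_stage(params, ["spatial_lora"])
spatial_after_stage1 = params["spatial_lora"]

# Stage 2: learn the temporal prior (motion module only);
# the spatial weights stay frozen.
train_stage(params, ["motion_module"])

assert params["spatial_lora"] == spatial_after_stage1  # untouched in stage 2
```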
The high-level workflow of AnimateLCM:
<div align="center"> <img src="__assets__/imgs/demo_figure.png" alt="comparison" style="zoom:80%;" /> </div>

Demos
We have posted many demo videos generated by AnimateLCM on the Project Page. Generally speaking, AnimateLCM supports fast text-to-video, control-to-video, image-to-video, video-to-video stylization, and longer video generation.
<div align="center"> <img src="__assets__/imgs/examples.png" alt="comparison" style="zoom:80%;" /> </div>

Models
So far, we have released three models:

- AnimateLCM-T2V: a spatial LoRA weight and a motion module for personalized video generation. Community testing indicates that the motion module is also compatible with many personalized models tuned for LCM, for example Dreamshaper-LCM.

- AnimateLCM-SVD-xt: we provide AnimateLCM-SVD-xt and AnimateLCM-SVD-xt 1.1, tuned from SVD-xt and SVD-xt 1.1 respectively. They generate high-resolution image animations of 25 frames in 1~8 steps. You can try them in the Hugging Face Demo. Thanks to the Hugging Face team for providing the GPU grants.

- AnimateLCM-I2V: a spatial LoRA weight and a motion module with an additional image encoder for personalized image animation. This is our attempt to directly train an image animation model for fast sampling without any teacher model. It can animate a personalized image in 2~4 steps. However, because our training resources were very limited, it is not as stable as we would like (like most I2V models built on Stable-Diffusion-v1-5, it is generally not very stable).
Install & Usage Instruction
We split the code into two folders, animatelcm_sd15 and animatelcm_svd, which rely on different environments. Please refer to README_animatelcm_sd15 and README_animatelcm_svd for instructions.
Usage Tips
- AnimateLCM-T2V:
  - 4 steps generally work well. For better quality, use 6~8 inference steps.
  - The CFG scale should be set between 1~2. Setting CFG=1 halves the sampling cost; however, I generally prefer CFG=1.5 with proper negative prompts for better quality.
  - Set the video length to 16 frames for sampling. This is the length the model was trained with.
  - The models should work with IP-Adapter, ControlNet, and many adapters tuned for Stable Diffusion in a zero-shot manner. For better results when combining them, you can tune them together with the teacher-free adaptation script I provide; it will not hurt the sampling speed.
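As a back-of-the-envelope illustration of the CFG cost note above (this helper is hypothetical, not part of the codebase): with classifier-free guidance active (CFG > 1), each step runs the UNet twice, once conditionally and once unconditionally, so CFG=1 halves the per-sample cost.

```python
def unet_evaluations(num_inference_steps: int, guidance_scale: float) -> int:
    """Number of UNet forward passes for one sample.

    With classifier-free guidance active (guidance_scale > 1) each step
    needs a conditional and an unconditional pass; at CFG = 1 the
    unconditional pass can be skipped entirely.
    """
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_inference_steps * passes_per_step

print(unet_evaluations(4, 1.5))  # 4 steps with CFG 1.5 -> 8 passes
print(unet_evaluations(4, 1.0))  # CFG = 1 halves the cost -> 4 passes
```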
- AnimateLCM-I2V:
  - 2~4 steps should work for personalized image animation.
  - In most cases the model does not need CFG; just set CFG=1 to reduce the inference cost.
  - I additionally provide a `motion scale` hyper-parameter. Set it to 0.8 as the default choice. Setting it to 0.0 always yields static animations; increasing it produces larger motions, but that will sometimes cause generation failures.
  - A typical workflow:
    - Use your personalized image model to generate a high-quality image.
    - Feed the generated image as input and reuse the same prompt for image animation.
    - You can even further apply AnimateLCM-T2V to refine the final motion quality.
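The `motion scale` behavior described above can be pictured as scaling a per-frame motion residual on top of a static base frame. This is an illustrative toy with scalar stand-ins for latents, not the model's actual mechanism:

```python
def apply_motion_scale(base_frame, residuals, motion_scale):
    """Toy picture of the motion-scale knob: each output frame is the
    static base frame plus a scaled per-frame motion residual, so a
    motion scale of 0.0 reproduces the base frame everywhere."""
    return [base_frame + motion_scale * r for r in residuals]

base = 0.5                         # stand-in for a latent "image"
residuals = [0.0, 0.1, 0.2, 0.3]   # stand-in per-frame motion

static = apply_motion_scale(base, residuals, 0.0)
assert all(f == base for f in static)  # motion scale 0 -> static animation

moving = apply_motion_scale(base, residuals, 0.8)  # the suggested default
```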
- AnimateLCM-SVD:
  - 1~4 steps should work.
  - SVD requires two CFG values, `CFG_min` and `CFG_max`. By default, `CFG_min` is set to 1. Slightly adjusting `CFG_max` within [1, 1.5] gives good results; again, you can simply set it to 1 to reduce the inference cost.
  - For the other hyper-parameters of AnimateLCM-SVD-xt, just follow the original SVD design.
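The two CFG values come from SVD's frame-wise guidance schedule: in the reference SVD pipeline, the guidance scale is (to my understanding) interpolated linearly across frames from `CFG_min` at the first frame to `CFG_max` at the last. A minimal sketch of such a schedule:

```python
def guidance_schedule(cfg_min: float, cfg_max: float, num_frames: int = 25):
    """Linearly interpolate the guidance scale across frames, SVD-style:
    the first frame uses cfg_min and the last frame uses cfg_max."""
    if num_frames == 1:
        return [cfg_min]
    return [cfg_min + (cfg_max - cfg_min) * i / (num_frames - 1)
            for i in range(num_frames)]

sched = guidance_schedule(1.0, 1.5, 25)
print(sched[0], sched[-1])  # 1.0 1.5
```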
Related Notes
- 🎉 Tutorial video of AnimateLCM on ComfyUI: Tutorial Video
- 🎉 ComfyUI for AnimateLCM: AnimateLCM-ComfyUI & ComfyUI-Reddit
Comparison
Screen recording of AnimateLCM-T2V. Prompt: "dog with sunglasses".
Contact & Collaboration
I am open to collaboration, but not to full-time internships. If you find some of my work interesting and hope to collaborate or discuss in any format, please do not hesitate to contact me.
📧 Email: fywang@link.cuhk.edu.hk
Acknowledgments
I would like to thank AK for broadcasting our work, the Hugging Face team for their help in building the Gradio demo and hosting the models, and Dhruv Nair for his help with diffusers.