StyleCrafter
[TOG 2024] StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
🔥🔥🔥 StyleCrafter on SDXL for stylized image generation is available! It supports higher resolution (1024×1024) and produces more visually pleasing results.
<div align="center"><a href='https://arxiv.org/abs/2312.00330'><img src='https://img.shields.io/badge/arXiv-2312.00330-b31b1b.svg'></a> <a href='https://gongyeliu.github.io/StyleCrafter.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://huggingface.co/spaces/liuhuohuo/StyleCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> <br> <a href='https://github.com/GongyeLiu/StyleCrafter'><img src='https://img.shields.io/badge/StyleCrafter-VideoCrafter-darkcyan'></a> <a href='https://github.com/GongyeLiu/StyleCrafter-SDXL'><img src='https://img.shields.io/badge/StyleCrafter-SDXL-darkcyan'></a>
GongyeLiu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing, <br>Xintao Wang, Yujiu Yang*, Ying Shan <br><br> (* corresponding authors)
From Tsinghua University and Tencent AI Lab.
</div>
🔆 Introduction
TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation. <br>
1. ⭐⭐ Style-Guided Text-to-Video Generation.
<div align="center"> <img src=docs/showcase_1.gif> <p>Style-guided text-to-video results. Resolution: 320 x 512; Frames: 16. (Compressed)</p> </div>
2. Style-Guided Text-to-Image Generation.
<div align="center"> <img src=docs/showcase_img.jpeg> <p>Style-guided text-to-image results. Resolution: 512 x 512. (Compressed)</p> </div>
📝 Changelog
- [2025.02.18]: Upload the full test set.
- [2024.07.14]: 🔥🔥 Remove watermarks! We fine-tuned the temporal blocks on watermark-free data for 500 steps; download the updated checkpoints here.
- [2024.06.25]: 🔥🔥 Support StyleCrafter on SDXL!
- [2023.12.08]: Release the Huggingface online demo.
- [2023.12.05]: Release the code and checkpoint.
- [2023.11.30]: Release the project page.
🧰 Models

|Base Model| Gen Type | Resolution | Checkpoint | How to run |
|:---------|:---------|:--------|:--------|:--------|
|VideoCrafter| Image/Video |320x512|Hugging Face| StyleCrafter on VideoCrafter |
|SDXL| Image |1024x1024|Hugging Face| StyleCrafter on SDXL |

On a single NVIDIA A100 (40 GB) GPU, generating a 512×512 image takes approximately 5 seconds, and generating a 320×512 video with 16 frames takes approximately 85 seconds. Inference requires a GPU with at least 16 GB of memory.
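To check a machine against the 16 GB minimum before downloading checkpoints, a small script along these lines may help. It is only a sketch: the `has_enough_memory` and `check_gpus` helpers and the `MIN_GPU_MEM_MIB` variable are illustrative names, not part of this repository, and the 16384 MiB threshold is a literal reading of the figure above.

```bash
#!/bin/sh
# Minimum GPU memory in MiB. 16384 is a literal reading of the 16 GB figure
# (an assumption); lower it if your card reports slightly less than that.
MIN_GPU_MEM_MIB="${MIN_GPU_MEM_MIB:-16384}"

# Succeeds when the first argument (MiB) is at least the second (MiB).
has_enough_memory() {
    [ "$1" -ge "$2" ]
}

# Prints one line per visible GPU stating whether it meets the threshold.
check_gpus() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; cannot check GPU memory"
        return 0
    fi
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r mem_mib; do
        # Ignore anything that is not a plain integer (e.g. driver errors).
        case "$mem_mib" in
            ''|*[!0-9]*) continue ;;
        esac
        if has_enough_memory "$mem_mib" "$MIN_GPU_MEM_MIB"; then
            echo "OK: GPU with ${mem_mib} MiB"
        else
            echo "Too small: GPU with ${mem_mib} MiB (< ${MIN_GPU_MEM_MIB} MiB)"
        fi
    done
}

check_gpus
```

Set `MIN_GPU_MEM_MIB` in the environment to relax or tighten the threshold.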
⚙️ Setup
conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt
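After installation, a quick sanity check can confirm that the `stylecrafter` environment is active. The sketch below compares the interpreter version with the one pinned in the `conda create` command. It then tries to import PyTorch, on the assumption (not confirmed here) that `requirements.txt` installs it. The `version_matches` helper is an illustrative name, not something the repository provides.

```bash
#!/bin/sh
EXPECTED_PYTHON="3.8.5"   # version pinned in the conda create command

# Succeeds when a `python --version` string (e.g. "Python 3.8.5")
# reports exactly the expected version.
version_matches() {
    [ "${1#Python }" = "$2" ]
}

actual="$(python --version 2>&1 || true)"
if version_matches "$actual" "$EXPECTED_PYTHON"; then
    echo "Python OK: $actual"
else
    echo "Unexpected interpreter: '$actual' (expected Python $EXPECTED_PYTHON)"
fi

# Assumption: requirements.txt installs PyTorch. If so, this prints the
# installed version and whether a CUDA device is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())" 2>/dev/null \
    || echo "PyTorch is not importable from this interpreter"
```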
💫 Inference
- Download all checkpoints according to the instructions
- Run the following commands in a terminal.
# style-guided text-to-image generation
sh scripts/run_infer_image.sh
# style-guided text-to-video generation
sh scripts/run_infer_video.sh
- (Optional) Run inference on your own data according to the instructions
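For longer runs it can be convenient to keep the output of each inference script. The wrapper below is a minimal sketch that assumes it is run from the repository root. The `run_and_log` helper, the `LOG_DIR` variable, and the `logs/` directory are illustrative and not part of the repository. Only the two script paths come from the commands above.

```bash
#!/bin/sh
# Directory for log files (hypothetical; change as needed).
LOG_DIR="${LOG_DIR:-logs}"

# Runs one inference script and saves its combined output to a log file.
# A missing script is reported and skipped instead of aborting the run.
run_and_log() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "skip: $script not found (run from the repository root)"
        return 0
    fi
    mkdir -p "$LOG_DIR"
    log="$LOG_DIR/$(basename "$script" .sh).log"
    echo "running $script, logging to $log"
    sh "$script" > "$log" 2>&1
}

run_and_log scripts/run_infer_image.sh   # style-guided text-to-image generation
run_and_log scripts/run_infer_video.sh   # style-guided text-to-video generation
```

Each script's exit status is passed through by `run_and_log`, so a failed run can still be detected by the caller.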
👨👩👧👦 Crafter Family
VideoCrafter1: Framework for high-quality text-to-video generation.
ScaleCrafter: Tuning-free method for high-resolution image/video generation.
TaleCrafter: An interactive story visualization tool that supports multiple characters.
LongerCrafter: Tuning-free method for longer high-quality video generation.
DynamiCrafter: Animate open-domain still images to high-quality videos.
📢 Disclaimer
We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.
🙏 Acknowledgements
We would like to thank AK (@_akhaliq) for help with setting up the online demo.
📭 Contact
If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.
BibTeX
@article{liu2023stylecrafter,
title={StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter},
author={Liu, Gongye and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Xing, Jinbo and Wang, Xintao and Yang, Yujiu and Shan, Ying},
journal={arXiv preprint arXiv:2312.00330},
year={2023}
}
