# SCAIL

Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
This repository contains the official implementation code for SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions.
<p align="center"> <img src='resources/teaser.png' alt='Teaser' width='90%'> </p>

## 🔎 Motivation and Results
SCAIL identifies the key bottlenecks that keep character animation from reaching production quality: limited generalization across characters and incoherent motion under complex scenarios (e.g., the long-standing challenge of multi-character interactions, as well as common failures in basic motions like flipping and turning). We revisit the core components of character animation -- how to represent the pose condition and how to inject it. Our framework resolves the tension in pose representations between preventing identity leakage and preserving rich motion information, and compels the model to perform spatiotemporal reasoning over the entire motion sequence for more natural and coherent movements. Check our methods, results gallery, and comparisons against other baselines at our project page.
<p align="center"> <img src='resources/1.gif' width='66%'> <img src='resources/2.gif' width='66%'> <img src='resources/3.gif' width='66%'> </p>

## 🌱 Community Works
❤️ A heartfelt thanks to friends in the community for their creativity! All results below are shared with their gracious consent. We were surprised to see the emergent abilities our model exhibited — understanding the 3D spatial relationships of 2D characters, driving hand-drawn artwork, and even controlling quadrupeds despite having no animal training data at all.
<table align="center" border="0" cellspacing="0" cellpadding="6"> <!-- Row 1 --> <tr> <td align="center"> <img src="resources/community1.gif" width="220"><br> <em>Chibi Gotham Battle</em> </td> <td align="center"> <img src="resources/community2.gif" width="250"><br> <em>Homer Bullet Time (w/ Uni3c)</em> </td> <td align="center" rowspan="2"> <img src="resources/community4.gif" width="150"><br> <em>Anime Art Animation</em> </td> </tr> <!-- Row 2 --> <tr> <td align="center" colspan="2"> <img src="resources/community3.gif" width="400"><br> <em>Street Fighter 6 Motion Mimic</em> </td> </tr> <!-- Row 3 --> <tr> <td align="center"> <img src="resources/community6.gif" width="150"><br> <em>Doodle Art Animation</em> </td> <td align="center"> <img src="resources/community8.gif" width="150"><br> <em>Dual Dance</em> </td> <td align="center" colspan="2"> <img src="resources/community5.gif" width="150"><br> <em>Group Dance</em> <img src="resources/community7.gif" width="200"><br> <em>Quadrupeds Animation (w/ ViTPose)</em> </td> </tr> </table>

## 🗞️ Updates and Plans
- 2026.3.1: 🔥 SCAIL is now native in ComfyUI.
- 2025.12.19: 📣 We now offer SCAIL on the official Wan framework as an alternative to SAT for more convenient inference. Check the wan branch of SCAIL. We will update the training code of SCAIL on SAT for reproducibility.
- 2025.12.11: 💥 The preview version of SCAIL is now open-sourced on HuggingFace and ModelScope.
- 2025.12.08: 🔥 We release the inference code of SCAIL on SAT.
### TODOs

- [x] SCAIL-14B-Preview Model Weights (512p, 5s) and Inference Config
- [x] Prompt Optimization Snippets
- [x] Implementation on Wan Official Framework
- [ ] SCAIL-Official (1.3B/14B) Model Weights (Improved Stability and Clarity, Innate Long-Video Generation Capability) and Inference Config
## 📰 News
- 2026.3.1: Thanks to toyxyz, a Blender 3D rig can be used with scail-pose now, allowing for much more dynamic and diverse shapes and poses, see #30.
- 2025.12.19: ComfyUI-SCAIL-Pose now supports saving NLF mesh as 3D glb animation and 3D previewing of the SCAIL-Pose skeleton.
- 2025.12.19: Thanks to deepbeepmeep for Low VRAM SCAIL Preview Support in WanGP! The WanGP version has the following perks: fully integrated 3D pose preprocessing, speed optimizations, and compatibility with any PyTorch version.
- 2025.12.17: Thanks to VantageWithAI, GGUF version is now available at SCAIL-Preview-GGUF!
- 2025.12.16: ❤️ Huge thanks to KJ for the work done on adaptation — SCAIL is now available in ComfyUI-WanVideoWrapper!!! Meanwhile, the pose extraction & rendering has also been partly adapted to ComfyUI in ComfyUI-SCAIL-Pose, currently without multi-character tracking.
- 2025.12.14: 🥳 Thanks to friends in the community for testing the work! Despite the fact that only 1.5% of SCAIL’s training samples are anime data, and that we did not intentionally collect any multi-character anime data, the model can generalize towards many complex anime characters. The release of SCAIL-Preview is intended to demonstrate the soundness of our proposed pose representation and model architecture, with clear potential for further scaling and enhancement.
## 🚀 Getting Started

### Checkpoints Download
| ckpts | Download Link | Notes |
|-------|---------------|-------|
| SCAIL-Preview (14B) | 🤗 Hugging Face<br> 🤖 ModelScope | Trained with resolutions under 512p.<br> H and W should both be divisible by 32<br> (e.g. 704*1280) if using other resolutions. |
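As a small illustration of the divisibility note above (the helper name is ours, not part of the repo), a custom resolution can be snapped down to the nearest multiples of 32:

```python
def snap_to_32(h: int, w: int) -> tuple[int, int]:
    # Round height and width down to the nearest multiple of 32,
    # as required when running SCAIL-Preview at non-default resolutions.
    return (h // 32) * 32, (w // 32) * 32

print(snap_to_32(704, 1280))  # already divisible: (704, 1280)
print(snap_to_32(720, 1280))  # 720 snaps down: (704, 1280)
```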
Use the following commands to download the model weights (both the Wan VAE and T5 modules are integrated into this checkpoint for convenience):
```shell
# Download the repository (skip automatic LFS file downloads)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/zai-org/SCAIL-Preview
```
The files should be organized like:
```
SCAIL-Preview/
├── Wan2.1_VAE.pth
├── model
│   ├── 1
│   │   └── mp_rank_00_model_states.pt
│   └── latest
└── umt5-xxl
    ├── ...
```
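A quick sanity check of the downloaded layout can be sketched as follows (a convenience snippet of ours, not shipped with the repo):

```python
from pathlib import Path

def check_checkpoint(root: str) -> list:
    # Return the expected checkpoint entries that are missing under `root`.
    expected = [
        "Wan2.1_VAE.pth",
        "model/1/mp_rank_00_model_states.pt",
        "model/latest",
        "umt5-xxl",
    ]
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]

missing = check_checkpoint("SCAIL-Preview")
if missing:
    print("Missing files:", missing)
```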
### Environment Setup
Please make sure your Python version is between 3.10 and 3.12, inclusive.
```shell
pip install -r requirements.txt
```
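The version requirement can also be checked programmatically; this small helper is ours, not part of the repo:

```python
import sys

def python_ok(version=sys.version_info) -> bool:
    # SCAIL requires Python between 3.10 and 3.12, inclusive.
    return (3, 10) <= tuple(version[:2]) <= (3, 12)

if not python_ok():
    print(f"Warning: Python {sys.version.split()[0]} is outside the supported 3.10-3.12 range")
```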
## 🦾 Usage

### Input preparation
The input data should be organized as follows; we have provided some example data in examples/:
```
examples/
├── 001
│   ├── driving.mp4
│   └── ref.jpg
├── 002
│   ├── driving.mp4
│   └── ref.jpg
└── ...
```
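The layout above can be validated with a small script before running the pipeline (the helper below is ours, not shipped with the repo):

```python
from pathlib import Path

def find_incomplete(examples_dir: str) -> list:
    # Each example subdirectory must contain driving.mp4 and ref.jpg.
    bad = []
    for sub in sorted(Path(examples_dir).iterdir()):
        if sub.is_dir() and not all(
            (sub / f).exists() for f in ("driving.mp4", "ref.jpg")
        ):
            bad.append(sub.name)
    return bad

if Path("examples").is_dir():
    print("Incomplete examples:", find_incomplete("examples"))
```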
### Pose Extraction & Rendering
Use git submodule to download the scail_pose module, then follow POSE_INSTRUCTION.md to extract and render the pose from the driving video.
```shell
git submodule update --init --recursive
```
After that, the project structure should be like this:
```
SCAIL/
├── examples
├── sat
├── configs
├── ...
└── scail_pose
```
Change into the subdirectory and follow its instructions:
```shell
cd scail_pose
# follow instructions in POSE_INSTRUCTION.md
```
After pose extraction and rendering, the input data should be organized as follows:
```
examples/
├── 001
│   ├── driving.mp4
│   ├── ref.jpg
│   └── rendered.mp4 (or rendered_aligned.mp4)
└── 002
    ...
```
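After rendering, the examples that are ready for inference can be enumerated with a snippet like this (a hypothetical helper of ours, not part of the repo):

```python
from pathlib import Path

def ready_examples(examples_dir: str) -> list:
    # An example is ready for inference once it contains a rendered pose
    # video: rendered.mp4 or rendered_aligned.mp4.
    return sorted(
        sub.name
        for sub in Path(examples_dir).glob("*")
        if sub.is_dir()
        and any(
            (sub / n).exists() for n in ("rendered.mp4", "rendered_aligned.mp4")
        )
    )
```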
### Model Inference
For inference in Wan Official Framework, please refer to the wan branch of SCAIL.
For inference in SAT, run the following command to start the inference with CLI input:
```shell
bash scripts/sample_sgl_14Bsc_xc_cli.sh
```
The CLI will ask for input in the format `<prompt>@@<example_dir>`, e.g. `the girl is dancing@@examples/001`. The example_dir should contain rendered.mp4 or rendered_aligned.mp4 after pose extraction and rendering. Results will be saved to samples/.
We also support direct txt input: change input_file in sample_sgl_14Bsc_xc_txt.yaml to the path of your input file, fill in the input file with lines in the format `<prompt>@@<example_dir>`, then run the following command:
```shell
bash scripts/sample_sgl_14Bsc_xc_txt.sh
```
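Each input record follows the `<prompt>@@<example_dir>` convention described above; a minimal parser for such lines (our sketch, not the repo's code) could look like:

```python
def parse_line(line: str):
    # Split a "<prompt>@@<example_dir>" record into its two fields.
    prompt, sep, example_dir = line.strip().rpartition("@@")
    if not sep:
        raise ValueError(f"Malformed input line: {line!r}")
    return prompt, example_dir

print(parse_line("the girl is dancing@@examples/001"))
# → ('the girl is dancing', 'examples/001')
```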
Note that our model is trained with long, detailed prompts; although a short or even empty prompt can be used, the results may not be as good as with a long prompt. We will provide our prompt generation snippets, using Google Gemini to read from t
