LightForcing
Official repository for the paper "Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention"
Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
NTU, HKUST, SenseTime (LightX2V Group)
https://github.com/user-attachments/assets/2daa9f17-329e-4019-8f14-68ac2c467592
<em> (Results on Self Forcing 1.3B. Left: Dense Attention. Right: 1.3× acceleration using Light Forcing) </em>

💡 Why Light Forcing
- 🥇 Pioneer work: The first to explore sparse attention acceleration for autoregressive video generation.
- 🏆 Superior performance: Achieves a VBench total score of 84.5, delivering high-quality results with strong overall performance.
- 🚀 Significant acceleration: Provides over 3× attention speedup and 1.2–1.3× end-to-end speedup, rising to 2.3× end-to-end acceleration (19.7 FPS on a single RTX 5090 GPU) when combined with FP8 quantization and LightVAE.
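How a 3× attention-only speedup translates into the reported 1.2–1.3× end-to-end figure follows Amdahl's law. A minimal sketch, where the ~35% attention share of total runtime is an illustrative assumption (not a measured figure from the paper):

```python
def end_to_end_speedup(attn_frac: float, attn_speedup: float) -> float:
    """Amdahl's-law estimate: if attention accounts for `attn_frac` of
    total runtime and is accelerated by `attn_speedup`, the rest of the
    pipeline (VAE, MLPs, etc.) runs at its original speed."""
    return 1.0 / ((1.0 - attn_frac) + attn_frac / attn_speedup)

# Assumed 35% attention share with a 3x attention speedup gives ~1.3x
# end-to-end, consistent with the 1.2-1.3x range quoted above.
print(end_to_end_speedup(0.35, 3.0))
```

Pushing end-to-end speedup further therefore requires accelerating the non-attention parts as well, which is what the FP8 and LightVAE combination targets.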
🧾 Introduction
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying them to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk, which determines its sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit knowledge from earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention mechanism to capture informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in quality (e.g., 84.5 on VBench) and efficiency (e.g., $1.2{\sim}1.3\times$ end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a $2.3\times$ speedup and 19.7 FPS on an RTX 5090 GPU.
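As a rough illustration of the Chunk-Aware Growth idea (progressively increasing sparsity so later chunks can lean on context inherited from earlier, denser chunks), consider the sketch below. The linear schedule, function name, and default ratios are illustrative assumptions, not the paper's contribution-based estimator:

```python
def chunk_sparsity_schedule(num_chunks: int,
                            first_keep: float = 1.0,
                            last_keep: float = 0.3) -> list:
    """Hypothetical progressive schedule: the first chunk attends densely
    (keep ratio 1.0) and later chunks use progressively sparser attention,
    since they can inherit knowledge already encoded by earlier chunks.
    A linear interpolation stands in for the paper's per-chunk
    contribution estimate."""
    if num_chunks == 1:
        return [first_keep]
    return [first_keep + (i / (num_chunks - 1)) * (last_keep - first_keep)
            for i in range(num_chunks)]

# Each chunk's keep ratio shrinks monotonically across the video.
print(chunk_sparsity_schedule(4))
```

The key property this sketch preserves is that sparsity is allocated per chunk rather than uniformly, which is what distinguishes the chunk-aware design from applying a fixed sparse pattern to every step.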
<img src="assets/framework.png" width="90%" ></img>
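The two-level (frame-then-block) mask selection of Hierarchical Sparse Attention could be sketched as follows. The mean pooling, function names, and parameters are assumptions for illustration only, not the paper's implementation:

```python
import numpy as np

def hierarchical_mask(scores: np.ndarray, frame_size: int, frame_topk: int,
                      block_size: int, block_topk: int) -> np.ndarray:
    """Hypothetical coarse-to-fine mask selection for one query block.
    `scores` is a 1-D vector of attention scores over the key sequence.
    Level 1 pools scores per frame and keeps the top-k frames; level 2
    keeps the top-k blocks inside each selected frame."""
    n = len(scores)
    mask = np.zeros(n, dtype=bool)
    # Level 1 (coarse): rank frames by their mean score.
    num_frames = n // frame_size
    pooled = scores[:num_frames * frame_size].reshape(num_frames, frame_size).mean(axis=1)
    top_frames = np.argsort(pooled)[-frame_topk:]
    # Level 2 (fine): within each kept frame, rank blocks by mean score.
    for f in top_frames:
        frame = scores[f * frame_size:(f + 1) * frame_size]
        block_means = frame.reshape(frame_size // block_size, block_size).mean(axis=1)
        for b in np.argsort(block_means)[-block_topk:]:
            start = f * frame_size + b * block_size
            mask[start:start + block_size] = True
    return mask
```

The coarse frame-level pass cheaply discards most of the history, so the finer block-level pass only runs inside a few frames; this is what lets one mechanism cover both long-range historical context and local detail.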
🧾 Results
<img src="assets/result.png" width="90%" ></img>
🤝 Acknowledgments
Our code is developed with reference to the following repositories:
✏️ Citation
If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work. We are currently organizing the code, and it will be open-sourced once the paper is accepted.
