
<div align="center" style="font-family: charter;"> <h1> QVGen:<br> Pushing the Limit of Quantized Video Generative Models</h1>


[ Conference Paper | Models | Dataset ]

Yushi Huang, Ruihao Gong📧, Jing Liu, Yifu Ding, Chengtao Lv, Haotong Qin, Jun Zhang📧

(📧 denotes corresponding author.)

</div>

This is the official PyTorch implementation of our paper QVGen. QVGen is the first method to reach quality comparable to full precision under 4-bit settings, and it significantly outperforms existing methods. For instance, our 3-bit CogVideoX-2B improves Dynamic Degree by +25.28 and Scene Consistency by +8.43 on VBench.
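Here, W4A4 denotes 4-bit weights and 4-bit activations (per-channel for weights, per-token for activations, as in the examples below). For intuition only, a minimal NumPy sketch of symmetric fake quantization — a hypothetical illustration, not the quantizer used in this repo:

```python
import numpy as np

def fake_quant(x, n_bits=4, axis=1):
    """Symmetric fake quantization to n_bits (illustrative sketch).
    For a weight matrix (out, in), axis=1 gives a per-channel scale;
    for activations (tokens, features), axis=1 gives a per-token scale."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for signed 4-bit
    scale = np.abs(x).max(axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer grid [-8, 7]
    return q * scale                       # dequantized ("fake quant") values
```

Quantization-aware training runs the forward pass through such a fake-quant operator so the model learns under low-bit constraints while gradients still flow in floating point.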

:fire: News

  • Jan 31, 2026: 🔥 We release the Python code and checkpoints for QVGen as presented in our paper. Give it a try!

  • Jan 23, 2026: 🌟 Our paper has been accepted by ICLR 2026! 🎉 Cheers!

🎬 Visual Examples

<div align="center"> <table> <tr> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/ebc73b77-1c5a-4ef4-a364-5f7286309c1e" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>BF16</strong> </td> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/c45af223-609c-4056-8198-ada018f5a43e" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 QVGen (Ours)</strong> </td> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/5f54383b-2e06-4833-ad25-829bc4445006" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 EfficientDM</strong> </td> </tr> <tr> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/9f1d6a24-50e1-4f97-ad2d-5ed0a32ad22b" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 Q-DM</strong> </td> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/5abc840a-fe64-4f90-9be7-37499d1a205e" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 LSQ</strong> </td> <td align="center" width="25%"> <video src="https://github.com/user-attachments/assets/2aa824f4-76fb-456e-a056-ec138adeb14c" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. 
</video> <strong>W4A6 SVDQuant</strong> </td> </tr> </table> <br/> <table> <tr> <td align="center" width="40%"> <video src="https://github.com/user-attachments/assets/e8536dfb-ad92-4921-827b-a9b39d7c6985" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>BF16</strong> </td> <td align="center" width="40%"> <video src="https://github.com/user-attachments/assets/0f128045-6cc4-4c1a-b153-f0a48e85f15a" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 QVGen (Ours)</strong> </td> </tr> <tr> <td align="center" width="40%"> <video src="https://github.com/user-attachments/assets/03abc96c-35d0-4855-835a-2c266c87ee47" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>BF16</strong> </td> <td align="center" width="40%"> <video src="https://github.com/user-attachments/assets/a352ec84-407c-433e-b41e-cbdc0417ffff" controls loop autoplay muted style="max-width: 100%; height: auto;"> Your browser does not support the video tag. </video> <strong>W4A4 QVGen (Ours)</strong> </td> </tr> </table> <p><small>Comparison of samples from 4-bit per-channel weight and per-token activation quantized CogVideoX-2B (upper) and Wan 14B (lower), across different methods.</small></p> </div>

📖 Overview

<div align="center" style="font-family: charter;">

<img src="./assets/overview.png" width="85%"/>

<p align="justify"><strong>Overview pipeline of the proposed QVGen.</strong> (a) This framework integrates auxiliary modules $\Phi$ to improve training convergence. (b) To maintain performance while eliminating the inference overhead induced by $\Phi$, we design a <i>rank-decay</i> schedule that progressively shrinks the entire $\Phi$ to $\varnothing$ by <i>iteratively applying</i> the following two strategies: (<i>i</i>) SVD to identify the low-impact components in $\Phi$; (<i>ii</i>) a rank-based regularization $\mathbf{\gamma}$ to decay the identified components to $\varnothing$. </p>

</div>
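The SVD-based selection step of the rank-decay schedule can be sketched as follows. This is an illustrative NumPy toy, with hypothetical names, that truncates the lowest-impact singular components outright; the paper instead decays the identified components via the rank-based regularization $\mathbf{\gamma}$:

```python
import numpy as np

def rank_decay_step(B, A, drop=1):
    """One sketch step for shrinking an auxiliary module Phi = B @ A.
    SVD ranks Phi's singular components by magnitude; the `drop` smallest
    are removed, and Phi is re-factored at the reduced rank. Iterating
    drives Phi toward the empty module so no inference overhead remains."""
    U, S, Vh = np.linalg.svd(B @ A, full_matrices=False)
    keep = max(S.size - drop, 0)
    new_B = U[:, :keep] * S[:keep]   # fold singular values into the B factor
    new_A = Vh[:keep, :]
    return new_B, new_A
```

Because the dropped components have the smallest singular values, each step changes the module's output as little as possible for the rank it removes.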

✨ Quick Start

After cloning the repository, follow these steps to train and run inference. We use Wan 1.3B as the example (the same workflow applies to CogVideoX 2B).

Requirements

Install dependencies with `pip install -r requirements.txt`. For quantization, we use 8× H100/H800/A100/A800 GPUs; if your hardware differs, adjust the scripts accordingly.

Prepare data and models

Download the pretrained Wan2.1 1.3B to models/Wan2.1-T2V-1.3B-Diffusers, then prepare the dataset:

```bash
# download and preprocess data
python prepare_dataset/download.py --output_directory dataset
sh script/data/prepare_dataset_wan_1-3b.sh
```

Note: please replace any path/to/your/... placeholders in the scripts with your own local paths.

We also provide quantized checkpoints on Hugging Face:

| Model | #Bit |
| --- | --- |
| Wan 1.3B | W4A4 |
| CogVideoX-2B | W4A4 |

Training

Below is an example training command. For more details, please refer to the paper.

```bash
sh script/train/w4a4_wan_1-3b.sh
```

Inference

Here is the corresponding inference command.

```bash
# You can also set the path to a pre-downloaded quantized checkpoint in the script.
sh script/inference/w4a4_wan_1-3b.sh
```

Evaluation

We recommend generating videos with our inference code and then following the evaluation steps in VBench. For generation, you can also use our distributed inference scripts: inference/ddp_cogvideox_t2v.py and inference/ddp_wan_t2v.py.

💪 TODO

The codebase has not been fully cleaned for public release yet. We will continue to clean and refine it in subsequent updates.

  • [ ] Training and inference code for large models (e.g., Wan 14B, CogVideoX 5B).
  • [ ] More quantized checkpoints.
  • [ ] More efficient training code.

🤝 Acknowledgments

Our code is built on the open-source finetrainers.

✏️ Citation

If you find QVGen useful, please cite our paper:

```bibtex
@inproceedings{huang2026qvgenpushinglimitquantized,
  title={QVGen: Pushing the Limit of Quantized Video Generative Models},
  author={Yushi Huang and Ruihao Gong and Jing Liu and Yifu Ding and Chengtao Lv and Haotong Qin and Jun Zhang},
  booktitle={International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2505.11497},
}
```
