# VBench
[CVPR 2024 Highlight] VBench - We Evaluate Video Generation

This repository provides unified implementations for the VBench series of works, supporting comprehensive evaluation of video generative models across a wide spectrum of capabilities and settings.
If your questions are not addressed in this README, please contact Ziqi Huang at ZIQI002 [at] e [dot] ntu [dot] edu [dot] sg.
## Table of Contents
- Overview - See this section for component locations and the differences between VBench, VBench++, and VBench-2.0.
- Updates
- Evaluation Results
- Video Generation Models Info
- Installation
- Usage
- Prompt Suite
- Sampled Videos
- Evaluation Method Suite
- Citation and Acknowledgement
<a name="overview"></a>
## :mega: Overview
### (1) VBench
TL;DR: Evaluating Video Generation — Benchmark • Evaluation Dimensions • Evaluation Methods • Human Alignment • Insights
VBench: Comprehensive Benchmark Suite for Video Generative Models <br> Ziqi Huang<sup>∗</sup>, Yinan He<sup>∗</sup>, Jiashuo Yu<sup>∗</sup>, Fan Zhang<sup>∗</sup>, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin<sup>+</sup>, Yu Qiao<sup>+</sup>, Ziwei Liu<sup>+</sup><br> IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

We propose VBench, a comprehensive benchmark suite for video generative models. We design a comprehensive and hierarchical <b>Evaluation Dimension Suite</b> that decomposes "video generation quality" into multiple well-defined dimensions to facilitate fine-grained and objective evaluation. For each dimension and each content category, we carefully design a <b>Prompt Suite</b> as test cases, and sample <b>Generated Videos</b> from a set of video generation models. For each evaluation dimension, we specifically design an <b>Evaluation Method Suite</b>, which uses a carefully crafted method or designated pipeline for automatic objective evaluation. We also conduct <b>Human Preference Annotation</b> of the generated videos for each dimension, and show that VBench evaluation results are <b>well aligned with human perceptions</b>. VBench can provide valuable insights from multiple perspectives.
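To make the pipeline above concrete, here is a minimal, self-contained sketch of how per-video scores could be aggregated into one result per evaluation dimension. The dimension names come from the VBench dimension suite, but the score values, the `aggregate` helper, and the output format are purely illustrative; the actual toolkit computes each dimension with its own dedicated method, not a simple average of arbitrary numbers.

```python
import json
import statistics

# Hypothetical per-video scores for two VBench dimensions.
# The values are made up for illustration only.
per_video_scores = {
    "subject_consistency": [0.94, 0.91, 0.97],
    "motion_smoothness":   [0.88, 0.90, 0.86],
}

def aggregate(scores_by_dimension):
    """Average per-video scores into one score per dimension,
    mirroring how a per-dimension result file might summarize them."""
    return {
        dim: round(statistics.mean(scores), 4)
        for dim, scores in scores_by_dimension.items()
    }

results = aggregate(per_video_scores)
print(json.dumps(results, indent=2))
```

The point of the sketch is the shape of the output (one scalar per dimension), which is what allows the human-alignment study to compare automatic scores against per-dimension human preference annotations.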
Note: The code and README for the VBench components are located at relative path `..`.
@InProceedings{huang2023vbench,
    title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
    author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2024}
}
### (2) VBench++
TL;DR: Extends VBench with (1) VBench-I2V for image-to-video, (2) VBench-Long for long videos, and (3) VBench-Trustworthiness covering fairness, bias, and safety.
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models <br> Ziqi Huang<sup>∗</sup>, Fan Zhang<sup>∗</sup>, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin<sup>+</sup>, Yu Qiao<sup>+</sup>, Ziwei Liu<sup>+</sup><br> IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

<b>VBench++</b> supports a wide range of video generation tasks, including text-to-video and image-to-video, with an adaptive Image Suite for fair evaluation across different settings. It evaluates not only technical quality but also the trustworthiness of generative models, offering a comprehensive view of model performance. We continually incorporate more video generative models into VBench to inform the community about the evolving landscape of video generation.
Note: The code and README for the VBench++ components are located at:
- (1) VBench-I2V (image-to-video): relative path `vbench2_beta_i2v`
- (2) VBench-Long (long video evaluation): relative path `vbench2_beta_long`
- (3) VBench-Trustworthiness (fairness, bias, and safety): relative path `vbench2_beta_trustworthiness`
*These modules belong to VBench++, not VBench or VBench-2.0. However, to maintain backward compatibility for users who have already installed the repository, we preserve the original relative path names and provide this clarification here.*
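Under the repository root, the module layout described above looks roughly like the following (directory names are taken from this README; other files and directories are omitted, and the exact contents may differ between releases):

```
VBench/
├── vbench2_beta_i2v/               # VBench++: image-to-video evaluation
├── vbench2_beta_long/              # VBench++: long video evaluation
└── vbench2_beta_trustworthiness/   # VBench++: fairness, bias, and safety
```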
@article{huang2025vbench++,
    title={{VBench++}: Comprehensive and Versatile Benchmark Suite for Video Generative Models},
    author={Huang, Ziqi and Zhang, Fan and Xu, Xiaojie and He, Yinan and Yu, Jiashuo and Dong, Ziyue and Ma, Qianli and Chanpaisit, Nattapol and Si, Chenyang and Jiang, Yuming and Wang, Yaohui and Chen, Xinyuan and Chen, Ying-Cong and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year={2025},
    doi={10.1109/TPAMI.2025.3633890}
}
### (3) VBench-2.0
TL;DR: Extends VBench to evaluate intrinsic faithfulness — a key challenge for next-generation video generation models.
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness<br> Dian Zheng<sup>∗</sup>, Ziqi Huang<sup>∗</sup>, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, [Wei-Shi Zheng](https://www.isee-ai.cn/~zhwsh