VideoAlign

[NeurIPS 2025] Improving Video Generation with Human Feedback


<h1 align="center"> Improving Video Generation with Human Feedback </h1> <div align="center">  <a href='https://arxiv.org/abs/2501.13918'><img src='https://img.shields.io/badge/arXiv-VideoAlign-red'></a>   <a href='https://gongyeliu.github.io/videoalign/'><img src='https://img.shields.io/badge/Project-VideoAlign-green'></a>   <a href="https://github.com/KwaiVGI/VideoAlign"><img src="https://img.shields.io/badge/GitHub-VideoAlign-9E95B7?logo=github"></a>   <a href='https://huggingface.co/KwaiVGI/VideoReward'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Model-VideoReward-blue'></a>   <br> <a href='https://huggingface.co/datasets/KwaiVGI/VideoGen-RewardBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Eval%20Dataset-VideoGen--RewardBench-blue'></a>   <a href='https://huggingface.co/spaces/KwaiVGI/VideoGen-RewardBench'><img src='https://img.shields.io/badge/Space-VideoGen--RewardBench-orange.svg?logo=data:image/svg+xml;charset=utf-8;base64,PHN2ZyB0PSIxNzM5MjA0MzY2MDEwIiBjbGFzcz0iaWNvbiIgdmlld0JveD0iMCAwIDEwMjQgMTAyNCIgdmVyc2lvbj0iMS4xIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHAtaWQ9IjQzNDYiIHdpZHRoPSIyMDAiIGhlaWdodD0iMjAwIj48cGF0aCBkPSJNNjgyLjY2NjY2NyA0NjkuMzMzMzMzVjEyOEgzNDEuMzMzMzMzdjI1Nkg4NS4zMzMzMzN2NTEyaDg1My4zMzMzMzRWNDY5LjMzMzMzM2gtMjU2eiBtLTI1Ni0yNTZoMTcwLjY2NjY2NnY1OTcuMzMzMzM0aC0xNzAuNjY2NjY2VjIxMy4zMzMzMzN6IG0tMjU2IDI1NmgxNzAuNjY2NjY2djM0MS4zMzMzMzRIMTcwLjY2NjY2N3YtMzQxLjMzMzMzNHogbTY4Mi42NjY2NjYgMzQxLjMzMzMzNGgtMTcwLjY2NjY2NnYtMjU2aDE3MC42NjY2NjZ2MjU2eiIgcC1pZD0iNDM0NyIgZmlsbD0iIzhhOGE4YSI+PC9wYXRoPjwvc3ZnPg=='></a>   <br> </div>

📖 Introduction

This repository open-sources the VideoReward component -- our VLM-based reward model introduced in the paper Improving Video Generation with Human Feedback. For Flow-DPO, we provide an implementation for text-to-image tasks here.

VideoReward evaluates generated videos across three critical dimensions:

Visual Quality (VQ): The clarity, aesthetics, and single-frame reasonableness.
Motion Quality (MQ): The dynamic stability, dynamic reasonableness, naturalness, and dynamic degree.
Text Alignment (TA): The relevance between the generated video and the text prompt.
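
To make the three dimensions concrete, the sketch below models one video's scores as a small record. The class name `RewardScores`, its field names, and the weighted sum used for `overall` are assumptions made for illustration; they are not taken from this repository's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardScores:
    """Hypothetical container for the three VideoReward dimensions."""

    vq: float  # Visual Quality
    mq: float  # Motion Quality
    ta: float  # Text Alignment

    def overall(self, w_vq: float = 1.0, w_mq: float = 1.0, w_ta: float = 1.0) -> float:
        # Weighted sum of the three dimensions. Equal default weights are an
        # assumption for this sketch, not a rule stated by the authors.
        return w_vq * self.vq + w_mq * self.mq + w_ta * self.ta


scores = RewardScores(vq=0.5, mq=-0.25, ta=1.0)
print(scores.overall())          # 1.25
print(scores.overall(w_ta=2.0))  # 2.25
```

Keeping the dimensions separate until the last step lets a downstream task weight them differently, for example emphasizing TA when prompt adherence matters most.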

This versatile reward model can be used for data filtering, guidance, rejection sampling, DPO, and other RL methods.
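
As a rough illustration of three of these uses, the sketch below filters candidates by a reward threshold, picks the best of N samples (rejection sampling), and builds a chosen/rejected pair of the kind DPO consumes. The `score_fn` callable is a hypothetical stand-in for a reward model; the stub scores and file names are made up for the example.

```python
from typing import Callable, List, Tuple

ScoreFn = Callable[[str, str], float]  # (prompt, video_path) -> reward


def filter_by_reward(prompt: str, videos: List[str], score_fn: ScoreFn,
                     threshold: float) -> List[str]:
    """Data filtering: keep videos whose reward reaches the threshold."""
    return [v for v in videos if score_fn(prompt, v) >= threshold]


def best_of_n(prompt: str, videos: List[str], score_fn: ScoreFn) -> str:
    """Rejection sampling: keep only the highest-reward candidate."""
    return max(videos, key=lambda v: score_fn(prompt, v))


def preference_pair(prompt: str, videos: List[str], score_fn: ScoreFn) -> Tuple[str, str]:
    """DPO data: pair the highest-reward video with the lowest-reward one."""
    ranked = sorted(videos, key=lambda v: score_fn(prompt, v), reverse=True)
    return ranked[0], ranked[-1]


# Stub reward function with made-up scores, standing in for a real model.
_fake_scores = {"a.mp4": 0.2, "b.mp4": 1.3, "c.mp4": -0.7}


def stub_score(prompt: str, video_path: str) -> float:
    return _fake_scores[video_path]


candidates = ["a.mp4", "b.mp4", "c.mp4"]
print(filter_by_reward("a cat", candidates, stub_score, threshold=0.0))  # ['a.mp4', 'b.mp4']
print(best_of_n("a cat", candidates, stub_score))                        # b.mp4
print(preference_pair("a cat", candidates, stub_score))                  # ('b.mp4', 'c.mp4')
```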

📝 Updates

[2025.08.14]: 🔥 We provide the prompt sets used to evaluate video generation performance in this paper, including VBench, VideoGen-Eval, and TA-Hard. See ./datasets/video_eval_prompts for details.
[2025.07.17]: 🔥 Release the Flow-DPO.
[2025.02.08]: 🔥 Release the VideoGen-RewardBench and Leaderboard.
[2025.02.08]: 🔥 Release the Code and Checkpoints of VideoReward.
[2025.01.23]: Release the Paper and Project Page.

🚀 Quick Start

1. Environment Setup

Clone this repository and install packages.

git clone https://github.com/KwaiVGI/VideoAlign
cd VideoAlign
conda env create -f environment.yaml
conda activate VideoReward
pip install flash-attn==2.5.8 --no-build-isolation

2. Download Pretrained Weights

Download our checkpoints from Hugging Face and put them in ./checkpoints/.

cd checkpoints
git lfs install
git clone https://huggingface.co/KwaiVGI/VideoReward
cd ..
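
If Git LFS is not available, the same files can probably be fetched from Python instead. The sketch below assumes the `huggingface_hub` package is installed (it may not be part of `environment.yaml`) and uses its `snapshot_download` function. The helper names `local_dir_for` and `download_repo` are hypothetical.

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str) -> Path:
    """Mirror the `git clone` layout: <root>/<repository name>."""
    return Path(root) / repo_id.split("/")[-1]


def download_repo(repo_id: str, root: str, repo_type: str = "model") -> Path:
    """Download a Hugging Face repository snapshot into <root>/<repository name>."""
    # Lazy import: huggingface_hub is an assumed extra dependency.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=str(target))
    return target


# Example call (needs network access), run from the repository root:
# download_repo("KwaiVGI/VideoReward", "checkpoints")
```

Passing `repo_type="dataset"` would be the way to fetch a dataset repository with the same helper.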

3. Score a single prompt-video item

python inference.py

✨ Evaluate Performance on VideoGen-RewardBench

1. Download VideoGen-RewardBench and put it in `./datasets/`.

cd datasets
git lfs install
git clone https://huggingface.co/datasets/KwaiVGI/VideoGen-RewardBench
cd ..

2. Start inference

python eval_videogen_rewardbench.py
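
A reward-model benchmark built on human preference pairs is typically scored by how often the model's ranking agrees with the human label. The sketch below shows that pairwise-accuracy idea under simplifying assumptions: each record is a hypothetical `(score_a, score_b, label)` tuple, labels are only `"A"` or `"B"`, and ties are not handled. It is not the logic of `eval_videogen_rewardbench.py`.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float, str]]) -> float:
    """Fraction of pairs where the higher-scored video matches the human label."""
    correct = 0
    total = 0
    for score_a, score_b, label in pairs:
        # The model "prefers" whichever video received the higher reward.
        predicted = "A" if score_a > score_b else "B"
        correct += int(predicted == label)
        total += 1
    if total == 0:
        raise ValueError("pairwise_accuracy needs at least one pair")
    return correct / total


# Made-up scores and labels for illustration only.
example_pairs = [
    (1.2, 0.3, "A"),   # model prefers A, human prefers A -> agree
    (-0.5, 0.1, "B"),  # model prefers B, human prefers B -> agree
    (0.4, 0.9, "A"),   # model prefers B, human prefers A -> disagree
    (2.0, 1.0, "A"),   # model prefers A, human prefers A -> agree
]
print(pairwise_accuracy(example_pairs))  # 0.75
```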

🏁 Train RM on Your Own Data

1. Prepare your own data as described in the instructions.

2. Start training!

sh train.sh
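
For background, a widely used objective for training reward models on preference pairs is the Bradley-Terry loss, which pushes the reward of the preferred sample above that of the rejected one. The sketch below computes it for a single pair in plain Python. Whether `train.sh` optimizes exactly this objective is not stated here, so treat it as an illustration of the general idea rather than the repository's loss.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # softplus(-margin) = log(1 + exp(-margin)), arranged to avoid overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Equal rewards give log(2); a correctly ordered pair gives a smaller loss.
print(round(bradley_terry_loss(0.0, 0.0), 4))  # 0.6931
print(bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.0, 2.0))  # True
```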

🤗 Acknowledgments

Our reward model is based on Qwen2-VL-2B-Instruct, and our code is built upon TRL and Qwen2-VL-Finetune. Thanks to all the contributors!

⭐ Citation

Please leave us a star ⭐ if you find our work helpful.

@article{liu2025improving,
  title={Improving video generation with human feedback},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Yuan, Ziyang and Liu, Xiaokun and Zheng, Mingwu and Wu, Xiele and Wang, Qiulin and Qin, Wenyu and Xia, Menghan and others},
  journal={arXiv preprint arXiv:2501.13918},
  year={2025}
}

@article{liu2025flow,
  title={Flow-grpo: Training flow matching models via online rl},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}
