TAVGBench
Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation
TAVGBench: Benchmarking Text to Audible-Video Generation
Project Overview
We introduce a new task in multimodal AI: generating audible-video content from textual descriptions using a latent diffusion model. To support this task, we have developed TAVGBench, a large-scale benchmark dataset of 1.7 million entries, each annotated with corresponding text.
The TAVGBench
Dataset size
The benchmark dataset comprises 1.7 million entries, each annotated with textual descriptions that align with the corresponding audio and video content. This collection provides a foundation for training and evaluating text to audible-video generation models.
Dataset annotation pipeline
The annotation pipeline for TAVGBench is designed to produce high-quality, consistent data. Each audio and video clip is paired with a detailed textual description. The pipeline involves multiple stages of annotation and validation to keep the annotations accurate and relevant.
The video and audio captions within TAVGBench have been open-sourced and are available for download here.
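As a rough illustration of what a validation stage over paired captions could look like, the sketch below models one entry and filters out incomplete ones. This is not the TAVGBench pipeline or file format: the `CaptionEntry` type, its field names (`clip_id`, `video_caption`, `audio_caption`), and the non-empty check are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionEntry:
    """One hypothetical entry: a clip paired with its two captions."""
    clip_id: str        # hypothetical identifier, not a documented field
    video_caption: str  # text describing the visual content
    audio_caption: str  # text describing the sound


def is_valid(entry: CaptionEntry) -> bool:
    """Assumed validation rule: the id and both captions must be non-blank."""
    fields = (entry.clip_id, entry.video_caption, entry.audio_caption)
    return all(value.strip() for value in fields)


def filter_valid(entries):
    """Keep only the entries that pass the validation check."""
    return [entry for entry in entries if is_valid(entry)]
```

A real pipeline would likely apply richer checks (for example, caption length or audio-video consistency); the point here is only that each entry carries both a video and an audio description and is checked before use.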
Video demo
To showcase our approach, we have prepared a video demonstration. The demo presents results from our text to audible-video generation model and gives a concrete example of potential applications of this technology.
