TAVGBench

Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation

TAVGBench: Benchmarking Text to Audible-Video Generation

Project Overview

We introduce a new task in multimodal AI: generating audible-video content from textual descriptions using a latent diffusion model. To support this task, we have developed TAVGBench, a large-scale benchmark dataset of 1.7 million entries, each annotated with corresponding text.

The TAVGBench

Dataset size

Our benchmark dataset comprises 1.7 million entries, each annotated with textual descriptions that align with the corresponding audio and video content. This collection provides a foundation for training and evaluating text to audible-video generation models.

Dataset annotation pipeline

The annotation pipeline for TAVGBench is designed to produce high-quality, consistent data. Each audio and video clip is paired with a detailed textual description, and the pipeline runs multiple stages of annotation and validation to check that the descriptions are accurate and relevant.

The video and audio captions within TAVGBench have been open-sourced and are available for download here.

Video demo

To showcase our approach, we have prepared a video demonstration. The demo presents results from our text to audible-video generation model and gives a concrete example of what the technology can be used for.
