TAVGBench

Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation

TAVGBench: Benchmarking Text to Audible-Video Generation

Project Overview

We introduce a new task in multimodal AI: generating audible video, that is, video with matching audio, from a textual description using a latent diffusion model. To support this task, we have developed TAVGBench, a large-scale benchmark dataset of 1.7 million entries, each annotated with corresponding text.

The TAVGBench

Dataset size

Our benchmark dataset comprises 1.7 million entries, each annotated with a textual description that aligns with the corresponding audio and video content. This collection provides a foundation for training and evaluating text to audible-video generation models.

Dataset annotation pipeline

The annotation pipeline for TAVGBench is designed to produce high-quality, consistent data. Each audio and video clip is paired with a detailed textual description for use in model training and benchmarking. The pipeline runs through multiple stages of annotation and validation to check that the annotations are accurate and relevant.

The video and audio captions within TAVGBench have been open-sourced and are available for download here.
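As a rough illustration of how the released captions might be consumed, the sketch below pairs a video caption and an audio caption per clip and drops incomplete entries. Everything specific here is an assumption made for illustration: the JSON Lines layout, the field names `clip_id`, `video_caption`, and `audio_caption`, the `to_prompt` convention, and the sample records are hypothetical and are not the released TAVGBench schema or data.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CaptionEntry:
    """One entry: a clip with its video and audio captions.

    Field names are hypothetical, not the released schema.
    """
    clip_id: str
    video_caption: str
    audio_caption: str


def parse_entries(lines: Iterable[str]) -> Iterator[CaptionEntry]:
    """Parse JSON Lines records, skipping blank lines and incomplete entries."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        clip_id = str(record.get("clip_id") or "")
        video = str(record.get("video_caption") or "").strip()
        audio = str(record.get("audio_caption") or "").strip()
        # Keep only entries where both modalities are described.
        if clip_id and video and audio:
            yield CaptionEntry(clip_id, video, audio)


def to_prompt(entry: CaptionEntry) -> str:
    """Join both captions into one text prompt (one possible convention)."""
    return f"{entry.video_caption} Audio: {entry.audio_caption}"


# Made-up sample records, for illustration only.
sample = [
    '{"clip_id": "clip_0001", "video_caption": "A dog runs on a beach.", '
    '"audio_caption": "Waves crash and a dog barks."}',
    "",
    '{"clip_id": "clip_0002", "video_caption": "A car passes by.", '
    '"audio_caption": ""}',
]
entries = list(parse_entries(sample))
print(len(entries))
print(to_prompt(entries[0]))
```

With a real download, `sample` would be replaced by an open file handle, and the field names would need to be adjusted to whatever the caption files actually contain.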

Video demo

To showcase our approach, we have prepared a video demonstration. The demo presents results from our text to audible-video generation model and gives a concrete example of what this technology can be used for.

OpenNLPLab

GitHub Stars: 14

Category: Content

Updated: 6mo ago

Forks: 0

OpenNLPLab/TAVGBench

Languages

Python
