Repurpose

[AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark


<h2 align="center">Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark</h2>

News

:fire: [2024.12.10] Our paper has been accepted by AAAI-2025!

Introduction

This repository provides the PyTorch implementation for the paper Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark. The research introduces Repurpose-10K, a large-scale dataset designed to tackle the challenge of long-to-short video repurposing. The dataset contains over 10,000 videos and 120,000+ annotated clips, making it a benchmark for automatic video repurposing.

What is Video Repurposing?

With the rise of short-form video platforms like TikTok, Instagram Reels, and YouTube Shorts, there is a growing need to efficiently extract engaging segments from long-form content such as vlogs, interviews, and live streams. Video repurposing involves:

- Identifying highly engaging segments from long videos.
- Ensuring narrative coherence in the repurposed clips.
- Optimizing for direct publishing on social media.

Short-form videos typically run up to 3 minutes, with most around 60 seconds, depending on each platform's guidelines and audience preferences. This length keeps a clip concise, engaging, and suited to rapid consumption.

<img width="958" alt="image" src="https://github.com/user-attachments/assets/5985db0b-7a3c-4064-8dff-ed1ef019ccac" />

About Repurpose-10K

To address the lack of large-scale benchmarks for this task, Repurpose-10K was created by collecting real-world user interactions on User Generated Content (UGC). The annotation process involves:

- Initial segmentation using AI-assisted tools.
- User preference voting to mark preferred clips.
- Manual refinement of timestamps by content creators.

This ensures high-quality, human-curated ground truth labels for training video repurposing models.

Getting Started

Setting Up Your Environment

To ensure a smooth experience running the scripts, set up a dedicated conda environment by executing the following commands in your terminal:

conda create -n repurpose python=3.9
conda activate repurpose
pip install -r requirements.txt
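
After installation, a quick sanity check can confirm that the environment looks as expected. The snippet below is a minimal sketch, not part of the repository: the helper names `missing_packages` and `python_matches` are hypothetical, and checking for `torch` is an assumption based only on this being a PyTorch implementation (the actual contents of requirements.txt are not listed here).

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major, minor, version_info=sys.version_info):
    """Check whether the interpreter version equals major.minor."""
    return (version_info[0], version_info[1]) == (major, minor)


if __name__ == "__main__":
    # The conda command above pins python=3.9.
    if not python_matches(3, 9):
        print("Warning: expected Python 3.9, found", sys.version.split()[0])
    # "torch" is an assumed dependency; adjust this list to match requirements.txt.
    missing = missing_packages(["torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

Run it inside the activated `repurpose` environment; any package reported as missing suggests the `pip install` step did not complete.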

Preparing Your Data

The train/validation/test splits are provided in the `data/` directory of this repository. Follow these steps to prepare the data:

1. Download the source videos using yt-dlp.
2. Extract the required features as described in our paper using these repositories:
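
For the download step, a small wrapper around the yt-dlp command line could look like the sketch below. It assumes you already have a list of source video URLs; how the split files store them is not described here, so loading them is left out. The function names `build_download_command` and `download_all` and the output directory are hypothetical. Only the `-o` output-template option of yt-dlp is used.

```python
import subprocess
from pathlib import Path


def build_download_command(url, out_dir):
    """Build a yt-dlp command that saves the video as <out_dir>/<video id>.<ext>."""
    # %(id)s and %(ext)s are yt-dlp output-template fields.
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return ["yt-dlp", "-o", template, url]


def download_all(urls, out_dir, runner=subprocess.run):
    """Download each URL in turn and return the URLs whose download failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        result = runner(build_download_command(url, out_dir))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Calling `download_all(urls, "videos")` requires yt-dlp to be installed and on the PATH. Returning the failed URLs instead of raising makes it easy to retry only the videos that are unavailable or rate-limited.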

Training Your Model

To begin training the model, use the command below:

python main.py --config_path configs/Repurpose.yaml

For model evaluation, execute the following command:

python inference.py --config_path configs/Repurpose.yaml --resume your_ckpt_path

Replace your_ckpt_path with the actual path to your checkpoint file.
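
If you want to evaluate several checkpoints in a row, the inference command above can be assembled programmatically. This is a minimal sketch under stated assumptions: the helpers `build_inference_command` and `find_checkpoints` are hypothetical, and the `*.pth` file pattern is a guess at how checkpoints are named, so adjust it to your own files. Only the `--config_path` and `--resume` flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(ckpt_path, config_path="configs/Repurpose.yaml"):
    """Build the inference.py invocation for a single checkpoint."""
    return [
        sys.executable, "inference.py",
        "--config_path", config_path,
        "--resume", str(ckpt_path),
    ]


def find_checkpoints(ckpt_dir, pattern="*.pth"):
    """List checkpoint files in ckpt_dir, sorted by path (the pattern is an assumption)."""
    return sorted(Path(ckpt_dir).glob(pattern))


def evaluate_all(ckpt_dir):
    """Run inference.py once per checkpoint found in ckpt_dir."""
    for ckpt in find_checkpoints(ckpt_dir):
        subprocess.run(build_inference_command(ckpt), check=True)
```

Using `sys.executable` rather than a bare `python` keeps the evaluation inside whichever interpreter is currently active, such as the `repurpose` conda environment.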

Citation

@inproceedings{wu2025video,
  title={Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark},
  author={Wu, Yongliang and Zhu, Wenbo and Cao, Jiawang and Lu, Yi and Li, Bozheng and Chi, Weiheng and Qiu, Zihan and Su, Lirian and Zheng, Haolin and Wu, Jay and others},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={8},
  pages={8487--8495},
  year={2025}
}

Acknowledgments

We would like to extend our gratitude to the authors and contributors of the following repositories, which have been instrumental in the development of our implementation:

https://github.com/ttgeng233/UnAV
https://github.com/DocF/Soft-NMS
