LORIS
[ICML2023] Long-Term Rhythmic Video Soundtracker
This is the official implementation of "Long-Term Rhythmic Video Soundtracker", ICML2023.
Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, and Yu Qiao.
OpenGVLab, Shanghai Artificial Intelligence Laboratory
Introduction
We present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework for synthesizing long-term conditional waveforms in sync with visual cues. The framework uses a latent conditional diffusion probabilistic model for waveform synthesis, together with a series of context-aware conditioning encoders that take temporal information into account for long-term generation. We also extend the model's applicability beyond dance to sports scenarios such as floor exercise and figure skating. For comprehensive evaluation, we establish a benchmark for rhythmic video soundtracks that includes a pre-processed dataset, improved evaluation metrics, and robust generative baselines.
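As a rough illustration of the diffusion component, the sketch below shows the standard DDPM forward (noising) process, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I). It is a generic example, not the LORIS implementation: the linear beta schedule, the helper names `linear_alpha_bars` and `q_sample`, and the toy 440 Hz sine input are all assumptions. For simplicity it noises raw samples directly; a latent model would noise an encoded representation instead, and a conditional denoiser would be trained to predict the noise from x_t, t, and the visual conditioning features.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an (assumed) linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        # Linearly interpolate beta_t between beta_start and beta_end.
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) and return it together with the Gaussian noise used."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    # Hypothetical toy "waveform": one second of a 440 Hz sine at 16 kHz.
    x0 = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
    xt, eps = q_sample(x0, 500, alpha_bars, rng)
    print(f"alpha_bar at t=500: {alpha_bars[500]:.4f}")
```

Returning the noise alongside x_t mirrors the usual epsilon-prediction training objective, where the model's output is compared against `eps`.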

How to Start
pip install -r requirements.txt
Training
bash scripts/loris_{subset}_s{length}.sh
Inference
bash scripts/infer_{subset}_s{length}.sh
Here {subset} selects the dataset subset and {length} the clip length in seconds.
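If you launch many runs, a small wrapper can build the script paths from the naming pattern above. This is a hypothetical convenience sketch, not part of the repository: the function names `script_path` and `run_script`, the stage labels `"train"`/`"infer"`, and the example subset name `dance` are assumptions, so check the actual filenames under `scripts/` before relying on it.

```python
import shlex
import subprocess

# Assumed mapping from stage to script prefix, following the README pattern.
PREFIXES = {"train": "loris", "infer": "infer"}


def script_path(stage, subset, length):
    """Return scripts/{prefix}_{subset}_s{length}.sh for the given stage."""
    if stage not in PREFIXES:
        raise ValueError(f"unknown stage: {stage!r}")
    return f"scripts/{PREFIXES[stage]}_{subset}_s{length}.sh"


def run_script(stage, subset, length, dry_run=True):
    """Run the script with bash; with dry_run=True only return the command string."""
    cmd = ["bash", script_path(stage, subset, length)]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical subset/length; prints the command without executing it.
    print(run_script("train", "dance", 25))
```

Defaulting to `dry_run=True` lets you check the generated command before starting a long training job.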
Dataset
The dataset is available on Hugging Face:
from datasets import load_dataset
dataset = load_dataset("OpenGVLab/LORIS")
Model Zoo
We provide the pre-trained checkpoints and the backbone audio diffusion model on Hugging Face.
Note that these checkpoints are for research use only.
Citation
@inproceedings{Yu2023Long,
title={Long-Term Rhythmic Video Soundtracker},
author={Yu, Jiashuo and Wang, Yaohui and Chen, Xinyuan and Sun, Xiao and Qiao, Yu},
booktitle={International Conference on Machine Learning (ICML)},
year={2023}
}
Acknowledgement
We would like to thank the authors of previous related projects for generously sharing their code and insights: audio-diffusion-pytorch, CDCD, D2M-GAN, VQ-Diffusion, and JukeBox.
