LLMEPET
[MM'24 Oral] Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Prior Knowledge Integration via LLM Encoding and Pseudo-Event Regulation for Video Moment Retrieval
Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiao-Yong Wei, Chang Wen Chen, and Qing Li.
Official PyTorch implementation of "Prior Knowledge Integration via LLM Encoding and Pseudo-Event Regulation for Video Moment Retrieval".
<p align="center"><img width="850" src="images/model.png"></p>

[Installation](#installation) | [Dataset](#dataset) | [Training](#training) | [Evaluation](#evaluation) | [Model Zoo](#model)
📢 News
[2024.7.21] Our paper has been accepted by ACM Multimedia 2024 (Oral).
[2024.7.10] The code and datasets of related tasks have been released.
[2024.5.10] The repository is public.
[2024.4.10] The repository is created.
<a name="installation"></a>
⚙️ Installation
- Clone the repository from GitHub.

```shell
git clone https://github.com/fletcherjiang/LLMEPET.git
cd LLMEPET
```

- Create and activate the conda environment.

```shell
conda create -n LLMEPET python=3.8
conda activate LLMEPET
```

- Install the required packages.

```shell
pip install -r requirements.txt
```
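To confirm the environment resolved correctly, a quick importability check can help (a sketch; the package names below are assumptions based on a typical PyTorch project, not the exact contents of `requirements.txt`):

```python
import importlib.util

def check_packages(packages):
    """Return a dict mapping package name -> whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

if __name__ == "__main__":
    # Assumed core dependencies; adjust to match requirements.txt.
    for pkg, found in check_packages(["torch", "numpy", "tqdm"]).items():
        print(f"{pkg}: {'found' if found else 'MISSING'}")
```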
<a name="dataset"></a>
🗂️ Dataset
For all datasets, we provide extracted features. Download them and place them in `features/`. The prepared dataset should have the following structure.
```
.
├── LLMEPET
│   ├── llm_epet
│   ├── data
│   ├── results
│   ├── run_on_video
│   ├── standalone_eval
│   └── utils
├── data
├── features
│   ├── qvhighlight
│   ├── charades
│   ├── tacos
│   ├── tvsum
│   └── youtube_uni
├── llama
│   ├── consolidated.00.pth
│   ├── tokenizer.model
│   └── params.json
├── README.md
└── ···
```
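The layout above can be sanity-checked with a short script before training (a sketch; `missing_entries` is a hypothetical helper, not part of the repo, and the expected paths are taken from the tree above):

```python
import os

# Key entries from the directory tree above, relative to the project root.
EXPECTED = [
    "LLMEPET/llm_epet",
    "features/qvhighlight",
    "features/charades",
    "features/tacos",
    "features/tvsum",
    "features/youtube_uni",
    "llama/consolidated.00.pth",
    "llama/tokenizer.model",
    "llama/params.json",
]

def missing_entries(root="."):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    for p in missing_entries():
        print("missing:", p)
```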
🪐 LLaMA Checkpoint
- You can download the LLaMA checkpoint from Hugging Face or from the official LLaMA release.
- If you want to try LLaMA-2 or LLaMA-3, download their checkpoints instead and edit `llm_epet/llama.py` accordingly.
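Before training, you can confirm the checkpoint files are readable; `params.json` shipped next to `consolidated.00.pth` is plain JSON holding the model hyperparameters (a sketch; the keys `dim`, `n_layers`, and `n_heads` follow the original LLaMA release format and may differ for other versions):

```python
import json

def load_llama_params(path="llama/params.json"):
    """Parse the params.json shipped with the LLaMA checkpoint."""
    with open(path) as f:
        return json.load(f)

if __name__ == "__main__":
    params = load_llama_params()
    # e.g. the 7B model typically reports dim=4096, n_layers=32, n_heads=32.
    print({k: params.get(k) for k in ("dim", "n_layers", "n_heads")})
```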
<a name="training"></a>
🚀 Training
QVHighlights

```shell
bash llm_epet/scripts/train.sh
```

Charades-STA

```shell
bash llm_epet/scripts/charades_sta/train.sh
```

TACoS

```shell
bash llm_epet/scripts/tacos/train.sh
```

TVSum

```shell
bash llm_epet/scripts/tvsum/train_tvsum.sh
```

YouTube-HL

```shell
bash llm_epet/scripts/youtube_uni/train.sh
```
<a name="evaluation"></a>
⭐ QVHighlights Evaluation and Submission
```shell
bash llm_epet/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash llm_epet/scripts/inference.sh results/{direc}/model_best.ckpt 'test'
```

Pack the `hl_{val,test}_submission.jsonl` files and submit them to CodaLab.
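Packing the two submission files can be scripted (a sketch; `pack_submission` is a hypothetical helper, and placing the files at the archive root is an assumption about the evaluator's expected layout):

```python
import os
import zipfile

def pack_submission(result_dir, out_zip="submission.zip"):
    """Zip the val/test submission files for CodaLab upload."""
    files = ["hl_val_submission.jsonl", "hl_test_submission.jsonl"]
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            # arcname drops the results directory so files sit at the zip root.
            zf.write(os.path.join(result_dir, name), arcname=name)
    return out_zip
```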
<a name="model"></a>
📦 Model Zoo
| Dataset | Model file |
| -- | -- |
| QVHighlights (Slowfast + CLIP) | checkpoints |
| Charades (Slowfast + CLIP) | checkpoints |
| TACoS | checkpoints |
| TVSum | checkpoints |
| Youtube-HL | checkpoints |
📖 Citation
If you find the repository or the paper useful, please use the following entry for citation.
```bibtex
@inproceedings{jiang2024prior,
  title={Prior Knowledge Integration via {LLM} Encoding and Pseudo Event Regulation for Video Moment Retrieval},
  author={Yiyang Jiang and Wengyu Zhang and Xulu Zhang and Xiaoyong Wei and Chang Wen Chen and Qing Li},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://arxiv.org/abs/2407.15051}
}
```