# UniTime
This repository provides the official PyTorch implementation of "Universal Video Temporal Grounding with Generative Multi-modal Large Language Models" (NeurIPS 2025).
🌐 Project Page · 📄 Paper · 🤗 Model
<div align="center"> <img src="./assets/teaser.png"> </div>

## 🔥 News
- [2025.10] Released the code for data construction, training, and evaluation.
- [2025.09] UniTime accepted to NeurIPS 2025!
- [2025.06] Released the inference code.
- [2025.06] Preprint available on arXiv.
## ⚙️ Installation

```shell
conda create -n UniTime python=3.10
conda activate UniTime
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
## 🚀 Quick Start

1. **Download Model Checkpoints**
   - Obtain the pretrained checkpoints from Qwen2-VL-7B and UniTime.
   - Set `model_local_path` to your local path for Qwen2-VL-7B, and `model_finetune_path` to your UniTime checkpoint.

2. **Prepare Input Data**
   - Create a JSON file for inference as `data/test.json`, and specify its path via the `data_path` argument.

3. **Run Inference**
   - Execute the following command to perform inference. The output results will be saved in the `results/` directory.

   ```shell
   export CUDA_VISIBLE_DEVICES=0
   python inference.py --model_local_path path_to_qwen2vl7B \
       --model_finetune_path ckpt/unitime \
       --data_path data/test.json \
       --output_dir ./results/test \
       --nf_short 128
   ```
## Data Preparation

1. Download the video and annotation files for each dataset from the corresponding source links.

2. Create the input file following the format below:

   ```json
   [
       {
           "qid": 0,
           "id": "3MSZA",
           "annos": [
               {
                   "query": "person turn a light on.",
                   "window": [[24.3, 30.4]]
               }
           ],
           "duration": 30.96,
           "video_path": "./videos/3MSZA.mp4",
           "mode": "mr"
       }
   ]
   ```

   Example construction code for Ego4D-NLQ can be found in `datasets/data_ego4d.py` (see the `load_data_to_dict()` function). Modify it as needed for other datasets.

3. (Optional) You may also download preprocessed annotations for each dataset from UniTime-Data.
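As a minimal sketch (not part of the repository), the following snippet writes a single-entry inference file in the format above; the field values simply mirror the `3MSZA` example:

```python
import json
import os

# One annotation entry in the inference format shown above
# (values mirror the "3MSZA" sample).
entry = {
    "qid": 0,
    "id": "3MSZA",
    "annos": [
        {"query": "person turn a light on.", "window": [[24.3, 30.4]]}
    ],
    "duration": 30.96,
    "video_path": "./videos/3MSZA.mp4",
    "mode": "mr",  # moment retrieval
}

# The inference script expects a JSON list of such entries.
os.makedirs("data", exist_ok=True)
with open("data/test.json", "w") as f:
    json.dump([entry], f, indent=4)
```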
## Training and Evaluation

Execute the following commands in sequence:

```shell
# Feature Extraction
bash scripts/feature.sh

# Training
bash scripts/train.sh

# Evaluation
bash scripts/eval.sh

# Metrics
python eval_metrics.py --res ./results/RUN_NAME/results.json
```
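For reference, moment-retrieval accuracy is commonly reported as Recall@1 at a temporal IoU threshold. The snippet below is a hedged sketch of that standard metric, not the repository's `eval_metrics.py`, whose exact computation may differ:

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] windows in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, thresh=0.5):
    """Fraction of queries whose top-1 prediction overlaps GT above thresh."""
    hits = sum(temporal_iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(preds)

# Two queries: one near-correct prediction, one complete miss.
preds = [[24.0, 31.0], [5.0, 9.0]]
gts = [[24.3, 30.4], [50.0, 60.0]]
print(recall_at_iou(preds, gts, 0.5))  # prints 0.5
```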
Note: Modify the arguments marked with `ToModify` in the code according to the following definitions:
| Argument | Description |
|-|-|
| `path_to_qwen2vl7B` | Path to the Qwen2-VL-7B model directory |
| `path_to_feature_root` | Root directory containing features for all datasets |
| `path_to_video_root` | Root directory containing all video files |
| `path_to_train_data` | Path to the training set annotation file generated by `datasets/data_ego4d.py` |
| `path_to_val_data` | Path to the validation set annotation file generated by `datasets/data_ego4d.py` |
| `path_to_test_data` | Path to the test set annotation file generated by `datasets/data_ego4d.py` |
| `path_to_feature_folder` | Subfolder under `path_to_feature_root` for a specific dataset |
| `RUN_NAME` | Experiment identifier/name for this training run |
## Citation

If you use this code and data for your research or project, please cite:

```bibtex
@inproceedings{unitime2025,
    title={Universal Video Temporal Grounding with Generative Multi-modal Large Language Models},
    author={Li, Zeqian and Di, Shangzhe and Zhai, Zhonghua and Huang, Weilin and Wang, Yanfeng and Xie, Weidi},
    booktitle={NeurIPS},
    year={2025}
}
```
## Acknowledgements

This project builds upon several excellent open-source efforts.
## Contact
For questions, please contact: lzq0103@sjtu.edu.cn.