<p align="center"> <h1 align="center">TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs <br/>🏆 CVPR 2026</h1> </p> <p align="center"> <a href="https://home.j-zh.top/">Jun Zhang</a>, <a href="http://ttengwang.com/">Teng Wang</a>, <a href="https://geyuying.github.io/">Yuying Ge</a>, <a href="https://geyixiao.com/">Yixiao Ge</a>, <a href="https://scholar.google.com/citations?user=evR3uR0AAAAJ">Xinhao Li</a>, <a href="https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en">Ying Shan</a>, <a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ&hl=en">Limin Wang</a> </p> <p align="center"> &nbsp;&nbsp;📑 <a href="https://arxiv.org/abs/2512.14698"><b>Paper</b></a>&nbsp;&nbsp; | &nbsp;&nbsp;🏠 <a href="https://timelens-arc-lab.github.io/"><b>Project Page</b></a>&nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/collections/TencentARC/timelens"><b>Model & Data</b></a>&nbsp;&nbsp; | 🏆 <a href="https://timelens-arc-lab.github.io/#leaderboard"><b>TimeLens-Bench Leaderboard</b></a>&nbsp;&nbsp; </p>

📰 News

  • 2026.02.26: 🚀 Training for TimeLens-7B based on Qwen2.5-VL-7B is available on the train branch.
  • 2026.02.26: 🚀 We now support training TimeLens-8B based on Qwen3-VL-8B.
  • 2026.02.22: 🎉 TimeLens has been accepted to CVPR 2026.

🔎 Overview

TimeLens rethinks video temporal grounding (VTG) with MLLMs along two axes:

  • Data Quality. We expose critical quality issues in existing VTG benchmarks and propose quality-assured datasets for both training and evaluation.
  • Algorithmic Design. Building upon reliable data, we explore effective timestamp encoding strategies and training recipes, achieving state-of-the-art performance among open-source models.

📚 Quick Navigation

In this repository, we release:

  • 🤖 TimeLens Models: State-of-the-art open-source models for video temporal grounding.
  • 📊 TimeLens-Bench: A comprehensive, high-quality evaluation benchmark for video temporal grounding.
  • 🏋️ TimeLens-100K: A large-scale, diverse, high-quality training dataset for video temporal grounding, annotated with Gemini-2.5-Pro.

📦 Installation

Clone this repository and navigate to the folder:

git clone https://github.com/TencentARC/TimeLens.git
cd TimeLens

Create a Conda environment and install the required packages:

conda create -n timelens python=3.11 -y
conda activate timelens

# install dependencies for inference
pip install -r requirements.txt -f https://download.pytorch.org/whl/cu124 # We use CUDA Version 12.4

# Optional: install extra dependencies for training
pip install -r requirements_train.txt

# Install flash-attn (required for BOTH training and inference!)
pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
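After installing, it can help to confirm that the key dependencies are importable before running anything heavy. The snippet below is a minimal sketch (not part of the repo) that checks package availability without importing them:

```python
# Quick environment sanity check (a hypothetical helper, not shipped with the repo):
# reports whether the key packages from the steps above can be found.
import importlib.util

def check_env(packages=("torch", "transformers", "flash_attn")):
    """Return a dict mapping each package name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

`find_spec` avoids actually importing the packages, so the check stays fast even for large libraries like `torch`.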

🤖 Using TimeLens Models

TimeLens models are a family of MLLMs with state-of-the-art video temporal grounding performance. They are built upon the Qwen2.5-VL and Qwen3-VL baselines by training on our high-quality TimeLens-100K dataset, leveraging our carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe and improved timestamp encoding strategy.

🚀 Quick Start

All models are available on Hugging Face and support out-of-the-box inference using the 🤗Transformers library. For detailed usage instructions and code examples, please refer to the specific model's Hugging Face page linked below.
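As a rough illustration, Transformers-based inference typically follows the standard Qwen2.5-VL pattern sketched below. The prompt wording, video path, and the assumption that TimeLens-7B loads via `Qwen2_5_VLForConditionalGeneration` are this sketch's own; defer to the model's Hugging Face page for the official usage.

```python
# A minimal inference sketch following the usual Qwen2.5-VL Transformers pattern.
# The prompt text and file names are illustrative assumptions, not the official recipe.

def build_grounding_messages(video_path: str, query: str):
    """Build a chat-template message asking the model to localize a query in a video."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text",
                 "text": f"Find the time span in the video matching the query: {query}"},
            ],
        }
    ]

if __name__ == "__main__":
    # Heavy imports kept inside the guard; requires the inference dependencies above.
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model_id = "TencentARC/TimeLens-7B"
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_grounding_messages("demo.mp4", "a person opens the door")
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    _, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], videos=video_inputs, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=128)
    print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                                 skip_special_tokens=True)[0])
```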

🏆 Model Zoo & Performance

The following table lists our models with their Hugging Face links and grounding performance:

<table> <thead> <tr> <th rowspan="2" align="center">Model <br>(with 🤗HuggingFace Link)</th> <th colspan="4" align="center">Charades-TimeLens</th> <th colspan="4" align="center">ActivityNet-TimeLens</th> <th colspan="4" align="center">QVHighlights-TimeLens</th> </tr> <tr> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> </tr> </thead> <tbody> <tr> <td><a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">Qwen2.5-VL-7B-Instruct</a></td> <td align="center">59.7</td> <td align="center">37.8</td> <td align="center">16.6</td> <td align="center">39.3</td> <td align="center">44.1</td> <td align="center">31.0</td> <td align="center">16.1</td> <td align="center">31.4</td> <td align="center">41.5</td> <td align="center">27.8</td> <td align="center">15.2</td> <td align="center">31.6</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-7B"><b>TimeLens-7B</b>🚀</a></td> <td align="center"><b>70.5</b></td> <td align="center"><b>55.6</b></td> <td align="center"><b>28.4</b></td> <td align="center"><b>48.8</b></td> <td align="center"><b>62.8</b></td> <td align="center"><b>51.0</b></td> <td align="center"><b>32.6</b></td> <td align="center"><b>46.2</b></td> <td align="center"><b>74.1</b></td> <td align="center"><b>62.7</b></td> <td align="center"><b>43.1</b></td> <td align="center"><b>56.0</b></td> </tr> <tr> <td><a href="https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct">Qwen3-VL-8B-Instruct</a></td> <td align="center">69.2</td> <td align="center">53.4</td> <td align="center">27.5</td> <td align="center">48.3</td> <td align="center">62.1</td> <td align="center">51.2</td> <td align="center">34.4</td> <td 
align="center">46.8</td> <td align="center">74.2</td> <td align="center">64.6</td> <td align="center">49.3</td> <td align="center">59.4</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-8B"><b>TimeLens-8B</b>🚀</a></td> <td align="center"><b>76.6</b></td> <td align="center"><b>63.0</b></td> <td align="center"><b>35.2</b></td> <td align="center"><b>55.2</b></td> <td align="center"><b>68.9</b></td> <td align="center"><b>58.4</b></td> <td align="center"><b>40.6</b></td> <td align="center"><b>53.2</b></td> <td align="center"><b>80.2</b></td> <td align="center"><b>71.6</b></td> <td align="center"><b>55.5</b></td> <td align="center"><b>65.5</b></td> </tr> </tbody> </table>

TimeLens-7B is fine-tuned from Qwen2.5-VL-7B-Instruct, and TimeLens-8B is fine-tuned from Qwen3-VL-8B-Instruct.
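The R1@m and mIoU columns above are the standard temporal grounding metrics: R1@m is the fraction of queries whose top-1 predicted span overlaps the ground truth with IoU ≥ m, and mIoU is the mean IoU. A self-contained sketch of these metrics (not the repo's evaluation code):

```python
def temporal_iou(pred, gt):
    """IoU between two time spans (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def grounding_metrics(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """Compute R1@threshold and mIoU over paired predicted/ground-truth spans."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    r1 = {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    return r1, sum(ious) / len(ious)
```

For example, `temporal_iou((2, 6), (4, 8))` gives 2/6 ≈ 0.333: the spans overlap for 2 seconds out of a 6-second union.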

> [!NOTE]
> For detailed comparison with other models, please refer to the 🏆 Leaderboard.

📊 Evaluation on TimeLens-Bench

Download TimeLens-Bench

Download the TimeLens-Bench dataset from Hugging Face and place it in the data/TimeLens-Bench directory:

hf download TencentARC/TimeLens-Bench \
  --repo-type=dataset \
  --local-dir data/TimeLens-Bench

Extract the compressed videos:

mkdir -p data/TimeLens-Bench/videos
find data/TimeLens-Bench/video_shards -name "*.tar.gz" | \
  xargs -P 4 -I {} tar -xzf {} -C data/TimeLens-Bench/videos # Parallel extraction with 4 processes

The folder structure should look like this:

TimeLens/
└── data/
    └── TimeLens-Bench/
        ├── activitynet-timelens.json
        ├── charades-timelens.json
        ├── qvhighlights-timelens.json
        ├── videos/              # extracted videos
        │   ├── activitynet/
        │   ├── charades/
        │   └── qvhighlights/
        └── video_shards/        # compressed videos (can be deleted after extraction)
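Before launching evaluation, a quick layout check can catch a botched download or extraction. The helper below is hypothetical (not part of the repo); the expected file names are taken from the tree shown above:

```python
from pathlib import Path

# Expected annotation files and video folders, per the layout above.
EXPECTED = [
    "activitynet-timelens.json",
    "charades-timelens.json",
    "qvhighlights-timelens.json",
    "videos/activitynet",
    "videos/charades",
    "videos/qvhighlights",
]

def missing_entries(root="data/TimeLens-Bench"):
    """Return the expected annotation files / video folders absent under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]

if __name__ == "__main__":
    missing = missing_entries()
    print("OK" if not missing else f"Missing: {missing}")
```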

Evaluate with Our Codebase (TimeLens / Qwen-VL Models)

Our codebase supports evaluation of the following models:

| Model | Supported |
|:----------:|:---------:|
| TimeLens-7B | ✅ |
| TimeLens-8B | ✅ |
| Qwen2.5-VL | ✅ |
| Qwen3-VL | ✅ |

The evaluation script is scripts/eval_timelens_bench.sh. You can set the following environment variables:

  • model_path: Path or HuggingFace ID of the model to evaluate. Default: TencentARC/TimeLens-8B
  • datasets: Comma-separated list of datasets to evaluate. Default: charades-timelens,activitynet-timelens,qvhighlights-timelens
  • CUDA_VISIBLE_DEVICES: GPU indices to use (e.g., 0,1,2,3). Default: Auto-detect all available GPUs
  • pred_path: Directory to save results. Default: ./logs
  • min_tokens: Minimum tokens for video encoding. Default: 64
  • total_tokens: Total tokens for video encoding. Default: 14336
  • FPS: Frames per second for video sampling. Default: 2

Example 1: Evaluate TimeLens-8B (default settings)

model_path="TencentARC/TimeLens-8B" bash scripts/eval_timelens_bench.sh

Example 2: Evaluate TimeLens-7B on specific datasets with specific GPUs

CUDA_VISIBLE_DEVICES=0,1 \
datasets="activitynet-timelens,qvhighlights-timelens" \
model_path="TencentARC/TimeLens-7B" \
bash scripts/eval_timelens_bench.sh

Example 3: Evaluate Qwen3-VL with a local model path and a custom path to save results:

pred_path="./path/to/results" \
model_path="/path/to/local/model" \
bash scripts/eval_timelens_bench.sh