# TimeLens

**[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs**
## 📰 News
- 2026.02.26: 🚀 Training for TimeLens-7B based on Qwen2.5-VL-7B is available on the `train` branch.
- 2026.02.26: 🚀 We now support training TimeLens-8B based on Qwen3-8B-VL.
- 2026.02.22: 🎉 TimeLens has been accepted to CVPR 2026.
## 🔎 Overview
TimeLens rethinks video temporal grounding (VTG) with MLLMs along two axes:
- Data Quality. We expose critical quality issues in existing VTG benchmarks and propose quality-assured datasets for both training and evaluation.
- Algorithmic Design. Building upon reliable data, we explore effective timestamp encoding strategies and training recipes, achieving state-of-the-art performance among open-source models.
## 📚 Quick Navigation
In this repository, we release:
- 🤖 **TimeLens Models**: state-of-the-art open-source models for video temporal grounding.
- 📊 **TimeLens-Bench**: a comprehensive, high-quality evaluation benchmark for video temporal grounding.
- 🏋️ **TimeLens-100K**: a large-scale, diverse, high-quality training dataset for video temporal grounding, annotated with Gemini-2.5-Pro.
## 📦 Installation

Clone this repository and navigate to the folder:

```shell
git clone https://github.com/TencentARC/TimeLens.git
cd TimeLens
```
Create a Conda environment and install the required packages:

```shell
conda create -n timelens python=3.11 -y
conda activate timelens

# Install dependencies for inference (we use CUDA 12.4)
pip install -r requirements.txt -f https://download.pytorch.org/whl/cu124

# Optional: install extra dependencies for training
pip install -r requirements_train.txt

# Install flash-attn (required for BOTH training and inference!)
pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
```
## 🤖 Using TimeLens Models

TimeLens models are a family of MLLMs with state-of-the-art video temporal grounding performance. They are built upon the Qwen2.5-VL and Qwen3-VL baselines by training on our high-quality TimeLens-100K dataset, leveraging our carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe and an improved timestamp encoding strategy.
### 🚀 Quick Start
All models are available on Hugging Face and support out-of-the-box inference using the 🤗Transformers library. For detailed usage instructions and code examples, please refer to the specific model's Hugging Face page linked below.
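As a rough illustration only (the video path, prompt wording, and answer format below are assumptions, not the official example — see each model's Hugging Face page for the authoritative snippet), a Qwen2.5-VL-style grounding query might be assembled and its answer parsed like this:

```python
import re

# Chat-style message in the Qwen2.5-VL message format: one video plus a
# grounding query. The video path and prompt wording are illustrative.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "demo.mp4", "fps": 2.0},
            {"type": "text",
             "text": "When does the person open the fridge? "
                     "Answer with a time span in seconds."},
        ],
    }
]

def parse_span(answer: str):
    """Extract a (start, end) span in seconds from an answer such as
    'The event happens from 12.5 to 30.0 seconds.'
    (The output phrasing is an assumption; adapt the pattern as needed.)"""
    m = re.search(r"(\d+(?:\.\d+)?)\s*(?:to|-)\s*(\d+(?:\.\d+)?)", answer)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(parse_span("The event happens from 12.5 to 30.0 seconds."))  # → (12.5, 30.0)
```

The `messages` object would then be passed through the model's processor and `generate` call as shown on the model card.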
### 🏆 Model Zoo & Performance
The following table lists our models with their Hugging Face links and grounding performance:
<table> <thead> <tr> <th rowspan="2" align="center">Model <br>(with 🤗HuggingFace Link)</th> <th colspan="4" align="center">Charades-TimeLens</th> <th colspan="4" align="center">ActivityNet-TimeLens</th> <th colspan="4" align="center">QVHighlights-TimeLens</th> </tr> <tr> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> </tr> </thead> <tbody> <tr> <td><a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">Qwen2.5-VL-7B-Instruct</a></td> <td align="center">59.7</td> <td align="center">37.8</td> <td align="center">16.6</td> <td align="center">39.3</td> <td align="center">44.1</td> <td align="center">31.0</td> <td align="center">16.1</td> <td align="center">31.4</td> <td align="center">41.5</td> <td align="center">27.8</td> <td align="center">15.2</td> <td align="center">31.6</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-7B"><b>TimeLens-7B</b>🚀</a></td> <td align="center"><b>70.5</b></td> <td align="center"><b>55.6</b></td> <td align="center"><b>28.4</b></td> <td align="center"><b>48.8</b></td> <td align="center"><b>62.8</b></td> <td align="center"><b>51.0</b></td> <td align="center"><b>32.6</b></td> <td align="center"><b>46.2</b></td> <td align="center"><b>74.1</b></td> <td align="center"><b>62.7</b></td> <td align="center"><b>43.1</b></td> <td align="center"><b>56.0</b></td> </tr> <tr> <td><a href="https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct">Qwen3-VL-8B-Instruct</a></td> <td align="center">69.2</td> <td align="center">53.4</td> <td align="center">27.5</td> <td align="center">48.3</td> <td align="center">62.1</td> <td align="center">51.2</td> <td align="center">34.4</td> <td 
align="center">46.8</td> <td align="center">74.2</td> <td align="center">64.6</td> <td align="center">49.3</td> <td align="center">59.4</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-8B"><b>TimeLens-8B</b>🚀</a></td> <td align="center"><b>76.6</b></td> <td align="center"><b>63.0</b></td> <td align="center"><b>35.2</b></td> <td align="center"><b>55.2</b></td> <td align="center"><b>68.9</b></td> <td align="center"><b>58.4</b></td> <td align="center"><b>40.6</b></td> <td align="center"><b>53.2</b></td> <td align="center"><b>80.2</b></td> <td align="center"><b>71.6</b></td> <td align="center"><b>55.5</b></td> <td align="center"><b>65.5</b></td> </tr> </tbody> </table>

TimeLens-7B is fine-tuned from Qwen2.5-VL-7B-Instruct, and TimeLens-8B is fine-tuned from Qwen3-VL-8B-Instruct.
> [!NOTE]
> For a detailed comparison with other models, please refer to the 🏆 Leaderboard.
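In the table above, R1@m is Recall@1 at temporal IoU threshold m ∈ {0.3, 0.5, 0.7}, and mIoU is the mean temporal IoU. As a minimal sketch (illustrative, not this repo's exact evaluation code), temporal IoU between a predicted and a ground-truth segment is computed as:

```python
def temporal_iou(pred, gt):
    """IoU of two time segments, each given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction counts toward R1@0.5 when its IoU with the ground truth >= 0.5.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))  # 5 / 15 ≈ 0.333
```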
## 📊 Evaluation on TimeLens-Bench

### Download TimeLens-Bench
Download the TimeLens-Bench dataset from Hugging Face and place it in the `data/TimeLens-Bench` directory:

```shell
hf download TencentARC/TimeLens-Bench \
    --repo-type=dataset \
    --local-dir data/TimeLens-Bench
```
Extract the compressed videos:

```shell
mkdir -p data/TimeLens-Bench/videos
# Parallel extraction with 4 processes
find data/TimeLens-Bench/video_shards -name "*.tar.gz" | \
    xargs -P 4 -I {} tar -xzf {} -C data/TimeLens-Bench/videos
```
The folder structure should look like this:

```
TimeLens/
└── data/
    └── TimeLens-Bench/
        ├── activitynet-timelens.json
        ├── charades-timelens.json
        ├── qvhighlights-timelens.json
        ├── videos/        # extracted videos
        │   ├── activitynet/
        │   ├── charades/
        │   └── qvhighlights/
        └── video_shards/  # compressed videos (can be deleted after extraction)
```
### Evaluate with Our Codebase (TimeLens / Qwen-VL Models)
Our codebase supports evaluation of the following models:
| Model | Supported |
|:-----------:|:---------:|
| TimeLens-7B | ✅ |
| TimeLens-8B | ✅ |
| Qwen2.5-VL | ✅ |
| Qwen3-VL | ✅ |
The evaluation script is `scripts/eval_timelens_bench.sh`. You can set the following environment variables:

- `model_path`: Path or Hugging Face ID of the model to evaluate. Default: `TencentARC/TimeLens-8B`
- `datasets`: Comma-separated list of datasets to evaluate. Default: `charades-timelens,activitynet-timelens,qvhighlights-timelens`
- `CUDA_VISIBLE_DEVICES`: GPU indices to use (e.g., `0,1,2,3`). Default: auto-detect all available GPUs
- `pred_path`: Directory to save results. Default: `./logs`
- `min_tokens`: Minimum tokens for video encoding. Default: `64`
- `total_tokens`: Total tokens for video encoding. Default: `14336`
- `FPS`: Frames per second for video sampling. Default: `2`
**Example 1: Evaluate TimeLens-8B (default settings)**

```shell
model_path="TencentARC/TimeLens-8B" bash scripts/eval_timelens_bench.sh
```
**Example 2: Evaluate TimeLens-7B on specific datasets with specific GPUs**

```shell
CUDA_VISIBLE_DEVICES=0,1 \
datasets="activitynet-timelens,qvhighlights-timelens" \
model_path="TencentARC/TimeLens-7B" \
bash scripts/eval_timelens_bench.sh
```
**Example 3: Evaluate Qwen3-VL with a local model path and a custom path to save results**

```shell
model_path="/path/to/local/Qwen3-VL-model" \
pred_path="./path/to/results" \
bash scripts/eval_timelens_bench.sh
```
