# TimeLens

**[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs**
## 📰 News
- 2026.02.26: 🚀 Training for TimeLens-7B based on Qwen2.5-VL-7B is available on the `train` branch.
- 2026.02.26: 🚀 We now support training TimeLens-8B based on Qwen3-8B-VL.
- 2026.02.22: 🎉 TimeLens has been accepted to CVPR 2026.
## 🔎 Overview
TimeLens rethinks video temporal grounding (VTG) with MLLMs along two axes:
- Data Quality. We expose critical quality issues in existing VTG benchmarks and propose quality-assured datasets for both training and evaluation.
- Algorithmic Design. Building upon reliable data, we explore effective timestamp encoding strategies and training recipes, achieving state-of-the-art performance among open-source models.
## 📚 Quick Navigation
In this repository, we release:
- 🤖 **TimeLens Models**: state-of-the-art open-source models for video temporal grounding.
- 📊 **TimeLens-Bench**: a comprehensive, high-quality evaluation benchmark for video temporal grounding.
- 🏋️ **TimeLens-100K**: a large-scale, diverse, high-quality training dataset for video temporal grounding, annotated with Gemini-2.5-Pro.
## 📦 Installation

Clone this repository and navigate to the folder:

```shell
git clone https://github.com/TencentARC/TimeLens.git
cd TimeLens
```
Create a Conda environment and install the required packages:

```shell
conda create -n timelens python=3.11 -y
conda activate timelens

# Install dependencies for inference (we use CUDA 12.4)
pip install -r requirements.txt -f https://download.pytorch.org/whl/cu124

# Optional: install extra dependencies for training
pip install -r requirements_train.txt

# Install flash-attn (required for BOTH training and inference!)
pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
```
## 🤖 Using TimeLens Models

TimeLens models are a family of MLLMs with state-of-the-art video temporal grounding performance. They are built upon the Qwen2.5-VL and Qwen3-VL baselines by training on our high-quality TimeLens-100K dataset, leveraging our carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe and an improved timestamp encoding strategy.
### 🚀 Quick Start
All models are available on Hugging Face and support out-of-the-box inference using the 🤗Transformers library. For detailed usage instructions and code examples, please refer to the specific model's Hugging Face page linked below.
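As a rough illustration only (the video path, prompt wording, and answer format below are assumptions, not the official example — see each model's Hugging Face page for the authoritative snippet), a Qwen2.5-VL-style grounding query might be assembled and its answer parsed like this:

```python
import re

# Chat-style message in the Qwen2.5-VL message format: one video plus a
# grounding query. The video path and prompt wording are illustrative.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "demo.mp4", "fps": 2.0},
            {"type": "text",
             "text": "When does the person open the fridge? "
                     "Answer with a time span in seconds."},
        ],
    }
]

def parse_span(answer: str):
    """Extract a (start, end) span in seconds from an answer such as
    'The event happens from 12.5 to 30.0 seconds.'
    (The output phrasing is an assumption; adapt the pattern as needed.)"""
    m = re.search(r"(\d+(?:\.\d+)?)\s*(?:to|-)\s*(\d+(?:\.\d+)?)", answer)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(parse_span("The event happens from 12.5 to 30.0 seconds."))  # → (12.5, 30.0)
```

The `messages` object would then be passed through the model's processor and `generate` call as shown on the model card.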
### 🏆 Model Zoo & Performance
The following table lists our models with their Hugging Face links and grounding performance:
<table> <thead> <tr> <th rowspan="2" align="center">Model <br>(with 🤗HuggingFace Link)</th> <th colspan="4" align="center">Charades-TimeLens</th> <th colspan="4" align="center">ActivityNet-TimeLens</th> <th colspan="4" align="center">QVHighlights-TimeLens</th> </tr> <tr> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> <th align="center">R1<br>@0.3</th> <th align="center">R1<br>@0.5</th> <th align="center">R1<br>@0.7</th> <th align="center">mIoU</th> </tr> </thead> <tbody> <tr> <td><a href="https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct">Qwen2.5-VL-7B-Instruct</a></td> <td align="center">59.7</td> <td align="center">37.8</td> <td align="center">16.6</td> <td align="center">39.3</td> <td align="center">44.1</td> <td align="center">31.0</td> <td align="center">16.1</td> <td align="center">31.4</td> <td align="center">41.5</td> <td align="center">27.8</td> <td align="center">15.2</td> <td align="center">31.6</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-7B"><b>TimeLens-7B</b>🚀</a></td> <td align="center"><b>70.5</b></td> <td align="center"><b>55.6</b></td> <td align="center"><b>28.4</b></td> <td align="center"><b>48.8</b></td> <td align="center"><b>62.8</b></td> <td align="center"><b>51.0</b></td> <td align="center"><b>32.6</b></td> <td align="center"><b>46.2</b></td> <td align="center"><b>74.1</b></td> <td align="center"><b>62.7</b></td> <td align="center"><b>43.1</b></td> <td align="center"><b>56.0</b></td> </tr> <tr> <td><a href="https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct">Qwen3-VL-8B-Instruct</a></td> <td align="center">69.2</td> <td align="center">53.4</td> <td align="center">27.5</td> <td align="center">48.3</td> <td align="center">62.1</td> <td align="center">51.2</td> <td align="center">34.4</td> <td 
align="center">46.8</td> <td align="center">74.2</td> <td align="center">64.6</td> <td align="center">49.3</td> <td align="center">59.4</td> </tr> <tr> <td><a href="https://huggingface.co/TencentARC/TimeLens-8B"><b>TimeLens-8B</b>🚀</a></td> <td align="center"><b>76.6</b></td> <td align="center"><b>63.0</b></td> <td align="center"><b>35.2</b></td> <td align="center"><b>55.2</b></td> <td align="center"><b>68.9</b></td> <td align="center"><b>58.4</b></td> <td align="center"><b>40.6</b></td> <td align="center"><b>53.2</b></td> <td align="center"><b>80.2</b></td> <td align="center"><b>71.6</b></td> <td align="center"><b>55.5</b></td> <td align="center"><b>65.5</b></td> </tr> </tbody> </table>

TimeLens-7B is fine-tuned from Qwen2.5-VL-7B-Instruct, and TimeLens-8B is fine-tuned from Qwen3-VL-8B-Instruct.
> [!NOTE]
> For a detailed comparison with other models, please refer to the 🏆 Leaderboard.
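In the table above, R1@m is Recall@1 at temporal IoU threshold m ∈ {0.3, 0.5, 0.7}, and mIoU is the mean temporal IoU. As a minimal sketch (illustrative, not this repo's exact evaluation code), temporal IoU between a predicted and a ground-truth segment is computed as:

```python
def temporal_iou(pred, gt):
    """IoU of two time segments, each given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction counts toward R1@0.5 when its IoU with the ground truth >= 0.5.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))  # 5 / 15 ≈ 0.333
```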
## 📊 Evaluation on TimeLens-Bench

### Download TimeLens-Bench
Download the TimeLens-Bench dataset from Hugging Face and place it in the `data/TimeLens-Bench` directory:

```shell
hf download TencentARC/TimeLens-Bench \
    --repo-type=dataset \
    --local-dir data/TimeLens-Bench
```
Extract the compressed videos:

```shell
mkdir -p data/TimeLens-Bench/videos
# Parallel extraction with 4 processes
find data/TimeLens-Bench/video_shards -name "*.tar.gz" | \
    xargs -P 4 -I {} tar -xzf {} -C data/TimeLens-Bench/videos
```
The folder structure should look like this:

```
TimeLens/
└── data/
    └── TimeLens-Bench/
        ├── activitynet-timelens.json
        ├── charades-timelens.json
        ├── qvhighlights-timelens.json
        ├── videos/        # extracted videos
        │   ├── activitynet/
        │   ├── charades/
        │   └── qvhighlights/
        └── video_shards/  # compressed videos (can be deleted after extraction)
```
### Evaluate with Our Codebase (TimeLens / Qwen-VL Models)
Our codebase supports evaluation of the following models:
| Model | Supported |
|:-----------:|:---------:|
| TimeLens-7B | ✅ |
| TimeLens-8B | ✅ |
| Qwen2.5-VL | ✅ |
| Qwen3-VL | ✅ |
The evaluation script is `scripts/eval_timelens_bench.sh`. You can set the following environment variables:

- `model_path`: Path or Hugging Face ID of the model to evaluate. Default: `TencentARC/TimeLens-8B`
- `datasets`: Comma-separated list of datasets to evaluate. Default: `charades-timelens,activitynet-timelens,qvhighlights-timelens`
- `CUDA_VISIBLE_DEVICES`: GPU indices to use (e.g., `0,1,2,3`). Default: auto-detect all available GPUs
- `pred_path`: Directory to save results. Default: `./logs`
- `min_tokens`: Minimum tokens for video encoding. Default: `64`
- `total_tokens`: Total tokens for video encoding. Default: `14336`
- `FPS`: Frames per second for video sampling. Default: `2`
**Example 1: Evaluate TimeLens-8B (default settings)**

```shell
model_path="TencentARC/TimeLens-8B" bash scripts/eval_timelens_bench.sh
```
**Example 2: Evaluate TimeLens-7B on specific datasets with specific GPUs**

```shell
CUDA_VISIBLE_DEVICES=0,1 \
datasets="activitynet-timelens,qvhighlights-timelens" \
model_path="TencentARC/TimeLens-7B" \
bash scripts/eval_timelens_bench.sh
```
**Example 3: Evaluate Qwen3-VL with a local model path and a custom path to save results**

```shell
model_path="/path/to/local/Qwen3-VL-model" \
pred_path="./path/to/results" \
bash scripts/eval_timelens_bench.sh
```
