【ACL 2026 🔥】Temporal Awareness Evaluation, Comprehensive Benchmarking, and Multi-Dimensional Analysis!


<h1 align="center"> <a href="https://arxiv.org/pdf/2510.19457">MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models</a></h1> <h5 align="center">

arXiv Dataset code website Slides

</h5>

🤗MINED

Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive factual knowledge. Existing benchmarks remain constrained by static designs and inadequately evaluate LMMs' ability to understand time-sensitive knowledge. To address this gap, we propose <span style="font-weight: bold; color: #2E7D32;">MINED</span>, a comprehensive benchmark that evaluates temporal awareness along <b>6</b> key dimensions (<b>cognition, awareness, trustworthiness, understanding, reasoning, and robustness</b>) across <b>11</b> challenging tasks. MINED is constructed from Wikipedia by two professional annotators and contains <b>2,104</b> time-sensitive knowledge samples spanning six knowledge types. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack time-understanding ability. LMMs perform best on organization knowledge, whereas their performance is weakest on sports knowledge. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that LMMs can effectively update knowledge via such methods in single-editing scenarios.

<div align="center"> <img src="figs/overview.png" width="700px"> </div>

You can download the data from the 🤗 Hugging Face Dataset. The expected file structure is:

```
MINED
|-- inference_data (json/jsonl)
|   |-- Dimension1_time_agnostic.json
|   |-- Dimension1_temporal_interval.json
|   |-- Dimension1_timestamp.json
|   |-- Dimension2_awareness_future.json
|   |-- Dimension2_awareness_past.json
|   |-- Dimension3_future_unanswerable_date.json
|   |-- Dimension3_previous_unanswerable_date.json
|   |-- Dimension4_understanding.json
|   |-- Dimension5_calculation.json
|   |-- Dimension5_ranking.json
|   |-- Dimension6_robustness.json
|-- imgs
|   |-- MINED_Image.zip
```
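A minimal sketch of loading one of these task files, handling both `.json` (a list of records) and `.jsonl` layouts mentioned above. The sample record and its field names are hypothetical stand-ins, not the dataset's actual schema:

```python
import json
from pathlib import Path

def load_inference_data(path):
    """Load one MINED task file; supports both .json (a list) and .jsonl."""
    path = Path(path)
    if path.suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    with path.open(encoding="utf-8") as f:
        return json.load(f)

# Example with a stand-in file (real files live under inference_data/);
# the "question"/"answer" keys here are illustrative only.
sample = [{"question": "Who led the organization in 2021?", "answer": "Example Person"}]
Path("Dimension4_understanding.json").write_text(json.dumps(sample), encoding="utf-8")

records = load_inference_data("Dimension4_understanding.json")
print(len(records))  # → 1
```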

🎯Main Results

<div align="center"> <img src="figs/results.png" width="700px"> </div>

🛠️Requirements and Installation

You can refer to https://github.com/open-compass/VLMEvalKit.git
<div align="center"> <img src="figs/install.png" width="700px"> </div>

💥Inference

```shell
python inference.py \
    --meta_save_path ./path/output \
    --model_name {base_model_name} \
    --data_eval_type {data_eval_type} \
    --max_new_token 10 \
    --image_path_prefix ./path/image_data
```

`model_name` refers to the model name defined in the `VLMEvalKit/vlmeval/config.py` file.

<details> <summary><b>data_eval_type options (click to expand)</b></summary>
  • time_agnostic: Knowledge understanding independent of time
  • timestamp: Reasoning about facts at a specific time point
  • temporal_interval: Reasoning about facts/states within a time interval
  • awareness_future: Future temporal awareness and prediction consistency
  • awareness_past: Past temporal awareness and retrospective consistency
  • future_unanswerable_date: Unanswerable queries concerning future dates
  • previous_unanswerable_date: Unanswerable queries concerning past dates
  • ranking: Ordering/comparison based on time-sensitive attributes
  • understanding: Understanding complex temporal semantics and inference
  • calculation: Date/time-related arithmetic and derivation
  • robustness: Robustness to temporal perturbations and phrasing variations
</details>
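The `data_eval_type` values above correspond one-to-one with the files under `inference_data/`. A small helper can resolve the flag to its file; the mapping below is inferred from the filenames in this README and should be checked against the released dataset:

```python
# Mapping from --data_eval_type values to inference_data files,
# inferred from the filenames listed above (verify against the release).
EVAL_TYPE_TO_FILE = {
    "time_agnostic": "Dimension1_time_agnostic.json",
    "timestamp": "Dimension1_timestamp.json",
    "temporal_interval": "Dimension1_temporal_interval.json",
    "awareness_future": "Dimension2_awareness_future.json",
    "awareness_past": "Dimension2_awareness_past.json",
    "future_unanswerable_date": "Dimension3_future_unanswerable_date.json",
    "previous_unanswerable_date": "Dimension3_previous_unanswerable_date.json",
    "understanding": "Dimension4_understanding.json",
    "calculation": "Dimension5_calculation.json",
    "ranking": "Dimension5_ranking.json",
    "robustness": "Dimension6_robustness.json",
}

def resolve_data_file(data_eval_type, root="inference_data"):
    """Return the path of the task file for a given --data_eval_type value."""
    try:
        return f"{root}/{EVAL_TYPE_TO_FILE[data_eval_type]}"
    except KeyError:
        raise ValueError(f"unknown data_eval_type: {data_eval_type}")

print(resolve_data_file("ranking"))  # → inference_data/Dimension5_ranking.json
```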

🤖Evaluation

Evaluate MINED

```shell
python eval_code/cem_f1.py
```
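`eval_code/cem_f1.py` holds the repo's actual scoring code; as a rough sketch of the usual definitions behind these two metrics, CEM ("contains exact match") checks whether the normalized gold answer appears in the prediction, and F1 measures token overlap. The normalization steps here are common conventions, not necessarily what the repo implements:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def cem(prediction, gold):
    """Contains Exact Match: 1.0 if the normalized gold appears in the prediction."""
    return float(normalize(gold) in normalize(prediction))

def token_f1(prediction, gold):
    """Token-level F1 between prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(cem("The answer is Barack Obama.", "Barack Obama"))  # → 1.0
```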

📊Customize inference data and task instructions

You can customize task instructions in the `inference.py` file to complete the corresponding tasks.

<div align="center"> <img src="figs/instruction.png" width="700px"> </div>

Custom data only needs to provide matched image-text pairs.
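A hypothetical custom sample might look like the record below. The field names are illustrative assumptions, not the repo's schema; check the released `inference_data` files for the exact keys:

```python
import json

# Illustrative custom sample (field names are assumptions, not the repo's schema).
custom_sample = {
    "image": "path/image_data/example_001.jpg",  # resolved against --image_path_prefix
    "question": "Who was the CEO of the company shown in the image in 2020?",
    "answer": "Example Person",
}

# One JSON object per line gives a .jsonl file compatible with the layout above.
line = json.dumps(custom_sample, ensure_ascii=False)
print(json.loads(line)["answer"])  # → Example Person
```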

🤝 Acknowledgments

We thank the open-source projects that made this work possible, in particular VLMEvalKit.

📝 Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 :)

```bibtex
@article{jiang2025mined,
  title  = {MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models},
  author = {Jiang, Kailin and Jiang, Ning and Ren, Yuchen and Li, Yuchen and Gao, Yifan and Bi, Jinhe and Ma, Yunpu and Liu, Qingqing and Wang, Xianhao and Jia, Yifan and Jiang, Hongbo and Hu, Yaocong and Li, Bin and Liu, Lei and Du, Yuntao},
  year   = {2025},
  url    = {https://arxiv.org/pdf/2510.19457}
}
```