Timo

Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)

TIMO 🌱

This repository contains the code, data, and models for the paper "TIMO: Towards Better Temporal Reasoning for Language Models", accepted at COLM 2024.

📌 Introduction

We introduce TIMO 🌱, a series of open-source large language models (LLMs) designed for temporal reasoning. TIMO models are trained on self-generated temporal preference pairs and optimized with a novel self-critic temporal optimization method, enabling the models to excel in both temporal reasoning and general tasks. TIMO is the new state-of-the-art for temporal reasoning across 19 tasks while maintaining robust general task performance.

🚀 Models

Our models are available on Hugging Face:

📊 Datasets

We have uploaded all datasets used in various stages of training to Hugging Face. You can access them via the links below:

🌟 Highlights

TIMO achieves state-of-the-art results in temporal reasoning tasks. Here are the key results for 7B and 13B models:

7B Parameter Model

| Model | Math-time Avg | Pure-time Avg | Average |
| ----------- | ------------- | ------------- | -------- |
| Timo | 64.4 | 78.07 | 72.7 |
| MAmmoTH | 57.08 | 62.71 | 60.0 |
| WizardMath | 58.8 | 61.26 | 59.9 |
| CodeLlama | 54.55 | 64.10 | 59.8 |
| LLaMA2 | 57.65 | 66.30 | 62.7 |
| WizardCoder | 53.05 | 59.83 | 57.8 |
| ToRA | 51.03 | 65.71 | 58.2 |
| TimeLLaMA | 48.3 | 29.0 | 38.6 |

13B Parameter Model

| Model | Math-time Avg | Pure-time Avg | Average |
| ----------- | ------------- | ------------- | -------- |
| Timo | 72.83 | 82.97 | 78.3 |
| MAmmoTH | 70.68 | 69.52 | 72.1 |
| LLaMA2 | 66.18 | 70.42 | 70.7 |
| WizardMath | 63.65 | 70.62 | 68.4 |
| WizardCoder | 61.6 | 66.08 | 65.9 |
| CodeLlama | 63.55 | 67.05 | 65.7 |
| ToRA | 57.85 | 68.90 | 65.6 |
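As a reading aid, the gap between Timo and the strongest baseline can be computed from the Average column of the two tables. The snippet below is an illustrative sketch: the numbers are copied from the tables above, while the dictionary and function names are our own and not part of the repository.

```python
# Average scores copied from the 7B and 13B tables above.
avg_7b = {
    "Timo": 72.7, "MAmmoTH": 60.0, "WizardMath": 59.9, "CodeLlama": 59.8,
    "LLaMA2": 62.7, "WizardCoder": 57.8, "ToRA": 58.2, "TimeLLaMA": 38.6,
}
avg_13b = {
    "Timo": 78.3, "MAmmoTH": 72.1, "LLaMA2": 70.7, "WizardMath": 68.4,
    "WizardCoder": 65.9, "CodeLlama": 65.7, "ToRA": 65.6,
}


def margin_over_best_baseline(scores, ours="Timo"):
    """Return (best baseline name, point gap between `ours` and that baseline)."""
    baselines = {name: s for name, s in scores.items() if name != ours}
    best_name = max(baselines, key=baselines.get)
    # Round to one decimal to match the precision of the Average column.
    return best_name, round(scores[ours] - baselines[best_name], 1)


for size, scores in (("7B", avg_7b), ("13B", avg_13b)):
    name, gap = margin_over_best_baseline(scores)
    print(f"{size}: Timo leads {name} by {gap} points")
```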

⚙️ Installation

Clone this repository and install the required dependencies:

git clone https://github.com/zhaochen0110/Timo.git
cd Timo
pip install -r requirements.txt

🛠️ Training and Inference

Quick Start

To quickly start using TIMO, run the following code:

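from transformers import pipeline

# Load the TIMO 7B checkpoint as a text-generation pipeline.
# Use a separate name so the imported `pipeline` function is not shadowed.
generator = pipeline("text-generation", model="Warrieryes/timo-7b-hf")

# Instruction-style prompt template that wraps the query.
template = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{query}\n\n### Response:'''

query = "What is 08:32 AM - 04:28?\n (A) 6:10 AM\n (B) 2:49 AM\n (C) 6:17 AM\n (D) 4:04 AM"

# Avoid the name `input`, which would shadow the Python builtin.
prompt = template.format(query=query)

output = generator(prompt)[0]['generated_text']

print(output)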
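By default, the `text-generation` pipeline returns the prompt followed by the completion, so it can help to separate the two. The following is a minimal sketch that only does string handling and does not load the model. The helper names `build_prompt` and `extract_response` are our own assumptions, not functions from this repository.

```python
# Prompt template copied from the Quick Start above.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query}\n\n### Response:"
)

RESPONSE_MARKER = "### Response:"


def build_prompt(query: str) -> str:
    """Wrap a raw question in the instruction template."""
    return TEMPLATE.format(query=query)


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker.

    If the marker is missing, the whole text is returned (stripped).
    """
    idx = generated_text.find(RESPONSE_MARKER)
    if idx == -1:
        return generated_text.strip()
    return generated_text[idx + len(RESPONSE_MARKER):].strip()
```

With the Quick Start pipeline, this would be used as `extract_response(output)` on the generated text.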

Large-scale Evaluation

To replicate the experimental results in our paper, run:

python inference.py \
    --model_path $model_path \
    --data_path $data_path \
    --excel_folder $excel_folder \
    --output_path $output_path 
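To evaluate several checkpoints in one go, the command above can be wrapped in a small driver. This is a hedged sketch: it uses only the four flags shown above. The function names and the per-model output layout (`output_root/<model name>`) are assumptions, and so is whether `--output_path` expects a file or a directory. Check `inference.py` before relying on it.

```python
import subprocess
import sys


def build_inference_cmd(model_path, data_path, excel_folder, output_path):
    """Assemble the inference.py invocation using the flags shown above."""
    return [
        sys.executable, "inference.py",
        "--model_path", model_path,
        "--data_path", data_path,
        "--excel_folder", excel_folder,
        "--output_path", output_path,
    ]


def run_all(model_paths, data_path, excel_folder, output_root, dry_run=True):
    """Build (and optionally run) one evaluation command per model.

    With dry_run=True the commands are only returned, not executed.
    """
    cmds = []
    for model_path in model_paths:
        # Use the last path component as a per-model output name (assumed layout).
        name = model_path.rstrip("/").split("/")[-1]
        cmd = build_inference_cmd(
            model_path, data_path, excel_folder, f"{output_root}/{name}"
        )
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_all([...], dry_run=True)` first makes it easy to inspect the exact commands before launching any long-running evaluation.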

Self-critic Temporal Preference Generation

We first train mathematical models with the MAmmoTH project's code, then generate temporal preference pairs with the following command:

python generate.py \
    --model_path $model_path \
    --generate True \
    --train_data_path $train_data_path \
    --score True \
    --save_path $save_path
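Conceptually, a generate-then-score step can be pictured as sampling several candidate responses per prompt, scoring each one, and keeping the best and worst as a (chosen, rejected) pair. The sketch below illustrates that idea only; it is not the logic in `generate.py`. The function name, the `min_gap` filter, and the `prompt`/`chosen`/`rejected` field names (a common convention for DPO-style data) are all assumptions.

```python
def make_preference_pair(prompt, candidates, scores, min_gap=0.0):
    """Pick the highest- and lowest-scored candidates as a preference pair.

    Returns None when the score gap is not larger than `min_gap`,
    since such a pair carries little preference signal.
    """
    if len(candidates) != len(scores) or len(candidates) < 2:
        raise ValueError("need at least two candidates with one score each")

    # Sort by score only, so ties never fall back to comparing the texts.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
    low_score, rejected = ranked[0]
    high_score, chosen = ranked[-1]

    if high_score - low_score <= min_gap:
        return None
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = make_preference_pair(
    "What is 08:32 AM - 04:28?",
    ["4:04 AM", "6:10 AM", "2:49 AM"],   # hypothetical sampled answers
    [0.9, 0.2, 0.5],                     # hypothetical critic scores
)
print(pair)
```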

Temporal Direct Preference Optimization

After generating preference pairs, we use Direct Preference Optimization (DPO) to train the model:

python tdpo.py \
    --model_name_or_path $model_name_or_path \
    --json_path $json_path \
    --output_dir $output_dir 
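For reference, the standard DPO objective for one preference pair is `-log sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))))`. Here `y_w` is the chosen response and `y_l` the rejected one. The sketch below computes this generic loss from summed sequence log-probabilities in plain Python. It does not reproduce `tdpo.py`, and `beta=0.1` is an assumed value, not one taken from the repository.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for a single (chosen, rejected) pair."""
    # Log-ratio of the policy to the frozen reference model for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(x)) = log(1 + exp(-x)), evaluated in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The loss is lower when the policy favors the chosen response more than the reference does.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
print(dpo_loss(-5.0, -1.0, -3.0, -3.0))
```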

📜 License

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.

🙏 Acknowledgements

This project is partly based on MAmmoTH. Special thanks to its authors for their valuable contributions.

📖 Citation

Please cite our paper if you use our data, models, or code, and kindly cite the original dataset papers as well.

@article{su2024timo,
  title={Timo: Towards Better Temporal Reasoning for Language Models},
  author={Su, Zhaochen and Zhang, Jun and Zhu, Tong and Qu, Xiaoye and Li, Juntao and Zhang, Min and Cheng, Yu},
  journal={arXiv preprint arXiv:2406.14192},
  year={2024}
}
