Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
TIMO 🌱
This repository contains the code, data, and models for the paper "TIMO: Towards Better Temporal Reasoning for Language Models", accepted at COLM 2024.
Table of Contents
- 📌 Introduction
- 🚀 Models
- 📊 Datasets
- 🌟 Highlights
- ⚙️ Installation
- 🛠️ Training and Inference
- 📜 License
- 🙏 Acknowledgements
- 📖 Citation
📌 Introduction
We introduce TIMO 🌱, a series of open-source large language models (LLMs) designed for temporal reasoning. TIMO models are trained on self-generated temporal preference pairs and optimized with a novel self-critic temporal optimization method, enabling the models to excel in both temporal reasoning and general tasks. TIMO is the new state-of-the-art for temporal reasoning across 19 tasks while maintaining robust general task performance.
🚀 Models
Our models are available on Hugging Face. The Quick Start section loads the 7B checkpoint, Warrieryes/timo-7b-hf.
📊 Datasets
We have uploaded all datasets used in various stages of training to Hugging Face. You can access them via the links below:
- Default pure-time datasets: TRAM-Temporal
- Temporal preference pairs generated by LLaMA2-7B: TRAM-Temporal-DPO-7B
- Temporal preference pairs generated by LLaMA2-13B: TRAM-Temporal-DPO-13B
🌟 Highlights
TIMO achieves state-of-the-art results in temporal reasoning tasks. Here are the key results for 7B and 13B models:
7B Parameter Model
| Model       | Math-time Avg | Pure-time Avg | Average |
| ----------- | ------------- | ------------- | ------- |
| Timo        | 64.4          | 78.07         | 72.7    |
| MAmmoTH     | 57.08         | 62.71         | 60.0    |
| WizardMath  | 58.8          | 61.26         | 59.9    |
| CodeLlama   | 54.55         | 64.10         | 59.8    |
| LLaMA2      | 57.65         | 66.30         | 62.7    |
| WizardCoder | 53.05         | 59.83         | 57.8    |
| ToRA        | 51.03         | 65.71         | 58.2    |
| TimeLLaMA   | 48.3          | 29.0          | 38.6    |
13B Parameter Model
| Model       | Math-time Avg | Pure-time Avg | Average |
| ----------- | ------------- | ------------- | ------- |
| Timo        | 72.83         | 82.97         | 78.3    |
| MAmmoTH     | 70.68         | 69.52         | 72.1    |
| LLaMA2      | 66.18         | 70.42         | 70.7    |
| WizardMath  | 63.65         | 70.62         | 68.4    |
| WizardCoder | 61.6          | 66.08         | 65.9    |
| CodeLlama   | 63.55         | 67.05         | 65.7    |
| ToRA        | 57.85         | 68.90         | 65.6    |
⚙️ Installation
Clone this repository and install the required dependencies:
git clone https://github.com/zhaochen0110/Timo.git
cd Timo
pip install -r requirements.txt
🛠️ Training and Inference
Quick Start
To quickly start using TIMO, run the following code:
from transformers import pipeline
# Load the 7B TIMO checkpoint (use a new name so the imported pipeline() is not shadowed)
generator = pipeline("text-generation", model="Warrieryes/timo-7b-hf")
# Instruction-style prompt template expected by the model
template = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{query}\n\n### Response:'''
query = "What is 08:32 AM - 04:28?\n (A) 6:10 AM\n (B) 2:49 AM\n (C) 6:17 AM\n (D) 4:04 AM"
prompt = template.format(query=query)
# generated_text holds the prompt followed by the model's answer
output = generator(prompt)[0]['generated_text']
print(output)
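By default the text-generation pipeline returns the prompt together with the completion, so the answer has to be separated from the template before it can be compared with the options. The following is a minimal sketch of that post-processing, not code from this repository: the helper names `build_prompt`, `extract_response`, `extract_choice`, and `subtract_duration` are hypothetical, and the sample model output is invented for illustration. The last helper recomputes the reference answer for the example question with the standard library.

```python
import re
from datetime import datetime, timedelta

# Same template as in the Quick Start snippet.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query}\n\n### Response:"
)


def build_prompt(query: str) -> str:
    """Fill the instruction template with a question (hypothetical helper)."""
    return TEMPLATE.format(query=query)


def extract_response(generated_text: str) -> str:
    """Keep only the text after the last '### Response:' marker."""
    return generated_text.rsplit("### Response:", 1)[-1].strip()


def extract_choice(response: str):
    """Return the first option letter written as '(X)', or None if absent."""
    match = re.search(r"\(([A-D])\)", response)
    return match.group(1) if match else None


def subtract_duration(clock: str, duration: str) -> str:
    """Subtract an 'HH:MM' duration from a 12-hour clock time such as '08:32 AM'."""
    start = datetime.strptime(clock, "%I:%M %p")
    hours, minutes = map(int, duration.split(":"))
    result = start - timedelta(hours=hours, minutes=minutes)
    return result.strftime("%I:%M %p").lstrip("0")


query = "What is 08:32 AM - 04:28?\n (A) 6:10 AM\n (B) 2:49 AM\n (C) 6:17 AM\n (D) 4:04 AM"
# Invented output for illustration: prompt echoed back, then an answer.
sample_output = build_prompt(query) + " The answer is (D) 4:04 AM."

print(extract_choice(extract_response(sample_output)))  # D
print(subtract_duration("08:32 AM", "04:28"))           # 4:04 AM
```

Extracting the response first matters here: the prompt itself contains "(A)", so searching the full generated text for an option letter would always return the first option.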
Large-scale Evaluation
To replicate the experimental results in our paper, run:
python inference.py \
--model_path $model_path \
--data_path $data_path \
--excel_folder $excel_folder \
--output_path $output_path
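The output format of inference.py is not documented in this README, so the following is only a sketch of how per-task accuracy and an unweighted average over tasks could be computed from multiple-choice predictions. The record fields (`task`, `prediction`, `answer`) and both function names are assumptions made for illustration, and the averaging scheme may differ from the one used for the tables in the paper.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task: accuracy in percent} for a list of prediction records.

    Each record is assumed (hypothetically) to be a dict with the keys
    'task', 'prediction', and 'answer', where the last two are option letters.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["task"]] += 1
        predicted = record["prediction"].strip().upper()
        expected = record["answer"].strip().upper()
        if predicted == expected:
            correct[record["task"]] += 1
    return {task: 100.0 * correct[task] / total[task] for task in total}


def macro_average(per_task):
    """Unweighted mean of the per-task accuracies."""
    return sum(per_task.values()) / len(per_task)


# Invented records for illustration only.
records = [
    {"task": "arithmetic", "prediction": "D", "answer": "D"},
    {"task": "arithmetic", "prediction": "a", "answer": "B"},
    {"task": "duration", "prediction": " C ", "answer": "C"},
]
per_task = accuracy_by_task(records)
print(per_task)                 # {'arithmetic': 50.0, 'duration': 100.0}
print(macro_average(per_task))  # 75.0
```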
Self-critic Temporal Preference Generation
We first train the mathematical models with the MAmmoTH project's code, then generate temporal preference pairs with the following command:
python generate.py \
--model_path $model_path \
--generate True \
--train_data_path $train_data_path \
--score True \
--save_path $save_path
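To make the idea of a preference pair concrete, here is a small sketch of how scored candidate responses could be turned into a chosen/rejected pair. It is not the logic of generate.py: the selection rule (highest score versus lowest score), the function name `build_preference_pair`, and the `prompt`/`chosen`/`rejected` field names are assumptions, and the scores below are invented.

```python
import json


def build_preference_pair(prompt, candidates):
    """Build one preference pair from (response, score) tuples.

    Assumed rule for illustration: the highest-scored response becomes
    'chosen' and the lowest-scored becomes 'rejected'. Returns None when
    there is no usable contrast between candidates.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best[1] == worst[1]:
        # All candidates scored the same, so no preference can be derived.
        return None
    return {"prompt": prompt, "chosen": best[0], "rejected": worst[0]}


# Invented candidates and scores for illustration only.
candidates = [
    ("08:32 minus 4 hours 28 minutes is 4:04 AM, so the answer is (D).", 0.9),
    ("The answer is (A) 6:10 AM.", 0.2),
    ("Subtracting gives 4:04 AM.", 0.5),
]
pair = build_preference_pair("What is 08:32 AM - 04:28?", candidates)
print(json.dumps(pair, indent=2))
```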
Temporal Direct Preference Optimization
After generating preference pairs, we use Direct Preference Optimization (DPO) to train the model:
python tdpo.py \
--model_name_or_path $model_name_or_path \
--json_path $json_path \
--output_dir $output_dir
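For readers unfamiliar with DPO, the sketch below evaluates the standard DPO loss for a single preference pair from sequence log-probabilities under the trained policy and a frozen reference model. It illustrates the general objective only and is not taken from tdpo.py; the function names and the value `beta=0.1` are assumptions chosen for the example.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin is the policy-vs-reference log-ratio of the chosen response
    minus the same log-ratio for the rejected response.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) is equal to softplus(-m).
    return softplus(-margin)


# A policy identical to the reference gives a loss of log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# The loss drops once the policy favors the chosen response more than the reference does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss depends only on how far the policy has moved relative to the reference, so a policy that has not moved incurs the same loss regardless of the absolute log-probabilities.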
📜 License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
🙏 Acknowledgements
This project is partly based on the work done in MAmmoTH. Special thanks to its authors for their valuable contributions.
📖 Citation
Please cite our paper if you use our data, models, or code, and kindly cite the original dataset papers as well.
@article{su2024timo,
title={Timo: Towards Better Temporal Reasoning for Language Models},
author={Su, Zhaochen and Zhang, Jun and Zhu, Tong and Qu, Xiaoye and Li, Juntao and Zhang, Min and Cheng, Yu},
journal={arXiv preprint arXiv:2406.14192},
year={2024}
}
