
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]


Deita

<p align="center"> <img src="./assets/logo-final.png" width="600"> </p> <p align="center"> 🤗 <a href="https://huggingface.co/collections/hkust-nlp/deita-6569c198c174808d94cf5bd4">HF Repo</a>&nbsp;&nbsp;&nbsp; 📄 <a href="https://arxiv.org/abs/2312.15685">Paper</a>&nbsp;&nbsp;&nbsp; 📚 <a href="https://huggingface.co/datasets/hkust-nlp/deita-6k-v0">6K Data</a>&nbsp;&nbsp;&nbsp; 📚 <a href="https://huggingface.co/datasets/hkust-nlp/deita-10k-v0">10K Data</a> </p>

Welcome to Deita (Data-Efficient Instruction Tuning for Alignment) Project!

We will continue to update this project, so please stay tuned!

What is Deita?

Deita is an open-source project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).

It includes:

  • Open-source toolkits for automatic data selection in instruction tuning
  • Deita Datasets: a series of extremely lightweight, high-quality alignment SFT datasets. We release 6K-sized and 10K-sized datasets in the first release
  • Deita Models: a series of powerful models on par with SOTA chat LLMs, obtained through an extremely efficient instruction tuning process. Deita models can be obtained by training with 10x less instruction tuning data compared with other SOTA LLMs
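The selection idea behind the toolkit (score every candidate sample for complexity and quality, rank by the combined score, then keep only samples that are sufficiently distinct from those already selected) can be sketched in plain Python. This is a toy illustration of the general technique, not the repo's actual API; the field names (`complexity`, `quality`, `emb`), the threshold `tau`, and the function name are our assumptions for the sketch.

```python
import math

def select_deita(samples, tau=0.9, budget=6000):
    """Toy score-first, diversity-aware selection sketch.

    samples: list of dicts with (assumed) keys
      'complexity', 'quality' -- scalar scores for the sample
      'emb'                   -- embedding vector of the sample
    tau: cosine-distance threshold; a candidate is kept only if it is
         at least this far from every already-selected sample.
    budget: target dataset size (e.g. 6K or 10K in this project).
    """
    def cos_dist(a, b):
        # Cosine distance = 1 - cosine similarity.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    # Rank candidates by combined complexity x quality score, best first.
    ranked = sorted(samples,
                    key=lambda s: s['complexity'] * s['quality'],
                    reverse=True)

    selected = []
    for cand in ranked:
        if len(selected) >= budget:
            break
        # Keep the candidate only if it adds diversity to the pool.
        if all(cos_dist(cand['emb'], s['emb']) >= tau for s in selected):
            selected.append(cand)
    return selected
```

With a near-duplicate pool, the diversity check skips the redundant high-scoring sample and falls through to a more distinct one, which is why a small selected set can still cover the pool well.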

News

Performance

:bell: Still curious about how far a small amount of high-quality data can take LLMs?

Deita may provide an answer for you:

🔦 Highlights

| Model | Align | Data Size | MT-Bench | AlpacaEval(%) |
|------------------------------------------------|--------------|--------------------------|----------|---------------|
| Zephyr-7B-sft | SFT | 200K | 5.32 | 75.12 |
| $\text{Zephyr-7B-}\beta$ | SFT + DPO | 200K SFT + 60K DPO | 7.34 | 90.60 |
| OpenChat-3.5 | C-RLFT | >> 70K C-RLFT | 7.81 | 88.51 |
| Starling-7B | C-RLFT + APA | >> 70K C-RLFT + 183K APA | 8.09 | 91.99 |
| Tulu-2-13B | SFT | 326K | 6.70 | 78.90 |
| Tulu-2-13B+DPO | SFT + DPO | 326K SFT + 60K DPO | 7.00 | 89.50 |
| LLaMA2-13B-Chat | SFT + PPO | -- | 6.65 | 81.09 |
| WizardLM-13B-v1.2 | SFT | >70K | 7.09 | 89.17 |
| Vicuna-13B-v1.5 | SFT | >125K | 6.57 | 78.80 |
| DEITA-7B-v1.0 (6K) | SFT | 6K | 7.22 | 80.78 |
| DEITA-7B-v1.0-sft | SFT | 10K | 7.32 | 81.67 |
| DEITA-7B-v1.0 | SFT + DPO | 6K SFT + 10K DPO | 7.55 | 90.06 |

DEITA models are based on Mistral-7B-v0.1. :fire:

Please refer to the full evaluations table below, which also covers the Open LLM Leaderboard, DEITA models with LLaMA base models, and comparisons with other data selection approaches.

:chart_with_upwards_trend: Full Evaluations

<details> <summary>See full evaluations</summary>

| Model | Align | Data Size | MT-Bench | AlpacaEval(%) | OpenLLM (Avg.) |
|------------------------------------------------|--------------|--------------------------|----------|---------------|----------------|
| Proprietary Models | | | | | |
| GPT-4-Turbo | ? | -- | 9.32 | 97.70 | -- |
| GPT-4 | SFT + PPO | -- | 8.99 | 95.03 | -- |
| Claude-2 | SFT + PPO | -- | 8.06 | 91.36 | -- |
| GPT-3.5-turbo | SFT + PPO | -- | 7.94 | 89.37 | -- |
| Open-sourced Models based on LLaMA-1-13B | | | | | |
| LIMA | SFT | 1K SFT | 4.29 | 41.98 | 59.82 |
| WizardLM-13B | SFT | 70K SFT | 6.35 | 75.31 | 58.96 |
| Vicuna-13B-v1.3 | SFT | 125K SFT | 6.39 | 82.11 | 60.01 |
| Random | SFT | 10K SFT | 6.03 | 71.52 | 60.14 |
| DEITA-LLaMA1-13B-v1.0-sft | SFT | 10K SFT | 6.60 | 78.01 | 64.27 |
| Open-sourced Models based on LLaMA-2-13B | | | | | |
| Tulu-2-13B | SFT | 326K SFT | 6.70 | 78.90 | -- |
| Tulu-2-13B+DPO | SFT + DPO | 326K SFT + 60K DPO | 7.00 | 89.50 | -- |
| LLaMA2-13B-Chat | SFT + PPO | -- | 6.65 | 81.09 | -- |
| WizardLM-13B-v1.2 | SFT | >70K SFT | 7.09 | 89.17 | -- |
| Vicuna-13B-v1.5 | SFT | 125K SFT | 6.57 | 78.80 | 61.63 |
| Random | SFT | 10K SFT | 5.78 | 65.19 | 61.32 |
| DEITA-LLaMA2-13B-v1.0-sft | SFT | 10K SFT | 6.79 | 81.09 | 62.71 |
| Open-sourced Models based on Mistral-7B | | | | | |
| Mistral-7B-Instruct-v0.1 | -- | -- | 6.84 | 69.65 | 60.45 |
| Zephyr-7B-sft | SFT | 200K SFT | 5.32 | 75.12 | 60.93 |
| $\text{Zephyr-7B-}\beta$ | SFT + DPO | 200K SFT + 60K DPO | 7.34 | 90.60 | 66.36 |
| OpenChat-3.5 | C-RLFT | >> 70K C-RLFT | 7.81 | 88.51 | -- |
| Starling-7B | C-RLFT + APA | >> 70K C-RLFT + 183K APA | 8.09 | 91.99 | -- |
| Random | SFT | 10K SFT | 5.89 | 56.90 | 61.72 |
| DEITA-7B-v1.0-sft (6K) | SFT | 6K SFT | 7.22 | 80.78 | 64.94 |
| DEITA-7B-v1.0-sft (10K) | SFT | 10K SFT | 7.32 | 81.67 | 64.00 |
| DEITA-7B-v1.0 | SFT + DPO | 6K SFT + 10K DPO | 7.55 | 90.06 | 69.86 |

</details>

:rocket: Deita Resources

| Resource | Link | License |
|------------------------------------------------|----------------|-------------|
| Deita Datasets | | |
| deita-6k-v0 | :hugs: HF Repo | MIT License |
| deita-10k-v0 | :hugs: HF Repo | MIT License |
| deita-complexity-scorer-data | :hugs: HF Repo | MIT License |
| deita-quality-scorer-data | :hugs: HF Repo | MIT License |
| deita-redundant-pool (100K) | :hugs: HF Repo | MIT License |
| deita-sota-pool (300K) | :hugs: HF Repo | MIT License |
| Scorers | | |
| deita-complexity-scorer | :hugs: HF Repo | [LLaMA License](https://ai.meta.com/resources/models-and-libr
