Deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
<p align="center"> <img src="./assets/logo-final.png" width="600"> </p>
<p align="center"> 🤗 <a href="https://huggingface.co/collections/hkust-nlp/deita-6569c198c174808d94cf5bd4">HF Repo</a> 📄 <a href="https://arxiv.org/abs/2312.15685">Paper</a> 📚 <a href="https://huggingface.co/datasets/hkust-nlp/deita-6k-v0">6K Data</a> 📚 <a href="https://huggingface.co/datasets/hkust-nlp/deita-10k-v0">10K Data</a> </p>

Welcome to the Deita (Data-Efficient Instruction Tuning for Alignment) project!
We will keep updating the project; please stay tuned!
What is Deita?
Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).
It includes:
- Open-sourced Toolkits for automatic data selection in instruction tuning
- Deita Datasets: A series of extremely lightweight, high-quality alignment SFT data. We release 6k-sized and 10k-sized datasets in the first release
- Deita Models: A series of powerful models on par with SOTA chat LLMs, produced by an extremely efficient instruction tuning process. Deita models can be obtained by training with 10x less instruction tuning data than other SOTA LLMs
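
To make the idea of automatic data selection concrete, here is a minimal sketch of score-based selection under a fixed data budget. It is not the Deita toolkit API: the `Sample` class, the `select_top_k` helper, the example scores, and the choice to rank by the product of the two scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    complexity: float  # hypothetical output of a complexity scorer
    quality: float     # hypothetical output of a quality scorer


def select_top_k(pool, budget):
    """Return up to `budget` samples ranked by complexity * quality.

    Combining the scores by product is an assumption for this sketch.
    """
    ranked = sorted(pool, key=lambda s: s.complexity * s.quality, reverse=True)
    return ranked[:budget]


# Toy pool with made-up scores, only to exercise the helper.
pool = [
    Sample("Say hi.", 1.2, 3.0),
    Sample("Explain quicksort and analyze its worst case.", 4.5, 4.8),
    Sample("Translate 'cat' to French.", 1.5, 4.0),
    Sample("Design a rate limiter and justify the trade-offs.", 5.1, 4.2),
]

selected = select_top_k(pool, budget=2)
for sample in selected:
    print(sample.text)
```

A real selection pipeline would obtain the scores from trained scorer models and may apply further filtering; the sketch only shows the ranking step.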
News
- :fire: [03/2024] Our datasets have been used by Hugging Face to create the Zephyr Gemma model.
- 📄 [01/2024] The Deita paper "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning" has been accepted by ICLR2024!
- :fire: [01/2024] Deita pipelines have been released! With one line of code and a configuration, you can select a high-quality data subset for alignment.
- 📚 [01/2024] Our scorer datasets deita-complexity-scorer-data and deita-quality-scorer-data have been released.
- :fire: [12/2023] We released the first collection of Deita resources, which includes a series of extremely lightweight, effective SFT datasets, the data complexity and quality scorer models, and the resulting Deita chat models.
Performance
:bell: Still curious about how far a small amount of high-quality data can take LLMs?
Deita may provide an answer:
🔦 Highlights

| Model | Align | Data Size | MT-Bench | AlpacaEval(%) |
|------------------------------------------------|--------------|------------|----------|---------------|
| Zephyr-7B-sft | SFT | 200K | 5.32 | 75.12 |
| $\text{Zephyr-7B-}\beta$ | SFT + DPO | 200K SFT + 60K DPO | 7.34 | 90.60 |
| OpenChat-3.5 | C-RLFT | >> 70K C-RLFT | 7.81 | 88.51 |
| Starling-7B | C-RLFT + APA | >> 70K C-RLFT + 183K APA | 8.09 | 91.99 |
| Tulu-2-13B | SFT | 326K | 6.70 | 78.90 |
| Tulu-2-13B+DPO | SFT + DPO | 326K SFT + 60K DPO | 7.00 | 89.50 |
| LLaMA2-13B-Chat | SFT + PPO | -- | 6.65 | 81.09 |
| WizardLM-13B-v1.2 | SFT | >70K | 7.09 | 89.17 |
| Vicuna-13B-v1.5 | SFT | >125K | 6.57 | 78.80 |
| DEITA-7B-v1.0 (6K) | SFT | 6K | 7.22 | 80.78 |
| DEITA-7B-v1.0-sft | SFT | 10K | 7.32 | 81.67 |
| DEITA-7B-v1.0 | SFT + DPO | 6K SFT + 10K DPO | 7.55 | 90.06 |

DEITA models are based on Mistral-7B-v0.1. :fire:
Please refer to the full evaluations table for more results, including Open LLM Leaderboard scores, DEITA models built on LLaMA base models, and comparisons with other data selection approaches.
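
As a rough illustration of the data-efficiency claim, the arithmetic below compares the SFT-only Mistral-7B-based rows of the highlights table. The numbers are copied from that table; the choice of Zephyr-7B-sft as the baseline and the variable names are assumptions made for this sketch.

```python
# (model, number of SFT samples, MT-Bench score), copied from the highlights table
rows = [
    ("Zephyr-7B-sft", 200_000, 5.32),
    ("DEITA-7B-v1.0 (6K)", 6_000, 7.22),
    ("DEITA-7B-v1.0-sft", 10_000, 7.32),
]

# Baseline chosen for illustration: an SFT-only model on the same base family.
baseline_name, baseline_size, baseline_score = rows[0]

summary = {}
for name, size, score in rows[1:]:
    ratio = baseline_size / size      # how many times less SFT data
    delta = score - baseline_score    # MT-Bench difference vs. the baseline
    summary[name] = (ratio, delta)
    print(f"{name}: {ratio:.1f}x less SFT data than {baseline_name}, "
          f"MT-Bench {delta:+.2f}")
```

With these rows, the 6K and 10K DEITA models use roughly 33x and 20x less SFT data than the baseline while scoring higher on MT-Bench.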
:chart_with_upwards_trend: Full Evaluations
<details> <summary>See full evaluations</summary>

| Model | Align | Data Size | MT-Bench | AlpacaEval(%) | OpenLLM (Avg.) |
|------------------------------------------------|-----------|------------|----------|---------------|----------------|
| Proprietary Models | | | | | |
| GPT-4-Turbo | ? | -- | 9.32 | 97.70 | -- |
| GPT-4 | SFT + PPO | -- | 8.99 | 95.03 | -- |
| Claude-2 | SFT + PPO | -- | 8.06 | 91.36 | -- |
| GPT-3.5-turbo | SFT + PPO | -- | 7.94 | 89.37 | -- |
| Open-sourced Models based on LLaMA-1-13B | | | | | |
| LIMA | SFT | 1K SFT | 4.29 | 41.98 | 59.82 |
| WizardLM-13B | SFT | 70K SFT | 6.35 | 75.31 | 58.96 |
| Vicuna-13B-v1.3 | SFT | 125K SFT | 6.39 | 82.11 | 60.01 |
| Random | SFT | 10K SFT | 6.03 | 71.52 | 60.14 |
| DEITA-LLaMA1-13B-v1.0-sft | SFT | 10K SFT | 6.60 | 78.01 | 64.27 |
| Open-sourced Models based on LLaMA-2-13B | | | | | |
| Tulu-2-13B | SFT | 326K SFT | 6.70 | 78.90 | -- |
| Tulu-2-13B+DPO | SFT + DPO | 326K SFT + 60K DPO | 7.00 | 89.50 | -- |
| LLaMA2-13B-Chat | SFT + PPO | -- | 6.65 | 81.09 | -- |
| WizardLM-13B-v1.2 | SFT | >70K SFT | 7.09 | 89.17 | -- |
| Vicuna-13B-v1.5 | SFT | 125K SFT | 6.57 | 78.80 | 61.63 |
| Random | SFT | 10K SFT | 5.78 | 65.19 | 61.32 |
| DEITA-LLaMA2-13B-v1.0-sft | SFT | 10K SFT | 6.79 | 81.09 | 62.71 |
| Open-sourced Models based on Mistral-7B | | | | | |
| Mistral-7B-Instruct-v0.1 | -- | -- | 6.84 | 69.65 | 60.45 |
| Zephyr-7B-sft | SFT | 200K SFT | 5.32 | 75.12 | 60.93 |
| $\text{Zephyr-7B-}\beta$ | SFT + DPO | 200K SFT + 60K DPO | 7.34 | 90.60 | 66.36 |
| OpenChat-3.5 | C-RLFT | >> 70K C-RLFT | 7.81 | 88.51 | -- |
| Starling-7B | C-RLFT + APA | >>70K C-RLFT + 183K APA | 8.09 | 91.99 | -- |
| Random | SFT | 10K SFT | 5.89 | 56.90 | 61.72 |
| DEITA-7B-v1.0-sft (6K) | SFT | 6K SFT | 7.22 | 80.78 | 64.94 |
| DEITA-7B-v1.0-sft (10K) | SFT | 10K SFT | 7.32 | 81.67 | 64.00 |
| DEITA-7B-v1.0 | SFT + DPO | 6K SFT + 10K DPO | 7.55 | 90.06 | 69.86 |

</details>

:rocket: Deita Resources

| Resource | Link | License |
|------------------------------------------------|-----------|------------|
| Deita Datasets | | |
| deita-6k-v0 | :hugs: HF Repo | MIT License |
| deita-10k-v0 | :hugs: HF Repo | MIT License |
| deita-complexity-scorer-data | :hugs: HF Repo | MIT License |
| deita-quality-scorer-data | :hugs: HF Repo | MIT License |
| deita-redundant-pool (100K) | :hugs: HF Repo | MIT License |
| deita-sota-pool (300K) | :hugs: HF Repo | MIT License |
| Scorers | | |
| deita-complexity-scorer | :hugs: HF Repo | [LLaMA License](https://ai.meta.com/resources/models-and-libr
