CREAM

[NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

Updates

(2024.09.26) Our paper has been accepted by NeurIPS 2024 🔥🔥.
(2024.06.11) Paper released on arXiv.

🚀 Overview

We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.

Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K).
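
The gap between the fine-tuning length and the target length can be made concrete with a little arithmetic. The following is a minimal sketch of plain linear positional interpolation, a common way to squeeze the position indices of a long sequence back into the pre-trained range. It is not CREAM's own indexing scheme; the function name `interpolated_positions` and the exact token counts (4K taken as 4096, 256K taken as 262144) are assumptions made for illustration.

```python
def interpolated_positions(target_len: int, pretrained_len: int) -> list[float]:
    """Rescale indices 0..target_len-1 so they fall inside [0, pretrained_len)."""
    if target_len <= 0 or pretrained_len <= 0:
        raise ValueError("lengths must be positive")
    # Assumed scheme: divide every index by the extension factor.
    scale = target_len / pretrained_len
    return [i / scale for i in range(target_len)]


PRETRAINED_LEN = 4096    # assumed value for "4K"
TARGET_LEN = 262144      # assumed value for "256K"

positions = interpolated_positions(TARGET_LEN, PRETRAINED_LEN)
print(TARGET_LEN // PRETRAINED_LEN)       # 64
print(max(positions) < PRETRAINED_LEN)    # True
```

Under these assumptions the extension factor is 64, and every rescaled index stays below the pre-trained window size, so the model never sees a position value larger than those it met during pre-training.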

To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
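
A truncated Gaussian of this kind can be sketched with rejection sampling. This is an illustration only, not the repository's implementation: the function `sample_middle_index`, the `sigma_frac` parameter, and its default value are hypothetical.

```python
import random


def sample_middle_index(lo: int, hi: int, rng: random.Random,
                        sigma_frac: float = 0.25) -> int:
    """Draw an integer in [lo, hi] from a Gaussian centred on the midpoint.

    Draws outside [lo, hi] are rejected and redrawn, which truncates the
    distribution to the allowed range.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    mean = (lo + hi) / 2
    sigma = (hi - lo) * sigma_frac  # hypothetical width; tune as needed
    while True:
        x = round(rng.gauss(mean, sigma))
        if lo <= x <= hi:
            return x


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_middle_index(0, 4095, rng) for _ in range(5000)]

# Count how many draws land in each third of the window.
third = 4096 // 3
left = sum(s < third for s in samples)
right = sum(s >= 2 * third for s in samples)
middle = len(samples) - left - right
print(left, middle, right)
```

With this width, roughly half of the draws land in the middle third of the window, which is the bias toward the middle that the paragraph above describes.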

Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with “Never Miss A Beat”.

⚙️ Installation

# clone project
git clone git@github.com:wutong4012/CREAM.git
cd CREAM

# create conda environment
conda create -n cream python=3.9
conda activate cream

# install requirements
pip install -r requirements.txt
conda install -c nvidia cuda-nvcc
pip install flash_attn-2.5.7+cu122torch2.2cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

# replace lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
# manual step: replace the lm_eval folder

💡 How to run

You can download all the fine-tuning and evaluation data from pile_4k_train, pile_val, ShareGPT_4k_train, ShareGPT_val, gov_report, proof-pile, book3, pg19_long, LongChat-Lines, Needle in a Haystack, and LongBench.

Note: you must modify the "root" path in every file in the scripts folder before running them.

Train model

bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM

bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM

Evaluate model

bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000

bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000

bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000

bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000

bash scripts/eval_benchmark.sh 8 linear llama2 CREAM 1000

⚽ Evaluation Results

LongChat-Lines

Lost in the Middle

Needle in a Haystack

LongBench

Acknowledgement

Data / Code:

📜 Citation

Please cite our paper if you use CREAM in your work:

@inproceedings{wu2024cream,
    title={An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding},
    author={Wu, Tong and Zhao, Yanpeng and Zheng, Zilong},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    volume={37},
    year={2024}
}
