DoLa

Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

License: MIT

Code for the ICLR 2024 paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"

Paper: https://arxiv.org/abs/2309.03883
Authors: Yung-Sung Chuang $^\dagger$, Yujia Xie $^\ddagger$, Hongyin Luo $^\dagger$, Yoon Kim $^\dagger$, James Glass $^\dagger$, Pengcheng He $^\ddagger$
$^\dagger$ Massachusetts Institute of Technology, $^\ddagger$ Microsoft

Overview

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that requires neither conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the logits obtained from projecting later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is better able to surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves truthfulness across multiple-choice and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17 absolute percentage points, demonstrating its potential for making LLMs reliably generate truthful facts.
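The contrast described above can be illustrated with a toy sketch. This is not the repository's implementation: the function names, the three-token vocabulary, and the logit values are all hypothetical, and the sketch only scores tokens by the difference in log-probabilities between a late ("mature") layer and an early ("premature") layer. The paper describes further details, such as restricting the contrast to plausible tokens, that are omitted here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrast_layers(mature_logits, premature_logits):
    """Score each token by log p_mature - log p_premature (toy version)."""
    mature = log_softmax(mature_logits)
    premature = log_softmax(premature_logits)
    return [m - p for m, p in zip(mature, premature)]


# Hypothetical logits: the late layer raises token 0 relative to the early layer.
mature_logits = [2.0, 1.0, 0.0]
premature_logits = [1.0, 1.0, 0.0]

scores = contrast_layers(mature_logits, premature_logits)
best_token = max(range(len(scores)), key=scores.__getitem__)
print(best_token, [round(s, 3) for s in scores])
```

Tokens whose probability grows between the early and late layer receive positive scores, which is the intuition behind preferring them during decoding.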

Setup

pip install -e transformers-4.28.1
pip install datasets
pip install accelerate
pip install openai # -> only for truthfulqa and gpt4_eval

Experiments

Arguments

| Argument | Example | Description |
| ----------------- | ----------------- | ------------- |
| --model-name | huggyllama/llama-7b | Specifies the model to use. Currently only LLaMA-v1 is supported. |
| --data-path | /path/to/dataset | Path to the dataset file or folder. |
| --output-path | output-path.json | Where to store the output results. |
| --num-gpus | 1 | Number of GPUs to use: 1/2/4/8 for the 7B/13B/30B/65B model sizes, respectively. |
| --max_gpu_memory | 27 | Maximum GPU memory size (in GiB) to allocate. Default: 27 (for a 32G V100). |

Understanding --early-exit-layers

The --early-exit-layers argument takes a string of comma-separated layer numbers with no spaces in between. The number of layers specified determines the decoding mode.

| Number of Layers Specified | Example (str) | Description of Decoding Mode |
| -------------------------- | ------------- | ---------------------------- |
| 1 | -1 | Naive decoding from the final layer output. |
| 2 | 16,32 | DoLa-static decoding with the second specified layer (i.e. 32) as the mature_layer and the first specified layer (i.e. 16) as the premature_layer. |
| >2 | 0,2,4,6,8,10,12,14,32 | DoLa decoding with the last specified layer (i.e. 32) as the mature_layer and all the preceding layers (i.e. 0,2,4,6,8,10,12,14) as candidate_premature_layers. |
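The mapping in the table can be sketched as a small parser. This is an illustrative helper, not the repository's code: the function name `parse_early_exit_layers` and the mode labels "naive", "dola-static", and "dola" are assumptions chosen to mirror the table.

```python
def parse_early_exit_layers(spec):
    """Map an --early-exit-layers string to a decoding mode and its layers.

    Hypothetical helper: the mode labels mirror the table above and are
    not names taken from the repository.
    """
    layers = [int(x) for x in spec.split(",")]
    if len(layers) == 1:
        # A single value (e.g. "-1") means plain decoding from the final layer.
        return {"mode": "naive", "mature_layer": layers[0], "premature_layers": []}
    if len(layers) == 2:
        # Two values: fixed premature layer, then the mature layer.
        return {"mode": "dola-static", "mature_layer": layers[1],
                "premature_layers": [layers[0]]}
    # More than two values: last is mature, the rest are premature candidates.
    return {"mode": "dola", "mature_layer": layers[-1],
            "premature_layers": layers[:-1]}


print(parse_early_exit_layers("0,2,4,6,8,10,12,14,32"))
```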

FACTOR (Multiple Choices)

Please download the data file wiki_factor.csv from https://github.com/AI21Labs/factor

Baseline

python factor_eval.py --model-name huggyllama/llama-7b --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 1
python factor_eval.py --model-name huggyllama/llama-13b --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 2
python factor_eval.py --model-name huggyllama/llama-30b --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 4
python factor_eval.py --model-name huggyllama/llama-65b --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 8

DoLa

python factor_eval.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 1
python factor_eval.py --model-name huggyllama/llama-13b --early-exit-layers 0,2,4,6,8,10,12,14,16,18,40 --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 2
python factor_eval.py --model-name huggyllama/llama-30b --early-exit-layers 0,2,4,6,8,10,12,14,16,18,60 --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 4
python factor_eval.py --model-name huggyllama/llama-65b --early-exit-layers 0,2,4,6,8,10,12,14,16,18,80 --data-path /path/to/wiki_factor.csv --output-path output-path.json --num-gpus 8
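The four DoLa commands above differ only in model name, layer list, and GPU count, so they can be generated from a small table. The sketch below is a convenience wrapper, not part of the repository; `FACTOR_DOLA_RUNS` and `build_factor_command` are hypothetical names, and the flag values are copied from the commands above.

```python
import shlex

# Model name -> (GPU count, --early-exit-layers string), copied from the commands above.
FACTOR_DOLA_RUNS = {
    "huggyllama/llama-7b": (1, "0,2,4,6,8,10,12,14,32"),
    "huggyllama/llama-13b": (2, "0,2,4,6,8,10,12,14,16,18,40"),
    "huggyllama/llama-30b": (4, "0,2,4,6,8,10,12,14,16,18,60"),
    "huggyllama/llama-65b": (8, "0,2,4,6,8,10,12,14,16,18,80"),
}


def build_factor_command(model_name, data_path, output_path):
    """Return the argv list for one FACTOR DoLa run (hypothetical helper)."""
    num_gpus, layers = FACTOR_DOLA_RUNS[model_name]
    return [
        "python", "factor_eval.py",
        "--model-name", model_name,
        "--early-exit-layers", layers,
        "--data-path", data_path,
        "--output-path", output_path,
        "--num-gpus", str(num_gpus),
    ]


for model in FACTOR_DOLA_RUNS:
    cmd = build_factor_command(model, "/path/to/wiki_factor.csv", "output-path.json")
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch a run; giving each model its own output path avoids one run overwriting another's results.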

TruthfulQA (Multiple Choices)

The --data-path should be a folder that contains TruthfulQA.csv. If the file does not exist, it will be downloaded automatically.

Baseline

python tfqa_mc_eval.py --model-name huggyllama/llama-7b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 1
python tfqa_mc_eval.py --model-name huggyllama/llama-13b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 2
python tfqa_mc_eval.py --model-name huggyllama/llama-30b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 4
python tfqa_mc_eval.py --model-name huggyllama/llama-65b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 8

DoLa

python tfqa_mc_eval.py --model-name huggyllama/llama-7b --early-exit-layers 16,18,20,22,24,26,28,30,32 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 1
python tfqa_mc_eval.py --model-name huggyllama/llama-13b --early-exit-layers 20,22,24,26,28,30,32,34,36,38,40 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 2
python tfqa_mc_eval.py --model-name huggyllama/llama-30b --early-exit-layers 40,42,44,46,48,50,52,54,56,58,60 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 4
python tfqa_mc_eval.py --model-name huggyllama/llama-65b --early-exit-layers 60,62,64,66,68,70,72,74,76,78,80 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 8

TruthfulQA

To evaluate the open-ended generation results on TruthfulQA, we need to fine-tune two GPT-3 curie models through the OpenAI API:

openai api fine_tunes.create -t finetune_truth.jsonl -m curie --n_epochs 5 --batch_size 21 --learning_rate_multiplier 0.1
openai api fine_tunes.create -t finetune_info.jsonl -m curie --n_epochs 5 --batch_size 21 --learning_rate_multiplier 0.1

After fine-tuning, obtain the fine-tuned model names by running openai api fine_tunes.list | grep fine_tuned_model.

Create a config file gpt3.config.json like this:

{"gpt_info": "curie:ft-xxxxxxxxxx",
"gpt_truth": "curie:ft-xxxxxxxxxx",
"api_key": "xxxxxxx"}

Add the arguments --do-rating --gpt3-config gpt3.config.json to enable GPT-3 evaluation.
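The config file can also be written and checked from Python before launching an evaluation. This is a minimal sketch: the helper names `write_gpt3_config` and `load_gpt3_config` are hypothetical, and the only assumption about the file is that it holds the three keys shown in the example above.

```python
import json

# Keys shown in the example gpt3.config.json above.
REQUIRED_KEYS = ("gpt_info", "gpt_truth", "api_key")


def write_gpt3_config(path, gpt_info, gpt_truth, api_key):
    """Write the three-key config file (hypothetical helper)."""
    config = {"gpt_info": gpt_info, "gpt_truth": gpt_truth, "api_key": api_key}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


def load_gpt3_config(path):
    """Load the config and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing keys in {path}: {missing}")
    return config
```

For example, `write_gpt3_config("gpt3.config.json", "curie:ft-xxxxxxxxxx", "curie:ft-xxxxxxxxxx", "xxxxxxx")` reproduces the placeholder file shown above; the placeholders still need to be replaced with real model names and an API key.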

Baseline

python tfqa_eval.py --model-name huggyllama/llama-7b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 1 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-13b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 2 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-30b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 4 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-65b --data-path /path/to/data/folder --output-path output-path.json --num-gpus 8 --do-rating --gpt3-config /path/to/gpt3.config.json

DoLa

python tfqa_eval.py --model-name huggyllama/llama-7b --early-exit-layers 16,18,20,22,24,26,28,30,32 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 1 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-13b --early-exit-layers 20,22,24,26,28,30,32,34,36,38,40 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 2 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-30b --early-exit-layers 40,42,44,46,48,50,52,54,56,58,60 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 4 --do-rating --gpt3-config /path/to/gpt3.config.json
python tfqa_eval.py --model-name huggyllama/llama-65b --early-exit-layers 60,62,64,66,68,70,72,74,76,78,80 --data-path /path/to/data/folder --output-path output-path.json --num-gpus 8 --do-rating --gpt3-config /path/to/gpt3.config.json

GSM8K

We use a random sample
