HaloScope
This is the source code accompanying the NeurIPS'24 spotlight paper HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection, by Xuefeng Du, Chaowei Xiao, and Yixuan Li.
Ads
Check out our ICML'23 work SCONE, our ICLR'24 work SAL, and recent preprints on leveraging unlabeled data for OOD detection and on VLM harmful prompt detection if you are interested!
Requirements
pip install -r requirements.txt
Models Preparation
Please download the LLaMA-2 7B / 13B models from here and the OPT 6.7B / 13B models. Set up a local directory for saving the models:
mkdir models
And put the model checkpoints inside the folder.
Get LLM generations
First, create a local directory for saving the LLM-generated answers, the truthfulness ground truth, the extracted features, etc.
mkdir save_for_eval
For TruthfulQA, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_llama.py --dataset_name tqa --model_name llama2_chat_7B --most_likely 1 --num_gene 1 --gene 1
- "most_likely" controls whether to generate the most likely answer for testing (most_likely == 1) or to generate multiple answers with sampling techniques for uncertainty estimation (most_likely == 0).
- "num_gene" is the number of samples generated per question; for most_likely == 1, num_gene should be 1, otherwise we set num_gene to 10.
- "dataset_name" can be chosen from tqa, coqa, triviaqa, and tydiqa.
- "model_name" can be chosen from llama2_chat_7B and llama2_chat_13B.
Please refer to the implementation details in Section 4.1 of the paper for more information.
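The difference between the two generation modes can be sketched with a toy next-token distribution (illustrative only; the actual scripts decode with the LLM, and the 4-token vocabulary below is a made-up assumption):

```python
import numpy as np

# Toy next-token distribution over a 4-token vocabulary; the real script
# uses the LLM's softmax outputs -- this is only an illustrative sketch.
rng = np.random.default_rng(41)  # the paper fixes the random seed to 41
probs = np.array([0.6, 0.2, 0.15, 0.05])

# most_likely == 1: a single greedy (most likely) generation, num_gene = 1
greedy_token = int(np.argmax(probs))

# most_likely == 0: num_gene = 10 stochastic samples for uncertainty estimation
sampled_tokens = rng.choice(len(probs), size=10, p=probs)

print(greedy_token)    # deterministic: always the argmax token
print(sampled_tokens)  # varies from sample to sample
```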
For OPT models, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_opt.py --dataset_name tqa --model_name opt-6.7b --most_likely 1 --num_gene 1 --gene 1
Get the ground truth for the LLM generations
Since there is no ground truth for the generated answers, we leverage ROUGE and BLEURT to estimate whether an answer is true or false.
To download the BLEURT model, please refer to here and put it in the ./models folder:
For TruthfulQA, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_llama.py --dataset_name tqa --model_name llama2_chat_7B --most_likely 1 --use_rouge 0 --generate_gt 1
- when "use_rouge" is 1, we use ROUGE to determine the ground truth; otherwise we use BLEURT.
For OPT models, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_opt.py --dataset_name tqa --model_name opt-6.7b --most_likely 1 --use_rouge 0 --generate_gt 1
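The labeling step can be sketched as thresholding a similarity score between a generated answer and a reference answer. The LCS-based ROUGE-L implementation and the 0.5 threshold below are illustrative assumptions, not the exact rouge package call or BLEURT checkpoint used in the pipeline:

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 over the longest common subsequence of whitespace tokens."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming LCS length over the two token sequences.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c):
        for j, rt in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ct == rt else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def truthfulness_label(candidate: str, reference: str, threshold: float = 0.5) -> int:
    # 1 = treated as truthful, 0 = treated as hallucinated (threshold is illustrative).
    return int(rouge_l_f1(candidate, reference) >= threshold)

print(truthfulness_label("paris is the capital of france",
                         "the capital of france is paris"))
```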
Hallucination detection
For TruthfulQA, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_llama.py --dataset_name tqa --model_name llama2_chat_7B --use_rouge 0 --most_likely 1 --weighted_svd 1 --feat_loc_svd 3
- "weighted_svd" denotes whether the score uses the weighting coefficient given by the singular values.
- "feat_loc_svd" denotes the location in a transformer block from which we extract the representations: 3 is the block output, 2 is the MLP output, and 1 is the attention head output.
For OPT models, please run:
CUDA_VISIBLE_DEVICES=0 python hal_det_opt.py --dataset_name tqa --model_name opt-6.7b --use_rouge 0 --most_likely 1 --weighted_svd 1 --feat_loc_svd 3
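A schematic sketch of the singular-value-weighted subspace score: project each centered feature vector onto the top singular directions of the unlabeled feature matrix and, when weighted_svd == 1, weight each component by its singular value. The feature matrix, k, and the weighting details below are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def subspace_scores(feats: np.ndarray, k: int = 2, weighted: bool = True) -> np.ndarray:
    # Center the N x D feature matrix before factorizing it.
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are the right singular vectors (principal directions).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:k].T                 # N x k projection magnitudes
    weights = s[:k] if weighted else np.ones(k)  # weighted_svd toggle
    return np.abs(proj) @ weights              # larger = closer to the dominant subspace

rng = np.random.default_rng(41)
feats = rng.normal(size=(100, 8))   # stand-in for extracted LLM representations
scores = subspace_scores(feats, k=2, weighted=True)
print(scores.shape)                 # one score per generation
```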
Citation
If you find any part of this code useful in your research, please consider citing our paper:
@inproceedings{du2024haloscope,
title={HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection},
author={Xuefeng Du and Chaowei Xiao and Yixuan Li},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}
Additional notes
Note that this work can be sensitive to the random seed (set to 41 for all the experiments in our paper), which determines how the unlabeled/evaluation/test LLM generations are split and prepared. We recommend reporting average results across different random seeds in your experiments.
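The seed-controlled split can be sketched as a fixed permutation of the generation indices. The seed 41 matches the paper; the split proportions below are illustrative assumptions, not the exact ones used:

```python
import numpy as np

def split_indices(n: int, seed: int = 41):
    """Partition n generation indices into unlabeled / evaluation / test subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    # Illustrative 50/25/25 split; the actual proportions may differ.
    n_unlabeled, n_eval = int(0.5 * n), int(0.25 * n)
    return (perm[:n_unlabeled],
            perm[n_unlabeled:n_unlabeled + n_eval],
            perm[n_unlabeled + n_eval:])

unl, ev, tst = split_indices(1000)
# The same seed always reproduces the same partition.
unl2, _, _ = split_indices(1000)
assert (unl == unl2).all()
```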