LoGU
Source code for our paper: "LoGU: Long-form Generation with Uncertainty Expressions".
# LoGU: Long-form Generation with Uncertainty Expressions
<div>
<a href='https://scholar.google.com/citations?user=asTSVwQAAAAJ&hl=en' target='_blank'><b>Ruihan Yang</b></a><sup>1</sup>&nbsp;&nbsp;<a href='https://caiqizh.github.io/' target='_blank'><b>Caiqi Zhang</b></a><sup>2</sup>&nbsp;&nbsp;<a href='https://scholar.google.co.jp/citations?user=373vlUEAAAAJ&hl=en' target='_blank'><b>Zhisong Zhang</b></a><sup>3</sup>
</div>
<div><sup>1</sup>Fudan University</div>
<div><sup>2</sup>University of Cambridge</div>
<div><sup>3</sup>Tencent AI Lab</div>
<div>
<h4>
<img src="https://img.shields.io/badge/Version-1.0-blue.svg" alt="Version">
<img src="https://img.shields.io/github/stars/rhyang2021/LoGU?color=yellow" alt="Stars">
<img src="https://img.shields.io/github/issues/rhyang2021/LoGU?color=red" alt="Issues">
</h4>
</div>
## Introduction
While Large Language Models (LLMs) demonstrate impressive capabilities, they remain prone to generating factually incorrect content (i.e., hallucinations). A promising way to mitigate this issue is to enable models to express uncertainty when they are unsure. Previous research on uncertainty modeling has focused primarily on short-form QA, but real-world applications often require much longer responses. In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: *Uncertainty Suppression*, where models hesitate to express uncertainty, and *Uncertainty Misalignment*, where models convey uncertainty inaccurately.
To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline. Our framework adopts a divide-and-conquer strategy, refining uncertainty at the level of atomic claims. The collected data are then used for training via supervised fine-tuning (SFT) and direct preference optimization (DPO) to strengthen uncertainty expression. Extensive experiments on three long-form instruction-following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
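The refinement step above can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the `Claim` structure and `refine_response`/`build_dpo_pair` helpers are hypothetical, and it assumes each atomic claim already carries a factuality verdict from an external fact-checker.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str          # an atomic claim extracted from a long-form response
    is_factual: bool   # verdict from an external fact-checker (assumed given)

def refine_response(claims):
    """Divide-and-conquer refinement: keep factual claims asserted,
    rewrite non-factual ones as hedged uncertainty expressions."""
    refined = []
    for c in claims:
        if c.is_factual:
            refined.append(c.text)
        else:
            t = c.text.rstrip(".")
            refined.append(f"I am not sure, but it may be that {t[0].lower()}{t[1:]}.")
    return " ".join(refined)

def build_dpo_pair(claims):
    """The refined response (hedges on wrong claims) serves as the
    'chosen' sample; the original assertive response is 'rejected'."""
    original = " ".join(c.text for c in claims)
    return {"chosen": refine_response(claims), "rejected": original}

claims = [
    Claim("Marie Curie won two Nobel Prizes.", True),
    Claim("She was born in 1872.", False),  # actually 1867
]
pair = build_dpo_pair(claims)
```

In this sketch, only the incorrect claim is rewritten into an uncertainty expression, so SFT/DPO can teach the model to hedge selectively rather than across the whole response.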
<div align="center"> <img width="825" alt="image" src="./figures/main_png.png"> </div>

## How to Install
You can set up the environment for LoGU with the following commands:

```shell
conda create -n LoGU python=3.8
conda activate LoGU
pip install -r lf_requirements.txt
pip install -r vllm_requirements.txt
```
## Run
Try the following commands to test our method on Bios, LongFact, and WildHallu:

- Generate answers

```shell
cd ./scripts
bash generate_vllm_responses.sh
```

- Calculate Factual Accuracy (FA)

```shell
bash eval_pipeline.sh
```

- Calculate Uncertain Precision (UC)

```shell
bash generate_unc_answers.sh
bash factcheck_unc_answers.sh
```
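As a rough sketch of what these two metrics measure (the function names below are hypothetical, and the precise definitions live in the paper and the scripts above): FA is read here as the fraction of confidently asserted claims that are correct, and UC as the fraction of uncertainty-marked claims whose hedging was warranted, i.e. that are in fact incorrect.

```python
def factual_accuracy(claims):
    """FA (assumed definition): among claims the model asserts
    confidently, the fraction that are factually correct."""
    asserted = [c for c in claims if not c["uncertain"]]
    if not asserted:
        return 0.0
    return sum(c["correct"] for c in asserted) / len(asserted)

def uncertain_precision(claims):
    """UC (assumed definition): among claims the model hedges on,
    the fraction that are genuinely incorrect (hedge was warranted)."""
    hedged = [c for c in claims if c["uncertain"]]
    if not hedged:
        return 0.0
    return sum(not c["correct"] for c in hedged) / len(hedged)

# Each claim carries an uncertainty flag (from the response) and a
# correctness label (from the fact-checking scripts above).
claims = [
    {"uncertain": False, "correct": True},
    {"uncertain": False, "correct": False},
    {"uncertain": True,  "correct": False},
    {"uncertain": True,  "correct": True},
]
```

A well-calibrated model pushes both numbers up at once: it asserts only what is correct and hedges only on what is not.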
## Training Data
Training data for LoGU-SFT and LoGU-DPO in the paper can be found here.
## Models
We also provide uncertainty-expression models on the Hugging Face model hub for a quick trial:
| Model | Link |
| :------- | :---------: |
| rhyang2021/uncertain_llama3_8b | HuggingFace |
| rhyang2021/uncertain_mistral_7b | HuggingFace |
If you have any questions, please feel free to email me or open an issue.
