
PeerQA

Code and Data for PeerQA: A Scientific Question Answering Dataset from Peer Reviews, NAACL 2025 https://aclanthology.org/2025.naacl-long.22/

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

<div align="center">

ACL GitHub HuggingFace Video Slides Poster

</div>

<img src="./peer-qa-overview-with-note.png" align="right" width="275" style="padding: 10px">

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset from other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average length of 12k tokens.
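The decontextualization finding above can be illustrated with a minimal sketch: before indexing, each paragraph is prefixed with document-level context such as the paper title and its section heading, so that a retriever sees the paragraph in context. The function below is an illustration of the general idea only, not the paper's exact method.

```python
def decontextualize(paragraph: str, title: str, heading: str) -> str:
    """Prefix a paragraph with document-level context before indexing.

    Illustrative only: prepending the title and section heading is one
    simple decontextualization variant among several possible strategies.
    """
    return f"{title}. {heading}. {paragraph}"


chunk = decontextualize(
    "We evaluate retrieval at the paragraph level.",
    title="PeerQA",
    heading="Experiments",
)
print(chunk)  # PeerQA. Experiments. We evaluate retrieval at the paragraph level.
```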

News

2025/04/24 Outstanding Paper at NAACL - PeerQA won the Outstanding Paper Award at NAACL 2025!

2025/03/04 HuggingFace Datasets - The PeerQA dataset is now available on HuggingFace Datasets! Find detailed instructions on https://huggingface.co/datasets/UKPLab/PeerQA.

2025/02/25 New DeepSeek-R1 & Cross-Encoder Results - We add results for the distilled DeepSeek-R1 models (Llama 8B, Qwen 7B, Qwen 14B, and Qwen 32B) on the answerability and answer generation tasks. We further evaluate a new set of Dense and Cross-Encoder reranker models. See Additional Results for more details.

2025/02/19 PeerQA Dataset Released - The PeerQA preprint is now available on Arxiv, as well as the code and data on GitHub.

2025/01/23 PeerQA accepted at NAACL - The PeerQA paper has been accepted to NAACL 2025. The paper will be presented at the conference in April/May 2025.

Setup

To run the experiments, you need to install the following dependencies:

  • GROBID 0.8.0
  • Java 21 (for the BM25 retrieval experiments with Pyserini)
  • uv

To set up the environment, you can use the following commands:

```shell
# download python version with uv
uv python install 3.10
# create a virtual environment
uv venv .venv
# activate the virtual environment
source .venv/bin/activate
# install the required python packages
uv pip install .
```

To process the data locally, you need to run GROBID 0.8.0. By default, the preprocessing script will use a GROBID instance hosted on HuggingFace Spaces (https://timbmg-peerqa-grobid-0-8-0.hf.space). For the BM25 experiments with Pyserini, Java 21 is required.
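A small helper can make the GROBID server choice explicit in your own scripts. This is a sketch, not part of the repository: the `GROBID_SERVER` environment variable and the helper name are assumptions; the two URLs are the hosted instance mentioned above and the usual local Docker address.

```python
import os

# Hosted GROBID 0.8.0 instance used by default (see above); a local Docker
# instance started with `docker run -p 8070:8070 lfoppiano/grobid:0.8.0`
# is reachable at http://localhost:8070.
DEFAULT_GROBID_SERVER = "https://timbmg-peerqa-grobid-0-8-0.hf.space"


def grobid_server() -> str:
    """Return the GROBID server URL, preferring the GROBID_SERVER env var."""
    return os.environ.get("GROBID_SERVER", DEFAULT_GROBID_SERVER)
```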

Data & Preprocessing

This section describes how to download the data from the different sources and how to preprocess it for the experiments.

Questions

  1. Create a new directory data, then download and unzip the questions into it.

Linux/Mac

```shell
mkdir data && cd data && curl -L 'https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/4467/peerqa-data-v1.0.zip?sequence=1&isAllowed=y' -o peerqa-data-v1.0.zip && unzip peerqa-data-v1.0.zip && rm peerqa-data-v1.0.zip && cd ..
```

Windows

```powershell
mkdir data
cd data
Invoke-WebRequest -Uri 'https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/4467/peerqa-data-v1.0.zip?sequence=1&isAllowed=y' -OutFile 'peerqa-data-v1.0.zip'
Expand-Archive -LiteralPath '.\peerqa-data-v1.0.zip' -DestinationPath '.'
Remove-Item 'peerqa-data-v1.0.zip'
cd ..
```

Papers

To adhere to the licenses of the papers, we cannot provide the papers directly. Instead, we provide the steps to download the papers from the respective sources and extract the text from them.

Download OpenReview PDFs and Extract Text

  1. Download the PDFs from OpenReview for ICLR 2022, ICLR 2023, and NeurIPS:

```shell
uv run download_openreview.py
```

  2. Extract the text from the PDFs to add the OpenReview paper texts to data/papers.jsonl. The text is extracted from the PDFs with GROBID 0.8.0. By default, the script will use the GROBID server hosted on HuggingFace Spaces at https://timbmg-peerqa-grobid-0-8-0.hf.space. Alternatively, you can run the GROBID server locally via Docker (docker run -p 8070:8070 lfoppiano/grobid:0.8.0) and set the --grobid_server argument to http://localhost:8070. To extract the text from the PDFs, run:

```shell
uv run extract_text_from_pdf.py
```

Now, the data is ready for the experiments.

Data

Once the download and preprocessing steps are completed, the following files should be present in the data directory:

  • papers.jsonl
  • qa.jsonl
  • qa-augmented-answers.jsonl
  • qa-unlabeled.jsonl
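Each of these files is in JSON Lines format (one JSON object per line). A minimal loader, assuming nothing beyond the Python standard library:

```python
import json


def load_jsonl(path):
    """Read a .jsonl file into a list of dicts, one per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `papers = load_jsonl("data/papers.jsonl")` yields one dict per sentence-level record.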

Paper Data

| Key | Type | Description |
| ------------ | ---- | ----------- |
| idx | int | The index of the paper in the dataset |
| pidx | int | The index of the paragraph in the paper |
| sidx | int | The index of the sentence in the paragraph |
| type | str | The type of the content (e.g., title, heading, caption) |
| content | str | The content of the paragraph |
| last_heading | str | The last heading before the paragraph |
| paper_id | str | The unique identifier of the paper: the first part is the source of the paper (e.g., openreview, egu, nlpeer), the second part is the venue (e.g., ICLR-2022-conf, ESurf, ESD), and the third part is a unique identifier for the paper |
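Because papers.jsonl is sentence-level, paragraph-level processing requires regrouping sentences by pidx while preserving sentence order via sidx. A sketch using made-up records in the shape described above (the field values are illustrative, not taken from the dataset):

```python
from collections import defaultdict

# Illustrative sentence-level records in the shape of papers.jsonl.
records = [
    {"pidx": 1, "sidx": 0, "type": "paragraph",
     "content": "First sentence.", "last_heading": "Introduction"},
    {"pidx": 1, "sidx": 1, "type": "paragraph",
     "content": "Second sentence.", "last_heading": "Introduction"},
    {"pidx": 0, "sidx": 0, "type": "title",
     "content": "A Sample Title", "last_heading": None},
]

# Group sentences by paragraph index, keeping sentence order within each.
by_paragraph = defaultdict(list)
for rec in sorted(records, key=lambda r: (r["pidx"], r["sidx"])):
    by_paragraph[rec["pidx"]].append(rec["content"])

paragraphs = {pidx: " ".join(sents) for pidx, sents in by_paragraph.items()}
print(paragraphs[1])  # First sentence. Second sentence.
```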

QA Data

| Key | Type | Description |
| --------------------- | --------- | ----------- |
| paper_id | str | The unique identifier of the paper; see above for its composition |
| question_id | str | The unique identifier of the question |
| question | str | The question |
| raw_answer_evidence | List[str] | The raw evidence that has been highlighted |
