ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

<img src="./assets/ReCEvalOverview.png" alt="teaser image" width="750"/>

Dependencies

This code is written using PyTorch and HuggingFace's Transformers library. Running ReCEval requires access to GPUs; the evaluation is quite lightweight, so one GPU should suffice. Please install the Entailment Bank and GSM-8K datasets separately. To use the human-judgement datasets for GSM-8K and to run the baselines, please follow the setup procedure in ROSCOE (preferably in a separate environment).
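
The repository expects these datasets on disk. As a convenience (not a step the repository prescribes), GSM-8K can also be fetched through HuggingFace's datasets library; the output filename below is illustrative:

# Illustrative only: fetch GSM-8K via the HF `datasets` library and dump a
# local JSONL copy. Entailment Bank must be downloaded separately from AllenAI.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")           # question/answer splits
gsm8k["test"].to_json("gsm8k_test.jsonl")       # hypothetical local path
print(gsm8k)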

Installation

The simplest way to run our code is to start with a fresh environment.

conda create -n ReCEval python=3.9
source activate ReCEval
pip install -r requirements.txt
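
Since evaluation requires a GPU, a quick sanity check before running the scripts can save time. This is a generic PyTorch check, not part of the repository:

# Confirm the fresh environment sees PyTorch and at least one GPU.
import torch
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())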

Running Evaluation

  • evaluate_receval.py contains the implementation of metrics in ReCEval.
  • train_*_pvi.py scripts are used to train models for the PVI-based metrics.
  • perturb_EB.py applies perturbations to the reasoning trees in Entailment Bank.
  • run_flan.py is used to obtain chain-of-thought responses from FLAN models on the GSM-8K dataset.
  • To compute metrics and evaluate, simply run python evaluate_receval.py (Entailment Bank by default). Default model and data directories can be changed directly within the script (a hedged PVI sketch follows this list); these variables include:
    • inp_model_dir: Model g for calculating PVI-based intra-step correctness
    • inp_model_dir: Model g′ (the null-input model contrasted with g) for calculating PVI-based intra-step correctness
    • info_model_dir: Model for calculating PVI-based information-gain
    • source_path: Path containing reasoning chains to be scored or meta-evaluated
  • PVI Models: Here is a link to trained PVI models for entailment. For more training details and how we prepare the data, refer to Appendix A of our paper, and/or consider using off-the-shelf LLMs to compute ReCEval metrics.
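
For intuition, the PVI quantity behind intra-step correctness (following the V-information framing the paper builds on) contrasts a model g that sees the step's inputs with a null-input model g′: PVI(x → y) = log g(y | x) − log g′(y | ∅). The sketch below is a minimal, hedged illustration, not the repository's implementation; the checkpoint paths, the seq2seq model type, the shared tokenizer, and the example step are all assumptions:

# Minimal sketch of the PVI contrast used for intra-step correctness:
#   PVI(x -> y) = log g(y | x) - log g'(y | null)
# Paths, model type (seq2seq), and the example step are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def target_log_prob(model, tokenizer, source, target):
    # Summed log-probability of `target` tokens given `source`.
    enc = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    # out.loss is the mean cross-entropy per target token; undo the mean.
    return -out.loss.item() * labels.shape[1]

tok = AutoTokenizer.from_pretrained("inp_model_dir")              # shared tokenizer (assumed)
g = AutoModelForSeq2SeqLM.from_pretrained("inp_model_dir")        # input-conditioned model g
g_null = AutoModelForSeq2SeqLM.from_pretrained("null_model_dir")  # null-input model g' (hypothetical path)

step_inputs = "A square has four equal sides. Each side is 3 cm."
step_conclusion = "The perimeter is 12 cm."
pvi = target_log_prob(g, tok, step_inputs, step_conclusion) \
      - target_log_prob(g_null, tok, "", step_conclusion)
print(f"PVI = {pvi:.3f}")  # higher = conclusion better supported by its inputs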

Reference

Please cite our paper if you use our repository in your work:


@article{Prasad2023ReCEval,
  title         = {ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness},
  author        = {Archiki Prasad and Swarnadeep Saha and Xiang Zhou and Mohit Bansal},
  year          = {2023},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  eprint        = {2304.10703}
}
