
RPC

Official Repository for NeurIPS 2025 Paper: "A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning"


[NeurIPS 2025] A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning


<div align="center"> <a href="https://arxiv.org/pdf/2502.00511">📄 [Paper]</a> • <a href="https://wnjxyk.github.io/RPC">🌐 [Project]</a> • <a href="https://huggingface.co/collections/WNJXYK/mathematical-llm-reasoning-paths-68e4c4e32e3ad7fa0fcad77a">🤗 [Data Collection]</a> • <a href="https://huggingface.co/spaces/WNJXYK/RPC">💻 [Demo]</a> </div>

🛠️ 1. Environment Setup

We provide two ways to create the Python environment for this repository. Please choose one of the following methods:

1.1. Using Python virtual environment:

python -m venv rpc
source rpc/bin/activate
pip install -r requirements.txt 

1.2. Using Conda environment:

conda create -n rpc python=3.9
conda activate rpc
pip install -r requirements.txt

🚀 2. Reproducing Experiments

2.1. Single Experiment

Run evaluation with specific parameters:

python main.py --dataset MathOdyssey --model InternLM2-Math-Plus-7B --method RPC --K 128

Parameters:

  • --dataset: Choose from MATH, MathOdyssey, AIME, OlympiadBench
  • --model: Choose from Deepseek-Math-RL-7B, InternLM2-Math-Plus-1.8B, InternLM2-Math-Plus-7B
  • --method: Choose from PPL (Perplexity), SC (Self-Consistency), RPC (our method)
  • --K: Number of reasoning paths to sample (use 128 for MathOdyssey, AIME, and OlympiadBench, and 64 for MATH)
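This is a rough illustration of the two baselines accepted by --method, using their common definitions. SC takes a majority vote over the final answers of the K sampled paths. PPL keeps the answer of the single path with the lowest perplexity.

The sketch makes several assumptions:
- The function names and the input format (one answer string plus one list of token log-probabilities per path) are hypothetical and not this repository's API.
- Answers are compared by exact string match, whereas the repository checks answer equality more carefully.
- RPC itself is not sketched here.

```python
import math
from collections import Counter
from typing import Sequence


def self_consistency(answers: Sequence[str]) -> str:
    """SC baseline (hypothetical helper): majority vote over final answers."""
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the answer that appeared first.
    return Counter(answers).most_common(1)[0][0]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one path: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def lowest_perplexity(answers: Sequence[str],
                      logprobs: Sequence[Sequence[float]]) -> str:
    """PPL baseline (hypothetical helper): answer of the least-perplexing path."""
    best = min(range(len(answers)), key=lambda i: perplexity(logprobs[i]))
    return answers[best]


if __name__ == "__main__":
    answers = ["12", "15", "12", "7"]
    logprobs = [[-0.5, -0.7], [-0.1, -0.2], [-0.9, -0.4], [-1.2, -0.3]]
    print(self_consistency(answers))             # majority answer
    print(lowest_perplexity(answers, logprobs))  # answer of the most confident path
```

In this toy input the two baselines disagree. SC picks "12" because it appears twice. PPL picks "15" because that single path has the highest mean log-probability.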

2.2. Batch Experiments

Run the full evaluation across all settings:

bash all_exps.sh

This evaluates every method-dataset-model combination and saves the results to results.txt.
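all_exps.sh is the supported way to run the grid. If you only want to inspect or subset it, the sketch below prints dry-run commands for each combination.

It rests on a few assumptions:
- The grid is the full cross product of the choices listed in Section 2.1, with the K values recommended there.
- The actual contents of all_exps.sh, including how it writes results.txt, may differ.
- default_k and build_commands are hypothetical helpers.

```python
import itertools
import shlex
from typing import List

# Choices taken from the parameter list in Section 2.1.
DATASETS = ["MATH", "MathOdyssey", "AIME", "OlympiadBench"]
MODELS = ["Deepseek-Math-RL-7B", "InternLM2-Math-Plus-1.8B", "InternLM2-Math-Plus-7B"]
METHODS = ["PPL", "SC", "RPC"]


def default_k(dataset: str) -> int:
    """Recommended number of sampled paths: 64 for MATH, 128 for the others."""
    return 64 if dataset == "MATH" else 128


def build_commands() -> List[str]:
    """One main.py invocation per dataset-model-method combination (not executed)."""
    commands = []
    for dataset, model, method in itertools.product(DATASETS, MODELS, METHODS):
        args = ["python", "main.py",
                "--dataset", dataset,
                "--model", model,
                "--method", method,
                "--K", str(default_k(dataset))]
        commands.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```

This produces 36 commands (4 datasets × 3 models × 3 methods). You can filter the list before running any of them, for example with subprocess.run(shlex.split(cmd), check=True).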

2.3. Hints

  1. If you cannot download data from Hugging Face directly, use a Hugging Face mirror instead.
  2. The first run on each dataset can be slow, because it builds the cache used to check answer equality.
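For hint 1, one common way to use a mirror relies on the HF_ENDPOINT environment variable. This assumes the data is fetched through the Hugging Face client libraries (huggingface_hub, and datasets/transformers on top of it), which this README does not confirm. huggingface_hub reads HF_ENDPOINT when it is first imported. The mirror URL below is only an example.

```python
import os

# Must run before datasets, transformers, or huggingface_hub is imported,
# because huggingface_hub reads HF_ENDPOINT once at import time.
# setdefault keeps any endpoint the user has already exported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

print("Downloading from:", os.environ["HF_ENDPOINT"])
```

The shell equivalent is to run export HF_ENDPOINT=https://hf-mirror.com before python main.py ..., which avoids touching the code at all.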

📚 3. BibTeX

@inproceedings{zhou24theoretical,
      author    = {Zhou, Zhi and Tan, Yuhao and Li, Zenan and Yao, Yuan and Guo, Lan-Zhe and Li, Yu-Feng and Ma, Xiaoxing},
      title     = {A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning},
      booktitle = {Advances in Neural Information Processing Systems},
      year      = {2025},
    }
