AceCoder

The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]


🂡 AceCoder

<a target="_blank" href="https://arxiv.org/abs/2502.01718"> <img style="height:22pt" src="https://img.shields.io/badge/-Paper-red?style=flat&logo=arxiv"></a> <a target="_blank" href="https://github.com/TIGER-AI-Lab/AceCoder"> <img style="height:22pt" src="https://img.shields.io/badge/-Code-green?style=flat&logo=github"></a> <a target="_blank" href="https://tiger-ai-lab.github.io/AceCoder/"> <img style="height:22pt" src="https://img.shields.io/badge/-🌐%20Website-blue?style=flat"></a> <a target="_blank" href="https://huggingface.co/datasets/TIGER-Lab/AceCode-87K"> <img style="height:22pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"></a> <a target="_blank" href="https://huggingface.co/collections/TIGER-Lab/acecoder-67a16011a6c7d65cad529eba"> <img style="height:22pt" src="https://img.shields.io/badge/-🤗%20Models-red?style=flat"></a>  Authors: <a class="name" target="_blank" href="https://www.wyett-zeng.com/about.html">Huaye Zeng</a>, <a class="name" target="_blank" href="https://jdf-prog.github.io/">Dongfu Jiang</a>, <a class="name" target="_blank" href="#">HaoZhe Wang</a>, <a class="name" target="_blank" href="#">Ping Nie</a>, <a class="name" target="_blank" href="#">Xiaotong Chen</a>, <a class="name" target="_blank" href="https://wenhuchen.github.io/">Wenhu Chen</a>  @ <a class="btna" target="_blank" href="https://huggingface.co/TIGER-Lab">TIGER-Lab</a>

🔥News

[2025/2/3] We release the AceCoder Paper, along with the 🤗 Models and Datasets on Hugging Face.

Overview

./assets/images/ac_overview.png

<details><summary>Abstract</summary>

We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable tests for reward model training and reinforcement learning in the coding scenario. To do this, we curated the dataset AceCode-87K: starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and filter out the noisy ones.
We trained two reward models, AceCodeRM-7B and AceCodeRM-32B, on the constructed preference pairs. Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvement.
We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B, and a rule-based reward, i.e. the binary pass rate over the test cases in the dataset. Additionally, we experiment with RL directly from the base model, as in DeepSeek-R1. Results show that RL directly from the base Qwen2.5-Coder model yields a 25% improvement on HumanEval-plus and 6% on MBPP-plus within just 80 optimization steps.
To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale, reliable tests for reward model training and reinforcement learning in the coding scenario. We believe our AceCode-87K dataset will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.

</details>
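The rule-based reward mentioned in the abstract (a binary pass rate over test cases) can be illustrated with a small sketch. This is not the repository's implementation: the function names `pass_rate` and `binary_reward` are hypothetical, the tests are assumed to be `assert` statements stored as strings, and "binary" is assumed to mean 1.0 only when every test passes. It also runs code with a bare `exec`, with no sandbox or timeout, which a real training setup would need.

```python
from typing import List


def pass_rate(program: str, test_cases: List[str]) -> float:
    """Fraction of assert-style test cases that the program passes."""
    if not test_cases:
        return 0.0
    passed = 0
    for test in test_cases:
        namespace: dict = {}
        try:
            exec(program, namespace)  # define the candidate solution
            exec(test, namespace)     # run one assert statement against it
            passed += 1
        except Exception:
            pass  # any error or failed assertion counts as a failed test
    return passed / len(test_cases)


def binary_reward(program: str, test_cases: List[str]) -> float:
    """Assumed definition: 1.0 only if every test passes, otherwise 0.0."""
    return 1.0 if pass_rate(program, test_cases) == 1.0 else 0.0


# Toy example: the third test is deliberately wrong for a correct add().
solution = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(2, 2) == 5",
]
print(pass_rate(solution, tests))
print(binary_reward(solution, tests[:2]))
```

The same `pass_rate` value could also serve as a graded signal; the binary form is shown because that is how the abstract describes the rule-based reward.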

📚Dataset

AceCode-87K: The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini.
AceCodePair-300K: Preference pairs constructed from AceCode-87K for training the reward model.
AceCode-87K-hard: A harder subset that you can create by sampling 25% of the hard examples, following the commands here.
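As a rough illustration of how preference pairs might be derived from per-program test pass rates, here is a minimal sketch. The function `build_pairs` and both thresholds (`min_chosen`, `min_gap`) are hypothetical values chosen for the example; they are not taken from this README, and the actual construction rules for AceCodePair-300K are described in the paper.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def build_pairs(
    pass_rates: Dict[str, float],
    min_chosen: float = 0.8,  # hypothetical: chosen program must pass most tests
    min_gap: float = 0.4,     # hypothetical: chosen must clearly beat rejected
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) program ids for one coding question."""
    pairs = []
    for chosen, rejected in permutations(pass_rates, 2):
        gap = pass_rates[chosen] - pass_rates[rejected]
        if pass_rates[chosen] >= min_chosen and gap >= min_gap:
            pairs.append((chosen, rejected))
    return pairs


# Pass rates of three sampled programs for the same question (made-up numbers).
rates = {"prog_a": 1.0, "prog_b": 0.5, "prog_c": 0.1}
print(build_pairs(rates))
```

With these made-up numbers only `prog_a` qualifies as a chosen program, so it is paired against each of the other two.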

🤗Model

AceCodeRM (Reward Model)

AceCodeRM-7B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct
AceCodeRM-32B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct

AceCoder (RL Model)

📈 Performance

See our website or paper for detailed performance report.

🚀Quick Start

git submodule init
git submodule update

Use AceCodeRM

First install acecoder as a package:

pip install git+https://github.com/TIGER-AI-Lab/AceCoder.git

Then see examples/run_acecoderm.py for how to use AceCodeRM. To run the example directly, use python examples/run_acecoderm.py.
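Once a reward model can score candidate programs, Best-of-N sampling reduces to picking the highest-scoring candidate. The sketch below shows only that selection step. It does not use the acecoder package API (see examples/run_acecoderm.py for that); `best_of_n` and the `fake_scores` lookup are hypothetical stand-ins for real reward model scores.

```python
from typing import Callable, List


def best_of_n(candidates: List[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate program with the highest reward score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score_fn)


# Three sampled programs for one prompt, with made-up reward scores.
candidates = [
    "def f(x): return x + 1",
    "def f(x): return x - 1",
    "def f(x): return x",
]
fake_scores = dict(zip(candidates, [0.9, 0.1, 0.4]))  # stand-in for a reward model

best = best_of_n(candidates, lambda program: fake_scores[program])
print(best)
```

In practice `score_fn` would wrap a call to the reward model for a fixed prompt, and N would be the number of samples drawn from the policy model.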

Training Reward Model

See train/train_rm/README.md for detailed instructions.

Training RL Model

See train/train_rl/README.md for detailed instructions.

Evaluation

We use EvalPlus to evaluate HumanEval(+) and MBPP(+), bigcodebench to evaluate BigCodeBench, and LiveCodeBench to evaluate LiveCodeBench (V4).

Citation

If you find this work helpful, please consider citing:

@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={ArXiv},
    year={2025},
    volume={2502.01718}
}
