ACPBench
ACPBench: Reasoning about Action, Change, and Planning. A benchmark designed to evaluate the fundamental reasoning abilities in the domains of action, change, and planning. It spans seven atomic reasoning tasks--determining action applicability, action reachability, plan justification, determining landmark, predicting state transitions, state rea
<p align="center"> <a href="https://ibm.github.io/ACPBench">🏠 Homepage</a> • <a href="https://arxiv.org/abs/2410.05669">📄 Paper</a> • <a href="https://huggingface.co/datasets/ibm-research/acp_bench">🤗 Dataset</a> </p>
<p align="center"> <a href="./GettingStarted.md">🔥 Getting Started</a> • <a href="#-citation">📜 Citation</a> • <a href="#-acknowledgement">🙏 Acknowledgement</a> </p>

📰 News
- 📝 January 2026: ACPBench-Hard accepted at ICLR 2026
- 🎓 December 2025: ACPBench featured in NeurIPS 2025 Tutorial on Planning in the Era of Language Models
- 🎉 February 2025: ACPBench presented at AAAI 2025 in Philadelphia, PA
Overview
ACPBench is a benchmark designed to evaluate the reasoning capabilities of large language models (LLMs) across Action, Change, and Planning. It includes seven atomic reasoning tasks spanning thirteen domains, offered in two formats: boolean and multiple‑choice. ACPBench‑Hard extends this benchmark by introducing generative question formats and adding an eighth task focused on predicting the next action.
| Task | Abbreviation | Question Types | Description |
|------|--------------|----------------|-------------|
| Action Applicability | app | MCQ, Bool, Gen | The ability of an agent to identify which actions are valid and executable in a given state or context. |
| Progression | prog | MCQ, Bool, Gen | The ability of an agent to understand how the world state changes after performing an action. |
| Atom Reachability | reach | MCQ, Bool, Gen | The ability of an agent to determine whether a specific goal or state can be reached from the current state through a sequence of valid actions. |
| Validation | val | MCQ, Bool, Gen | The ability of an agent to verify that an action sequence is executable and actually achieves the goal. |
| Action Reachability | areach | MCQ, Bool, Gen | The ability of an agent to evaluate whether an action can ever become applicable along any valid future trajectory. |
| Action Justification | just | MCQ, Bool, Gen | The ability of an agent to detect unjustified actions in a plan and simplify the plan without losing validity or goal achievement. |
| Landmarks | land | MCQ, Bool, Gen | The ability of an agent to recognize mandatory subgoals that every valid plan must pass through. |
| Next Action | nexta | Gen | The ability of an agent to predict the next action to take; choosing the right next step is what turns understanding into purposeful action. |
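The example records later in this README carry a `group` field whose suffix (`_bool`, `_mc`, or `_mcq`) appears to indicate the question format, and MCQ records list their options under `choices`. The following is a minimal sketch of how such a record might be handled. The helper names `question_format` and `render_mcq` are hypothetical, the suffix rule is inferred only from the examples shown here, and the sample record is an abbreviated stand-in rather than an actual dataset entry.

```python
from typing import Any, Dict


def question_format(group: str) -> str:
    """Infer the question format from the suffix of a record's `group` field.

    Assumption: the suffixes seen in the README examples are representative.
    """
    if group.endswith("_bool"):
        return "Bool"
    if group.endswith(("_mc", "_mcq")):
        return "MCQ"
    return "unknown"


def render_mcq(record: Dict[str, Any]) -> str:
    """Build a prompt from `context`, `query`, and the labelled choices."""
    choices = record["choices"]
    options = [
        f"{label}. {text}"
        for label, text in zip(choices["label"], choices["text"])
    ]
    return "\n".join([record["context"], record["query"], *options])


# Abbreviated, hypothetical record that mirrors the field layout of the examples.
record = {
    "group": "progression_mcq",
    "context": "Currently, the ferry is at l1, with the car c2 on board.",
    "query": "Which fact will hold after sailing from l1 to l0?",
    "choices": {
        "text": ["The ferry is empty", "The ferry is at l0 location"],
        "label": ["A", "B"],
    },
}

print(question_format(record["group"]))
print(render_mcq(record))
```

Rebuilding the prompt from `query` and `choices` is only one option; the MCQ records shown here also carry a `question` field with the options already spelled out.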
1. Applicability (app), checks which actions are applicable in a state.
<details><summary>Examples</summary>
Multiple choice questions (MCQ)
Example:
{
"id": -6575941946410689765,
"group": "applicable_actions_mc",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 10 cars, numbered consecutively. Currently, the ferry is at l1, with the car c0 on board. The cars are at locations as follows: c4, c7, and c9 are at l1; c6, c3, c1, c5, c2, and c8 are at l0.",
"question": "Which of the following actions will be applicable in this state? A. unload the car c7 from the ferry to location l0. B. sail from location l1 to location l0. C. load the car c1 at location l0 on to the ferry. D. load the car c2 at location l0 on to the ferry.",
"choices": {
"text": [
"unload the car c7 from the ferry to location l0",
"sail from location l1 to location l0",
"load the car c1 at location l0 on to the ferry",
"load the car c2 at location l0 on to the ferry"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"query": "Which action will be applicable in this state?"
},
Yes-no/binary questions (Bool)
Example:
{
"id": -8342636639526456067,
"group": "applicable_actions_bool",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 20 cars, numbered consecutively. Currently, the ferry is at l1 location and it is empty. The cars are at locations as follows: c7, c11, c2, c16, c14, c19, c5, c4, c12, c17, and c1 are at l1; c13, c8, c6, c18, c0, c3, c9, c10, and c15 are at l0.",
"question": "Is the following action applicable in this state: travel by sea from location l1 to location l0?"
},
</details>
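To make the applicability task concrete, here is a toy Python model of the ferry rules described in the example context (the ferry carries at most one car, and cars board or debark where the ferry is docked). `FerryState`, the action tuples, and the precondition rules are assumptions made for this sketch from the natural-language description; they are not the benchmark's own encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Action = Tuple[str, str, str]  # (kind, arg1, arg2); hypothetical encoding


@dataclass
class FerryState:
    ferry_at: str                 # location where the ferry is docked
    on_board: Optional[str]       # car on the ferry, or None if empty
    car_at: Dict[str, str]        # location of every car not on the ferry


def is_applicable(state: FerryState, action: Action) -> bool:
    """Check the assumed preconditions of sail / board / debark."""
    kind = action[0]
    if kind == "sail":
        _, src, dst = action
        return state.ferry_at == src and src != dst
    if kind == "board":
        _, car, loc = action
        return (
            state.on_board is None
            and state.ferry_at == loc
            and state.car_at.get(car) == loc
        )
    if kind == "debark":
        _, car, loc = action
        return state.on_board == car and state.ferry_at == loc
    return False


# State from the MCQ example: ferry at l1 with c0 on board.
state = FerryState(
    ferry_at="l1",
    on_board="c0",
    car_at={"c4": "l1", "c7": "l1", "c9": "l1",
            "c6": "l0", "c3": "l0", "c1": "l0",
            "c5": "l0", "c2": "l0", "c8": "l0"},
)
options = {
    "A": ("debark", "c7", "l0"),
    "B": ("sail", "l1", "l0"),
    "C": ("board", "c1", "l0"),
    "D": ("board", "c2", "l0"),
}
result = {label: is_applicable(state, act) for label, act in options.items()}
print(result)  # under this toy model, only option B is applicable
```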
2. Progression (prog), checks what happens once an action is applied.
<details><summary>Examples</summary>
Multiple choice questions (MCQ)
Example:
{
"id": -6721318970102316394,
"group": "progression_mcq",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 10 cars, numbered consecutively. Currently, the ferry is at l1, with the car c2 on board. The cars are at locations as follows: c0, c3, c6, c1, c8, and c9 are at l0; c7, c5, and c4 are at l1.",
"question": "Which of the following facts hold after performing the action \"sail from location l1 to location l0\" in the current state? A. The ferry is at l0 location and The ferry is at l1 location. B. The ferry is at l1 location and The ferry is empty. C. The ferry is empty. D. The ferry is at l0 location.",
"choices": {
"text": [
"The ferry is at l0 location and The ferry is at l1 location",
"The ferry is at l1 location and The ferry is empty",
"The ferry is empty",
"The ferry is at l0 location"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"query": "Which fact will hold after performing the action \"sail from location l1 to location l0\" in the current state?"
},
Yes-no/binary questions (Bool)
Example:
{
"id": -8215166616105943671,
"group": "progression_bool",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 5 cars, numbered consecutively. Currently, the ferry is at l0 location and it is empty. The cars are at locations as follows: c1, c0, c3, and c2 are at l0; c4 is at l1.",
"question": "Will the fact \"Car c4 is on the ferry\" hold after performing the action \"sail from location l0 to location l1\" in the current state?"
},
</details>
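Progression can be illustrated with a small add/delete-effect model of the same ferry domain. The sketch below is an assumption-laden toy: the fact tuples (`at-ferry`, `on`, `at`, `empty-ferry`) and the effects of each action are my reading of the domain description, not the benchmark's internal representation, and `apply` does not check preconditions.

```python
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]
State = FrozenSet[Fact]


def apply(state: State, action: Tuple[str, str, str]) -> State:
    """Return the successor state, assuming the action is applicable."""
    kind, a, b = action
    if kind == "sail":        # a = source location, b = destination
        return (state - {("at-ferry", a)}) | {("at-ferry", b)}
    if kind == "board":       # a = car, b = location
        return (state - {("at", a, b), ("empty-ferry",)}) | {("on", a)}
    if kind == "debark":      # a = car, b = location
        return (state - {("on", a)}) | {("at", a, b), ("empty-ferry",)}
    raise ValueError(f"unknown action kind: {kind}")


# State from the MCQ example: ferry at l1 with c2 on board.
state: State = frozenset(
    {("at-ferry", "l1"), ("on", "c2")}
    | {("at", c, "l0") for c in ("c0", "c3", "c6", "c1", "c8", "c9")}
    | {("at", c, "l1") for c in ("c7", "c5", "c4")}
)
after = apply(state, ("sail", "l1", "l0"))

options = {
    "A": {("at-ferry", "l0"), ("at-ferry", "l1")},
    "B": {("at-ferry", "l1"), ("empty-ferry",)},
    "C": {("empty-ferry",)},
    "D": {("at-ferry", "l0")},
}
# An option holds when all of its facts are in the successor state.
result = {label: facts <= after for label, facts in options.items()}
print(result)  # under this toy model, only option D holds
```

The same `apply` function also covers the boolean example: after sailing from l0 to l1 with an empty ferry, no `("on", "c4")` fact is added, so "Car c4 is on the ferry" does not hold in this model.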
3. Atom Reachability (reach), checks which atoms are reachable from a state.
<details><summary>Examples</summary>
Multiple choice questions (MCQ)
Example:
{
"id": 7931544803254567708,
"group": "reachable_atom_mc",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 10 cars, numbered consecutively. Currently, the ferry is at l0, with the car c3 on board. The cars are at locations as follows: c0, c1, c2, c6, c8, and c9 are at l0; c4, c7, and c5 are at l1.",
"question": "Which of the following options can hold in a state that can potentially be reached? A. Ferry has car l1 on board. B. Car c8 is at location l0 and Car c8 is on board the ferry. C. The ferry is at c5 location and Car c5 is at location l1. D. The ferry is at l1 location and Car c3 is at location l1.",
"choices": {
"text": [
"Ferry has car l1 on board",
"Car c8 is at location l0 and Car c8 is on board the ferry",
"The ferry is at c5 location and Car c5 is at location l1",
"The ferry is at l1 location and Car c3 is at location l1"
],
"label": [
"A",
"B",
"C",
"D"
]
},
"query": "Which fact is reachable from this state?"
},
Yes-no/binary questions (Bool)
Example:
{
"id": -2426698749034015429,
"group": "reachable_atom_bool",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a ferry. Each location is accessible by ferry from each other location. The cars can be debarked or boarded, and the ferry can carry only one car at a time. There are 2 locations and 10 cars, numbered consecutively. Currently, the ferry is at l0 location and it is empty. The cars are at locations as follows: c2, c7, and c5 are at l1; c3, c4, c6, c9, c1, c0, and c8 are at l0.",
"question": "Is it possible to transition to a state where the following holds: Car c2 is at location c0?"
},
</details>
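Atom reachability asks whether some sequence of valid actions leads to a state where the given facts hold. For a domain this small, one way to sketch the idea is a breadth-first search over a toy state space. The state encoding, `successors`, and `reachable` below are assumptions made for illustration; they do not describe how ACPBench computes its ground truth.

```python
from collections import deque

LOCATIONS = ("l0", "l1")


def successors(state):
    """Yield states one action away (sail, board, or debark)."""
    ferry_at, on_board, car_at = state  # car_at: frozenset of (car, location)
    for dst in LOCATIONS:
        if dst != ferry_at:
            yield (dst, on_board, car_at)                       # sail
    if on_board is None:
        for car, loc in car_at:
            if loc == ferry_at:
                yield (ferry_at, car, car_at - {(car, loc)})    # board
    else:
        yield (ferry_at, None, car_at | {(on_board, ferry_at)})  # debark


def holds(state, fact):
    """Evaluate one fact tuple; ill-typed facts simply never hold."""
    ferry_at, on_board, car_at = state
    if fact[0] == "at-ferry":
        return ferry_at == fact[1]
    if fact[0] == "on":
        return on_board == fact[1]
    if fact[0] == "at":
        return (fact[1], fact[2]) in car_at
    return False


def reachable(start, facts):
    """Breadth-first search for any state in which all facts hold."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if all(holds(state, f) for f in facts):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# State from the MCQ example: ferry at l0 with c3 on board.
start = (
    "l0",
    "c3",
    frozenset(
        [(c, "l0") for c in ("c0", "c1", "c2", "c6", "c8", "c9")]
        + [(c, "l1") for c in ("c4", "c7", "c5")]
    ),
)
option_b = [("at", "c8", "l0"), ("on", "c8")]        # car in two places at once
option_d = [("at-ferry", "l1"), ("at", "c3", "l1")]  # sail to l1, then debark c3
print(reachable(start, option_b), reachable(start, option_d))
```

Exhaustive search only works here because the toy state space is small (two locations, ten cars); larger instances would need a planner or a reachability heuristic instead.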
4. Validation (val), checks whether a sequence of actions is applicable and achieves the goal.
<details><summary>Examples</summary>
Multiple choice questions (MCQ)
Example:
{
"id": -2425816914857415723,
"group": "validation_mcq",
"context": "This is a ferry domain, where the task is to transport cars from their start to their goal locations, using a fer
