OpenEncompass
A flexible framework for AI agents that separates workflow logic from search strategy. Based on https://arxiv.org/pdf/2512.03571 (see also https://asari.ai/blog/encompass).
OpenEnCompass
OpenEnCompass is an implementation of the system described in Asari AI's Encompass paper: a Python-to-CPS (Continuation-Passing Style) compiler.
It transforms standard Python generator functions into resumable state machines. This allows the program state (local variables, instruction pointer) to be serialized, cloned, and restored.
By treating the program execution as a tree of states, we can apply search algorithms (Beam Search, MCTS) to guide LLM agents, rather than relying on a single greedy sample.
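To see why a compilation step is needed at all: CPython generator objects cannot be deep-copied or pickled, so a search driver cannot fork them directly. The sketch below contrasts a plain generator with an explicit state object. `CounterState` and `step` are hypothetical names for illustration, not part of OpenEnCompass.

```python
import copy
import json
from dataclasses import asdict, dataclass


def counter():
    """An ordinary generator: its frame (locals, instruction pointer) is opaque."""
    total = 0
    while True:
        received = yield total
        total += received


@dataclass
class CounterState:
    """Hypothetical explicit state: the data a generator frame would otherwise hide."""
    total: int = 0


def step(state: CounterState, value: int) -> CounterState:
    """Pure transition: returns a new state instead of mutating a hidden frame."""
    return CounterState(total=state.total + value)


gen = counter()
next(gen)  # advance to the first yield

try:
    copy.deepcopy(gen)
    generator_is_clonable = True
except TypeError:
    generator_is_clonable = False  # CPython refuses to copy generator objects

state = step(CounterState(), 5)
snapshot = json.dumps(asdict(state))             # serialize
restored = CounterState(**json.loads(snapshot))  # restore
fork_a = step(copy.deepcopy(state), 1)           # two independent continuations
fork_b = step(copy.deepcopy(state), 10)
```

Once the state is plain data, cloning and restoring become ordinary object operations, which is what makes tree search over executions possible.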
This work was implemented with AI assistance (Antigravity); correctness is enforced via tests, deterministic replays, and benchmarks.
Status: 73% accuracy on GSM8K (n=100) using qwen2.5:32b with Beam Search (width=8).
Agent Complexity Comparison
OpenEnCompass significantly reduces the complexity of building sophisticated agents by abstracting away state management and search logic. The following table compares the lines of code (LOC) required for a manual state machine implementation vs. an OpenEnCompass implementation across different patterns:

| Pattern | Base Agent (LOC) | OpenEnCompass Agent (LOC) | Reduction |
| :--- | :--- | :--- | :--- |
| Code Translation (Java → Python) | 493 | 250 | 49.3% |
| Hypothesis Search (ARC-style) | 421 | 199 | 52.7% |
| Reflexion (Self-Correction) | 410 | 219 | 46.6% |
On average, OpenEnCompass reduces agent code by ~50% while maintaining identical functional performance.
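The reduction figures can be re-derived from the LOC counts in the table. The snippet below is only an arithmetic check; the dictionary keys are shortened labels for the table rows.

```python
# (base LOC, OpenEnCompass LOC) per pattern, copied from the table
loc = {
    "Code Translation": (493, 250),
    "Hypothesis Search": (421, 199),
    "Reflexion": (410, 219),
}


def reduction_pct(base: int, compiled: int) -> float:
    """Percentage of lines removed relative to the base implementation."""
    return 100.0 * (base - compiled) / base


reductions = {name: round(reduction_pct(b, c), 1) for name, (b, c) in loc.items()}
average = round(sum(reduction_pct(b, c) for b, c in loc.values()) / len(loc), 1)

print(reductions)  # per-pattern reduction in percent
print(average)     # mean reduction in percent
```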
Mechanism
Standard LLM agents execute linearly:
State_t -> LLM -> Action -> State_{t+1}.
If the LLM errs, the trajectory fails.
OpenEnCompass compiles the agent code with implicit yields for cleaner syntax:

    @encompass.compile
    def agent():
        # Function calls automatically checkpoint execution.
        # The searcher can fork execution here 8 times (width=8).
        plan = branchpoint(options=["Plan A", "Plan B", ...])
        # Actions are also implicitly yielding
        result = my_action(plan)
        # Record scores without explicit yield
        record_score(10)
        ...

Into a state machine:

    class AgentMachine:
        def run(self, state):
            if state.pc == 0:
                return BranchPoint(...)
            if state.pc == 1:
                # Restore variables, continue execution
                ...
This enables branching in O(1) time relative to history length: forking an execution is just copying the state object, so the history never needs to be re-computed.
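A runnable toy version of such a machine makes the forking step concrete. This is a hand-written sketch, not the compiler's actual output: `State`, `run`, and the tuple-shaped signals are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical machine state: program counter plus saved local variables."""
    pc: int = 0
    locals: dict = field(default_factory=dict)


def run(state: State, sent=None):
    """Advance one step and return (signal, new_state); signals are plain tuples here."""
    nxt = copy.deepcopy(state)  # never mutate the caller's state
    if state.pc == 0:
        nxt.pc = 1
        return ("branch", ["Plan A", "Plan B"]), nxt
    if state.pc == 1:
        nxt.locals["plan"] = sent  # store the value chosen at the branchpoint
        nxt.pc = 2
        return ("score", 10 if sent == "Plan B" else 3), nxt
    return ("done", state.locals["plan"]), nxt


signal, at_branch = run(State())
# Fork: every option resumes from the same saved state, with no replay of earlier steps.
outcomes = {option: run(at_branch, option)[0] for option in signal[1]}
```

Because `run` never mutates its input, `at_branch` can be reused for any number of forks.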
Search Strategies
OpenEnCompass decouples the agent logic (the generator) from the search algorithm (the driver).
Beam Search
Maintains the k most promising execution traces at each step.

    from encompass.search import BeamSearch
    # Run the agent with beam width 8
    # (`await` requires an async context, e.g. inside an `async def` function)
    strategy = BeamSearch(width=8)
    results = await strategy.search(agent)
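For readers unfamiliar with the algorithm itself, here is a minimal, library-independent sketch of beam search over a toy state space. `beam_search`, `expand`, and `score` are illustrative names and do not reflect how `encompass.search.BeamSearch` is implemented internally.

```python
from typing import Callable, Iterable


def beam_search(
    start: str,
    expand: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
    width: int,
    depth: int,
) -> list[str]:
    """Keep the `width` highest-scoring states after each expansion step."""
    beam = [start]
    for _ in range(depth):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return beam


# Toy problem: build a 3-character string with as many "b"s as possible.
best = beam_search(
    start="",
    expand=lambda s: [s + "a", s + "b"],
    score=lambda s: s.count("b"),
    width=2,
    depth=3,
)
```

With `width=1` this degenerates to greedy search, which is the single-sample baseline the README contrasts against.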
Monte Carlo Tree Search (MCTS)
Uses UCT (Upper Confidence Bound applied to Trees) to balance exploration and exploitation.

    from encompass.search import MCTS
    # Run 100 simulations with exploration constant 1.4
    strategy = MCTS(iterations=100, exploration=1.4)
    results = await strategy.search(agent)
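The UCT selection rule can be sketched in a few lines. This uses one common form of the formula (mean value plus an exploration bonus); `uct_score` and the example child statistics are hypothetical, and the library's own implementation may differ in details.

```python
import math


def uct_score(total_value: float, visits: int, parent_visits: int, exploration: float = 1.4) -> float:
    """UCT = mean value + exploration * sqrt(ln(parent visits) / visits)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_value / visits
    return mean + exploration * math.sqrt(math.log(parent_visits) / visits)


# (total_value, visits) for three hypothetical children of a node visited 20 times
children = {"a": (9.0, 12), "b": (4.0, 7), "c": (0.9, 1)}
scores = {name: uct_score(v, n, parent_visits=20) for name, (v, n) in children.items()}
selected = max(scores, key=scores.get)
```

Child `c` is selected here despite having only one visit: its exploration bonus dominates, which is exactly the exploration side of the trade-off.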
Adaptive Branching MCTS (AB-MCTS)
Implements the TreeQuest algorithm (Inoue et al., 2025). Uses Thompson Sampling with Bayesian posteriors (Normal-Inverse-Gamma) to adaptively decide between exploring wider or deeper.

    from encompass.search import ABMCTS, ABMCTSConfig  # assumed import; both names are used below

    # Configure AB-MCTS with custom priors and sharing policy
    config = ABMCTSConfig(
        iterations=50,
        score_type="gaussian",
        belief_sharing="pooled",  # "independent" or "pooled"
        prior_mean=0.5,
        prior_kappa=1.0
    )
    strategy = ABMCTS(config=config)
    results = await strategy.search(agent)
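The Thompson Sampling step can be illustrated with a small Normal-Inverse-Gamma model. This is a sketch of the general technique under stated assumptions, not the TreeQuest or OpenEnCompass implementation: the `NIG` class, the `alpha`/`beta` priors, and the "wider"/"deeper" score lists are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class NIG:
    """Normal-Inverse-Gamma belief over the (mean, variance) of observed scores."""
    mu: float = 0.5     # prior mean (cf. prior_mean in the config)
    kappa: float = 1.0  # prior pseudo-count (cf. prior_kappa in the config)
    alpha: float = 1.0  # assumed shape prior, not taken from the README
    beta: float = 1.0   # assumed scale prior, not taken from the README

    def update(self, scores: list[float]) -> "NIG":
        """Conjugate update for Gaussian data with unknown mean and variance."""
        n = len(scores)
        if n == 0:
            return self
        mean = sum(scores) / n
        sq_dev = sum((s - mean) ** 2 for s in scores)
        kappa_n = self.kappa + n
        return NIG(
            mu=(self.kappa * self.mu + n * mean) / kappa_n,
            kappa=kappa_n,
            alpha=self.alpha + n / 2,
            beta=self.beta + 0.5 * sq_dev + self.kappa * n * (mean - self.mu) ** 2 / (2 * kappa_n),
        )

    def sample_mean(self, rng: random.Random) -> float:
        """Draw variance ~ InvGamma(alpha, beta), then mean ~ Normal(mu, variance / kappa)."""
        variance = 1.0 / rng.gammavariate(self.alpha, 1.0 / self.beta)
        return rng.gauss(self.mu, (variance / self.kappa) ** 0.5)


def choose(beliefs: dict[str, NIG], rng: random.Random) -> str:
    """Thompson sampling: sample once from each posterior and pick the largest draw."""
    return max(beliefs, key=lambda name: beliefs[name].sample_mean(rng))


rng = random.Random(0)
beliefs = {
    "wider": NIG().update([0.2, 0.3]),         # hypothetical scores from new children
    "deeper": NIG().update([0.8, 0.9, 0.85]),  # hypothetical scores from refinements
}
picks = [choose(beliefs, rng) for _ in range(1000)]
```

Because each decision is a random draw from the posteriors, the weaker option is still chosen occasionally, which keeps the search from committing too early.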
Best-First Search
Prioritizes nodes based on a heuristic value function.

    from encompass.search import BestFirstSearch
    strategy = BestFirstSearch(max_nodes=1000)
    results = await strategy.search(agent)
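As a library-independent illustration, best-first search is usually built on a priority queue. The sketch below uses `heapq` on a toy integer problem; `best_first_search` and its parameters are illustrative names, with `max_nodes` mirroring the node budget in the snippet above.

```python
import heapq
from typing import Callable, Iterable, Optional


def best_first_search(
    start: int,
    expand: Callable[[int], Iterable[int]],
    value: Callable[[int], float],
    is_goal: Callable[[int], bool],
    max_nodes: int = 1000,
) -> Optional[int]:
    """Always expand the frontier node with the highest heuristic value."""
    frontier = [(-value(start), start)]  # heapq is a min-heap, so negate the value
    seen = {start}
    expanded = 0
    while frontier and expanded < max_nodes:
        _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        expanded += 1
        for child in expand(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-value(child), child))
    return None  # budget exhausted or frontier empty


# Toy problem: reach 37 from 1 using "+1" and "*2"; the heuristic prefers numbers near 37.
found = best_first_search(
    start=1,
    expand=lambda n: [n + 1, n * 2],
    value=lambda n: -abs(37 - n),
    is_goal=lambda n: n == 37,
)
```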
Implicit Yield Mechanism
OpenEnCompass features automatic implicit yields for cleaner, more intuitive code. Control signals (branchpoint, record_score), actions (via @action), and nested agents automatically checkpoint execution without explicit yield keywords.
Traditional (still supported):

    @compile
    def agent():
        x = yield branchpoint("choice")
        yield record_score(10)
        result = yield my_action(x)
        return result

Modern (implicit yields):

    @compile
    def agent():
        x = branchpoint("choice")
        record_score(10)
        result = my_action(x)
        return result
The compiler automatically detects ControlSignal returns and manages state transitions, making agent code cleaner and more readable while maintaining full checkpointing capabilities.
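One way such a rewrite could work is an AST pass that wraps known signal calls in `yield`. The sketch below is an assumption about the general technique, not a description of the OpenEnCompass compiler, which may detect signals differently (for example at run time). `SIGNAL_FUNCS`, `ImplicitYield`, and the stand-in signal constructors are hypothetical.

```python
import ast
import textwrap

SIGNAL_FUNCS = {"branchpoint", "record_score"}  # hypothetical: names treated as control signals


class ImplicitYield(ast.NodeTransformer):
    """Wrap calls to known signal functions in a `yield` expression."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id in SIGNAL_FUNCS:
            return ast.Yield(value=node)
        return node


source = textwrap.dedent(
    """
    def agent():
        x = branchpoint("choice")
        record_score(10)
        return x
    """
)

tree = ImplicitYield().visit(ast.parse(source))
ast.fix_missing_locations(tree)

# Stand-in signal constructors so the rewritten function can be executed.
namespace = {
    "branchpoint": lambda name: ("branch", name),
    "record_score": lambda value: ("score", value),
}
exec(compile(tree, "<agent>", "exec"), namespace)

gen = namespace["agent"]()
first = next(gen)       # the branchpoint signal is yielded to the driver
second = gen.send("A")  # the driver answers "A"; the next signal is the score
try:
    next(gen)
    final = None
except StopIteration as stop:
    final = stop.value  # the agent's return value
```

After the rewrite, the implicit version behaves like the traditional one: each signal is handed to the driver, and the driver's answer becomes the value of the call expression.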
Installation
Requires Python 3.10+ and Ollama for local inference.

    # 1. Clone and install dependencies
    git clone https://github.com/nitin966/OpenEnCompass.git
    cd OpenEnCompass
    pip install -r requirements.txt

    # 2. Install Ollama (for local LLMs)
    # macOS
    brew install ollama
    # Linux
    curl -fsSL https://ollama.ai/install.sh | sh

    # 3. Pull the recommended model
    ollama pull qwen2.5:32b
Usage
Running Comparison Experiments
Run the comparison experiments to see the difference between base agents and OpenEnCompass agents:

    # 1. Code Translation Experiment
    python examples/code_translation/run_experiment.py --model qwen2.5:32b
    # 2. Hypothesis Search Experiment
    python examples/hypothesis_search/run_experiment.py --model qwen2.5:32b
    # 3. Reflexion Agent Experiment
    python examples/reflexion/run_experiment.py --model qwen2.5:32b
Running Benchmarks
Reproduce the 73% accuracy result on GSM8K (n=100):

    python run_benchmark.py --benchmark gsm8k --strategy beam --real-llm --model qwen2.5:32b --width 8
To run the full GSM8K test set (1319 problems):

    # Run all problems (warning: takes a long time)
    python run_benchmark.py --benchmark gsm8k_full --strategy beam --real-llm --model qwen2.5:32b --width 8
    # Run a subset (e.g., first 100)
    python run_benchmark.py --benchmark gsm8k_full --strategy beam --real-llm --model qwen2.5:32b --width 8 --limit 100
Other Benchmarks (ARC, Reflexion, LiveCodeBench)

    # Reflexion (Code Generation + Self-Correction)
    python run_benchmark.py --benchmark reflexion --strategy beam --real-llm --model qwen2.5:32b --width 3
    # ARC (Hypothesis Search)
    python run_benchmark.py --benchmark arc --strategy beam --real-llm --model qwen2.5:32b --width 3
    # LiveCodeBench (TreeQuest / AB-MCTS)
    # Evaluates code generation on partial update problems using AB-MCTS
    python benchmarks/ab_mcts_benchmark.py --model qwen2.5:32b --iterations 10
Deep Search Validation
Verify the O(1) state restoration and linear scaling up to depth 50+:

    python validation/simple_deep_test.py
Unit Tests
Run the comprehensive test suite (88 tests covering compiler, search, caching, and advanced control flow):

    python -m unittest discover tests
Compiler Capabilities
OpenEnCompass supports advanced Python features in agent code:
- Control Flow: `if/else`, `while`, `for`, `break`, `continue`, nested loops.
- Exceptions: `try/except`, `raise`, exception propagation.
- Yield Expressions: `x = yield branchpoint(...)`, `if (yield ...):`, `return (yield ...)`.
- Imports: `import module` works and persists across states.
- State Management: Automatic serialization of local variables (including large objects).
Current Limitations:
- Closures (nonlocal variables) are not yet supported.
- `try/finally` and `with` statements are not yet supported.
- Tuple unpacking assignment (`x, y = ...`) is not yet supported.
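The limitations above can often be sidestepped with plain Python rewrites. The sketch below assumes that the restrictions apply only to the body of a compiled agent function and that ordinary helper functions called from it are unrestricted; that assumption is not confirmed by this README. All function names are hypothetical, and `agent_body` is left undecorated so the example runs without the library.

```python
def parse_pair(text: str) -> tuple[str, str]:
    """Ordinary helper: tuple unpacking is fine here (assumed to be outside the compiler's reach)."""
    left, _, right = text.partition("=")
    return left, right


def read_first_line(path: str) -> str:
    """`with` kept inside an ordinary helper rather than in the agent body."""
    with open(path, encoding="utf-8") as handle:
        return handle.readline().rstrip("\n")


def agent_body(text: str) -> str:
    """Shape of an agent body that avoids the unsupported constructs."""
    # Instead of `key, value = parse_pair(text)`, index the returned tuple.
    pair = parse_pair(text)
    key = pair[0]
    value = pair[1]
    return f"{key} -> {value}"


summary = agent_body("model=qwen2.5:32b")
```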
Performance
- O(1) Resumption: Resuming a machine takes constant time regardless of history length.
- Heavy State: Efficiently handles large context objects (e.g., 1MB strings) with minimal overhead (~0.6ms per step), leveraging structural sharing where possible.
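Structural sharing can be illustrated with a shallow copy: immutable fields such as a large context string are shared by reference between forks, and only the mutable parts are duplicated. This is a sketch of the idea with hypothetical names (`AgentState`, `fork`), not the runtime's actual data structures.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical state: a large immutable context plus small mutable bookkeeping."""
    context: str
    pc: int = 0
    scores: list = field(default_factory=list)


def fork(state: AgentState) -> AgentState:
    """Shallow-copy the state, then copy only the parts a branch may mutate."""
    clone = copy.copy(state)           # fields are shared by reference
    clone.scores = list(state.scores)  # private copy of the mutable list
    return clone


parent = AgentState(context="x" * 1_000_000, pc=3, scores=[1.0])
child = fork(parent)
child.pc += 1
child.scores.append(2.0)

shares_context = child.context is parent.context  # the 1 MB string is not duplicated
```

The cost of a fork then depends on the size of the mutable bookkeeping rather than on the size of the context.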
Results
GSM8K Benchmark
| Configuration | Accuracy | Tasks | Nodes Explored | Duration |
|---------------|----------|-------|----------------|----------|
| Beam Search (k=8), qwen2.5:32b | 73% | 100 | 870 | ~48 min |
Beam Search enables systematic exploration of reasoning paths, improving over single-sample greedy decoding by maintaining multiple candidate solutions at each step.
Project Structure

    encompass/
    ├── core/                     # CPS compiler implementation (AST transformation)
    ├── runtime/                  # Execution engine and cost tracking
    ├── search/                   # Search strategy implementations (Beam, MCTS)
    ├── encompass/llm/            # LLM adapter implementations (Ollama, OpenAI)
    ├── benchmarks/               # Evaluation framework and datasets
    ├── validation/               # Deep search validation tests
    ├── tests/                    # Unit and integration tests
    └── examples/                 # Reference implementations
        ├── code_translation/     # Java to Python translation experiment
        ├── hypothesis_search/    # ARC-style pattern discovery exper
