Autocontext
A recursive, self-improving harness designed to help your agents (and future iterations of those agents) succeed on any task
Install / Use
/learn @greyhaven-ai/Autocontext
README
autocontext is a closed-loop control plane for improving agent behavior over repeated runs.
It executes tasks, evaluates outcomes, updates persistent knowledge, and can distill successful behavior into cheaper local runtimes. The goal is to move from frontier-model exploration toward validated, reusable, lower-cost execution.
Why It Exists
Most agent systems start every run cold. They do not reliably carry forward what worked, what failed, and what should change next.
autocontext adds that missing feedback loop:
- run the task
- analyze what happened
- persist validated lessons
- use those lessons in the next run
- optionally train and route to local models when the task is stable enough
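The loop above can be sketched in a few lines. This is purely illustrative; the function and class names here are hypothetical stand-ins, not the autocontext API:

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Persistent store of validated lessons carried across runs."""
    lessons: list[str] = field(default_factory=list)


def run_task(task: str, knowledge: Knowledge) -> dict:
    # Placeholder: execute the task using any lessons learned so far.
    return {"task": task, "hints_used": len(knowledge.lessons), "passed": True}


def analyze(result: dict) -> list[str]:
    # Placeholder: turn the run outcome into candidate lessons.
    return [f"lesson from {result['task']}"] if result["passed"] else []


def improvement_loop(task: str, generations: int) -> Knowledge:
    knowledge = Knowledge()
    for _ in range(generations):
        result = run_task(task, knowledge)    # run the task
        candidates = analyze(result)          # analyze what happened
        knowledge.lessons.extend(candidates)  # persist validated lessons
    return knowledge                          # reused by the next run
```

The point is the shape, not the internals: each generation starts warm, with everything the previous generations validated.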
How It Works
Each generation runs through a structured multi-agent loop:
- competitor: proposes a strategy or artifact for the task
- analyst: explains what happened and why
- coach: turns that analysis into playbook updates and future hints
- architect: proposes tools, harness improvements, or structural changes
- curator: gates what knowledge is allowed to persist
Strategies are then evaluated through scenario execution, staged validation, and gating. Weak changes are rolled back. Successful changes accumulate into reusable knowledge.
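The gate-or-rollback step can be sketched as follows. All names here are hypothetical, and the scoring function is a trivial stand-in for real scenario execution:

```python
def evaluate(playbook: list[str]) -> float:
    """Placeholder for scenario execution: score a playbook."""
    return float(sum(len(entry) for entry in playbook))


def curate(playbook: list[str], proposal: str) -> list[str]:
    """Gate a proposed change: keep it only if it beats the current baseline."""
    baseline = evaluate(playbook)
    candidate = playbook + [proposal]
    if evaluate(candidate) > baseline:
        return candidate   # successful change accumulates
    return playbook        # weak change is rolled back


playbook: list[str] = []
for proposal in ["use corners", "guard flag", ""]:
    playbook = curate(playbook, proposal)
```

In this sketch the empty proposal fails to beat the baseline and is rolled back, while the two useful proposals accumulate into the playbook.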
Choose An Entry Point
- Want the full control plane, API server, scenario runner, and training loop? Start with the Python package in autocontext/.
- Want a lighter Node/TypeScript toolkit for judging outputs, running improvement loops, queueing work, or exposing MCP tools? Start with ts/.
- Want to wire another agent into autocontext? Start with the CLI-first guide in autocontext/docs/agent-integration.md.
- Want to contribute or point a coding agent at the repo? Read CONTRIBUTING.md and AGENTS.md.
What's New
- GEPA-inspired ASI/Pareto optimizer wired into improvement loop
- Component sensitivity profiling and credit assignment
- Pluggable scoring backends with Elo and Glicko support
- Novelty exploration and multi-basin playbook branching
- Cost-aware loop control and long-run presets
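For reference, the Elo scoring backend mentioned above corresponds to the standard textbook update; this is the well-known formula, not autocontext-specific code:

```python
def elo_update(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update. score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, when two equally rated strategies meet and one wins, `elo_update(1200, 1200, 1.0)` moves the winner up by half of K. Glicko extends this idea with a rating-deviation term.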
Core Capabilities
- Persistent playbooks, hints, tools, reports, and progress snapshots across runs
- Staged validation, harness synthesis, and harness-aware execution
- Scenario families for simulation, investigation, workflow, coordination, negotiation, artifact editing, operator-in-the-loop, tool-fragility, and schema-evolution tasks
- Frontier-to-local distillation with MLX on Apple Silicon
- Runtime routing across Anthropic, OpenAI-compatible backends, Ollama, vLLM, MLX, and Pi-based runtimes
- OpenClaw-facing APIs and agent integration surfaces
- CLI, API server, and TypeScript terminal UI surfaces for operators and external agents
Quick Start From Source
The Python application lives in autocontext/, and most uv, pytest, ruff, and mypy commands should be run from there.
cd autocontext
uv venv
source .venv/bin/activate
uv sync --group dev
AUTOCONTEXT_AGENT_PROVIDER=deterministic uv run autoctx run \
--scenario grid_ctf \
--gens 3 \
--run-id quickstart
That creates a local run, writes artifacts under runs/ and knowledge/, and works without external API keys.
Run with Anthropic:
cd autocontext
AUTOCONTEXT_AGENT_PROVIDER=anthropic \
AUTOCONTEXT_ANTHROPIC_API_KEY=your-key \
uv run autoctx run --scenario grid_ctf --gens 3
Start the API server:
cd autocontext
uv run autoctx serve --host 127.0.0.1 --port 8000
Then inspect http://127.0.0.1:8000/ for the API index, or use npx autoctx tui for the interactive terminal UI.
Use the repo-level .env.example as the reference for available AUTOCONTEXT_* settings.
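One quick way to see which AUTOCONTEXT_* settings are active in your environment. This is a generic sketch (the authoritative settings list lives in .env.example); the masking heuristic is an assumption, not project behavior:

```python
import os


def autocontext_settings(environ: dict[str, str]) -> dict[str, str]:
    """Collect AUTOCONTEXT_* variables, masking anything that looks secret."""
    settings = {}
    for key, value in environ.items():
        if key.startswith("AUTOCONTEXT_"):
            masked = "****" if "KEY" in key or "TOKEN" in key else value
            settings[key] = masked
    return settings


print(autocontext_settings(dict(os.environ)))
```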
Installable Packages
The repo publishes two installable packages with different scopes:
- Python package: pip install autoctx
- TypeScript package: npm install autoctx
Important: the npm package for this project is autoctx.
autocontext on npm is a different package.
The Python package exposes the full autoctx control-plane CLI (run, serve, mcp-serve, train, new-scenario, export, wait, and more). The TypeScript package exposes a narrower autoctx CLI focused on evaluation, improvement loops, queueing, and MCP serving for Node runtimes.
Which Package Should You Use?
| If you want to... | Start here | Why |
|---|---|---|
| Run the full multi-generation control plane | autocontext/README.md | Python has the API server, training loop, scenario scaffolding, export/import, and full CLI surface. |
| Embed judging or improvement loops in a Node app | ts/README.md | The TypeScript package is smaller and focused on judge-based workflows, queueing, and MCP serving. |
| Point an external agent at autocontext | autocontext/docs/agent-integration.md | It documents the CLI-first contract, JSON output, MCP usage, and SDK options. |
| Grab copy-paste integration snippets | examples/README.md | The examples cover Python CLI, Claude Code MCP, Python SDK, and TypeScript library usage. |
| Catch up on recent repo evolution | CHANGELOG.md | It summarizes the v0.2.0 release and current unreleased work. |
Common Workflows
- Run the generation loop: uv run autoctx run --scenario grid_ctf --gens 3
- Inspect runs: uv run autoctx list, uv run autoctx status <run_id>
- Scaffold a custom scenario: uv run autoctx new-scenario --template prompt-optimization --name my-task
- Export training data: uv run autoctx export-training-data --scenario grid_ctf --all-runs --output training/grid_ctf.jsonl
- Train a local model: uv run autoctx train --scenario grid_ctf --data training/grid_ctf.jsonl --time-budget 300
- Start the API server: uv run autoctx serve --host 127.0.0.1 --port 8000
- Start the MCP server: uv run autoctx mcp-serve
- Wait on a monitor condition: uv run autoctx wait <condition_id> --json
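The export step writes JSONL, so downstream tooling can consume it line by line. A minimal reader, assuming only the JSONL format (one JSON object per line); the record fields themselves are whatever the exporter emits:

```python
import json
from pathlib import Path


def load_training_records(path: Path) -> list[dict]:
    """Read a JSONL export: one JSON object per line, blank lines skipped."""
    records = []
    for line in path.read_text().splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records
```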
operator-in-the-loop remains a typed scenario family for capability discovery and experimentation, but autocontext does not scaffold executable operator-loop runtimes. Use datasets, tools, or live-agent experiments instead of harness-owned escalation scripts.
MLX training is host-only on Apple Silicon macOS. If you want a sandboxed OpenClaw agent to trigger training, use the file-based host watcher flow documented in autocontext/docs/mlx-training.md.
Repository Layout
- autocontext/: Python package, CLI, API server, and training loop
- ts/: published TypeScript package, CLI, and MCP-compatible tooling
- docs/: docs landing page and maintainer checklists
- examples/: copy-paste integration snippets for package users and external agents
- infra/: Docker, Fly.io, and bootstrap scripts
- protocol/: shared protocol artifacts
- scripts/: repo maintenance and generation scripts
Where To Look Next
- Canonical vocabulary and object model: docs/concept-model.md
- Docs overview: docs/README.md
- Analytics and adoption: docs/analytics.md
- Python package guide: autocontext/README.md
- TypeScript package guide: ts/README.md
- Copy-paste examples: examples/README.md
- External agent integration: autocontext/docs/agent-integration.md
- Recent changes: CHANGELOG.md
- Contributor setup: CONTRIBUTING.md
- Repo agent guide: AGENTS.md
- MLX host training and OpenClaw bridge: autocontext/docs/mlx-training.md
- Sandbox and executor notes: autocontext/docs/sandbox.md
- License: LICENSE
