ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.
Evaluation Context Protocol (ECP)
Work in progress: this repository is actively evolving, and some concepts may change.
A lightweight protocol and reference runtime for evaluating agents with public output, private reasoning, and tool usage. This repo contains:
- sdk/: Python SDK for implementing an ECP agent.
- runtime/: Python runtime (CLI) that runs manifests and grades results.
- examples/: Minimal examples (LangChain demo).
- spec/: Protocol specification.
Documentation
- Docs site: https://evaluation-context-protocol.github.io/ecp/
- Quickstart: https://evaluation-context-protocol.github.io/ecp/quickstart/
- Specification: https://evaluation-context-protocol.github.io/ecp/spec/
Quick Start
Create a virtual environment and install the packages from PyPI (the commands below use Windows PowerShell):
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install ecp-runtime "ecp-sdk[langchain]" langchain-openai
Run the example manifest from the root of a clone of this repository:
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml
Generate an HTML report:
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --report .\report.html
Print a JSON report (useful for CI tooling):
python -m ecp_runtime.cli run --manifest .\examples\langchain_demo\manifest.yaml --json
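A CI step can run the CLI with --json and parse what it prints. The sketch below is a minimal illustration under two assumptions this README does not state. First, the report is written to stdout as a single JSON document. Second, the CLI's exit code is worth propagating. The helper name run_json_report is hypothetical. The report's schema is not documented here, so the sketch only parses the output and does not interpret it.

```python
import json
import subprocess
import sys
from typing import Any, List, Tuple


def run_json_report(cmd: List[str]) -> Tuple[int, Any]:
    """Run a command and parse its stdout as JSON (hypothetical helper).

    Assumes the command prints exactly one JSON document on stdout.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Forward stderr to the CI log so failures stay debuggable.
    if proc.stderr:
        print(proc.stderr, file=sys.stderr)
    return proc.returncode, json.loads(proc.stdout)
```

In a CI script, you might call it as `run_json_report([sys.executable, "-m", "ecp_runtime.cli", "run", "--manifest", "examples/langchain_demo/manifest.yaml", "--json"])`, then exit with the returned code. Whether the CLI signals failing scenarios through its exit code is an assumption worth checking against the runtime.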
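Set your OpenAI API key before running. The LangChain example agent needs it because it calls ChatOpenAI, and any manifest that uses the llm_judge grader needs it too: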
$env:OPENAI_API_KEY="your_key_here"
Example (LangChain Agent + Manifest)
Agent (LangChain create_agent + tool usage):
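from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from ecp import serve
from ecp.adaptors.langchain import ECPLangChainAdapter

@tool
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression using + - * / and parentheses."""
    # Whitelist digits, basic operators, parentheses, and spaces only.
    allowed = set("0123456789+-*/() ")
    # Reject "**" so the model cannot request huge exponentiations.
    if not expression or "**" in expression or any(ch not in allowed for ch in expression):
        return "Invalid expression."
    try:
        # Evaluate with builtins disabled; syntax errors and division by zero land here.
        result = eval(expression, {"__builtins__": {}}, {})
    except Exception:
        return "Invalid expression."
    # Show whole numbers without ".0" but keep fractional results (7/2 -> 3.5).
    if isinstance(result, float) and result.is_integer():
        result = int(result)
    return str(result)

agent = create_agent(
    model=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    tools=[calculator],
    system_prompt="Use the calculator tool for arithmetic.",
)

def to_messages(text: str):
    # Wrap the scenario input in the message format the LangChain agent expects.
    return {"messages": [{"role": "user", "content": text}]}

# Expose the agent over ECP so the runtime can drive it.
serve(ECPLangChainAdapter(agent, name="MathBot", input_mapper=to_messages))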
Manifest (runtime checks output + tool usage):
manifest_version: "v1"
name: "LangChain Math Check"
target: "python agent.py"
scenarios:
  - name: "Ratio Word Problem"
    steps:
      - input: "Katy makes coffee using teaspoons of sugar and cups of water in the ratio of 7:13..."
        graders:
          - type: text_match
            field: public_output
            condition: contains
            value: "42"
          - type: tool_usage
            tool_name: "calculator"
            arguments: {}
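To make the two graders concrete, here is a rough sketch of the checks they appear to describe. It applies them to a step result with the fields listed in "ECP in 60 Seconds": public_output, private_thought, and tool_calls. This is not the runtime's implementation. The following are all assumptions:

- the function names;
- the {"name": ..., "arguments": ...} layout of each tool call;
- reading `arguments: {}` as "no constraints on the arguments";
- the sample data, which is made up.

```python
from typing import Any, Dict


def text_match_contains(result: Dict[str, Any], field: str, value: str) -> bool:
    """Hypothetical text_match grader with condition: contains."""
    return value in str(result.get(field, ""))


def tool_usage(result: Dict[str, Any], tool_name: str, arguments: Dict[str, Any]) -> bool:
    """Hypothetical tool_usage grader.

    Passes if some call used `tool_name` and its arguments contain every
    expected key/value pair; an empty `arguments` dict imposes no constraint.
    """
    for call in result.get("tool_calls", []):
        if call.get("name") != tool_name:
            continue
        actual = call.get("arguments", {})
        if all(actual.get(k) == v for k, v in arguments.items()):
            return True
    return False


# Made-up step result in the assumed shape.
step_result = {
    "public_output": "The answer is 42.",
    "private_thought": "Illustrative only.",
    "tool_calls": [{"name": "calculator", "arguments": {"expression": "6 * 7"}}],
}

passed = text_match_contains(step_result, "public_output", "42") and tool_usage(
    step_result, "calculator", {}
)
print("PASS" if passed else "FAIL")
```

Treating an empty arguments mapping as a wildcard lets a manifest assert only that a tool was used. Adding keys would pin down specific inputs.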
ECP in 60 Seconds
ECP is JSON-RPC 2.0 over stdio. The runtime launches your agent process and calls:
- agent/initialize
- agent/step
- agent/reset
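The envelope of each call follows the JSON-RPC 2.0 specification: jsonrpc, id, method, and params. Everything ECP-specific in the sketch below is an assumption, including the params of each method (for example, an `input` field on agent/step) and one-message-per-line framing on stdio. spec/protocol.md is the authority on both. The helper make_request is hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line (assumed framing)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params or {}}
    return json.dumps(message) + "\n"


# A hypothetical session; the params are illustrative, not taken from the spec.
session = [
    make_request("agent/initialize"),
    make_request("agent/step", {"input": "What is 6 * 7?"}),
    make_request("agent/reset"),
]
for line in session:
    print(line, end="")
```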
Your agent replies with a structured result containing:
- public_output (what the user sees)
- private_thought (for evaluators)
- tool_calls (actions taken)
See spec/protocol.md for the full protocol.
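In practice, the SDK's server loop (serve in the example above) is meant to handle this exchange. The hand-rolled responder below only shows the moving parts. It makes two assumptions: one JSON-RPC message per line on stdin/stdout, and an `input` param on agent/step. Only the three result fields come from this README. The echo behaviour, the initialize result, and the names handle and serve_stdio are placeholders.

```python
import json
import sys
from typing import Any, Dict


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one ECP method and build a JSON-RPC 2.0 response (sketch)."""
    method = request.get("method")
    params = request.get("params") or {}
    if method == "agent/initialize":
        result: Any = {"name": "EchoBot"}  # hypothetical initialize result
    elif method == "agent/step":
        text = str(params.get("input", ""))  # assumed param name
        result = {
            "public_output": f"You said: {text}",  # what the user sees
            "private_thought": "Echoed the input.",  # for evaluators
            "tool_calls": [],  # this toy agent takes no actions
        }
    elif method == "agent/reset":
        result = {}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve_stdio() -> None:
    """Read one request per line from stdin and write one response per line."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(json.dumps(handle(json.loads(line))) + "\n")
            sys.stdout.flush()  # the runtime is waiting on this pipe
```

Calling serve_stdio() under an `if __name__ == "__main__":` guard would make this a standalone process. Whether the ECP runtime accepts it unchanged depends on the framing and params defined in spec/protocol.md.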
Repo Layout
- sdk/python/src/ecp: SDK decorators and server loop
- runtime/python/src/ecp_runtime: CLI, runner, graders
- examples/langchain_demo: LangChain-based demo agent and manifest
Status
This project is evolving quickly. Expect changes between minor versions.
