Clawboss
Govern your AI agents, build skills without arbitrary code execution, and let your agents work while you sleep easy.
<img width="1536" height="1024" alt="image" src="https://github.com/user-attachments/assets/162a4e5e-ccdd-4152-b7e4-4addd2ef5743" />

Stop your AI agents from going rogue and set your long-running agents up for success. Clawboss wraps tool calls with timeouts, budgets, circuit breakers, and audit logging so one bad tool call doesn't drain your wallet or loop forever.
Zero dependencies. Works with any agent framework — LangChain, CrewAI, AutoGen, OpenClaw, your own custom loop, whatever. Just wrap your tool calls. Includes durable sessions that survive restarts, a REST control plane, and a dashboard for managing everything in one place.
Why
You deploy an agent. It calls a flaky API in a loop. 47 times. At $0.03 per call. At 3am. Nobody's watching.
Or: your agent decides to "keep researching" and burns through your entire token budget in one conversation. Or: a tool hangs for 90 seconds and your user stares at a spinner.
Clawboss is the guardrail layer between your agent and its tools. Every tool call goes through supervision — timeouts, budgets, circuit breakers — so you can deploy agents without white-knuckling it.
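The core move — intercepting every tool call instead of trusting the agent loop — can be sketched in a few lines of plain Python. This is an illustrative toy using `asyncio.wait_for`, not Clawboss's actual wrapper (which also tracks budgets, iterations, and breaker state):

```python
import asyncio

async def supervised_call(fn, *args, timeout=15.0, **kwargs):
    """Run one tool call under a hard timeout (illustrative sketch)."""
    try:
        return await asyncio.wait_for(fn(*args, **kwargs), timeout=timeout)
    except asyncio.TimeoutError:
        # The agent gets a clean failure instead of hanging forever
        return None

async def slow_tool():
    await asyncio.sleep(10)  # simulates a hung API
    return "done"

print(asyncio.run(supervised_call(slow_tool, timeout=0.1)))  # prints None
```

The point is that the guardrail lives outside the tool and outside the LLM: the tool never gets a chance to hang the loop.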
<p align="center"> <img src="docs/architecture.svg" alt="Clawboss architecture — agent → supervision layer → tools" width="900"> </p>

No arbitrary code downloads
Most agent platforms want you to install skills from a community marketplace — arbitrary code that runs unsandboxed in your agent's process. One bad plugin and your agent has full access to your filesystem, credentials, and network.
Clawboss takes a different approach. You define skills and agents declaratively — what tools are available, what parameters they accept, what supervision limits apply. No downloading stranger code. No hoping someone reviewed that community plugin before you installed it. You control exactly what your agents can do, and every tool call goes through supervision whether you built it or someone else did.
Install
```
pip install clawboss
```
Quick start
```python
import asyncio

from clawboss import Supervisor, Policy

# Define limits
policy = Policy(
    max_iterations=5,     # max tool call rounds
    tool_timeout=15.0,    # seconds per tool call
    token_budget=10_000,  # total token cap
)
supervisor = Supervisor(policy)

# Your tool function (any async callable)
async def web_search(query: str) -> str:
    # ... your implementation ...
    return f"Results for: {query}"

async def main():
    # Supervise a tool call
    result = await supervisor.call("web_search", web_search, query="python async")
    if result.succeeded:
        print(result.output)
    else:
        print(f"Failed: {result.error.user_message()}")

    # Track token usage from your LLM calls
    supervisor.record_tokens(1500)

    # Finish and get final stats
    snapshot = supervisor.finish()
    print(f"Used {snapshot.tokens_used} tokens in {snapshot.iterations} iterations")

asyncio.run(main())
```
Dashboard
Open dashboard.html in a browser for a full management UI:
- Agents — create, edit, delete, pause/resume/stop agents with supervision policies
- Skills — define reusable capabilities (tool collections) and assign them to agents
- Sessions — live view of running agent sessions from the REST API, with pause/resume/stop controls, budget usage, and audit logs
- Chat — open a conversation with any agent directly from the dashboard
- Costs — track spend, set budgets with hard stops, view usage over time
- Policies — see all active supervision rules at a glance
The Sessions tab connects to the REST control plane (uvicorn clawboss.server:app) and shows real-time session data. Agent cards show live status and controls work against the real API.
What it does
| Feature | What it prevents |
|---------|------------------|
| Tool timeout | A single tool call hanging forever |
| Token budget | Runaway LLM costs blowing through your budget |
| Iteration limit | Agent loops that never converge |
| Circuit breaker | Hammering a tool that keeps failing |
| Dead man's switch | Agent going silent (no activity for N seconds) |
| Confirmation gates | Dangerous tools running without human approval |
| Audit log | Not knowing what your agent did |
| Privacy shielding | PII leaking through tool calls to LLMs or APIs |
| Observability | No visibility into tool latency, error rates, cost |
| Context compression | Agent forgetting its instructions mid-conversation |
| Tool scoping | Agents calling tools with dangerous arguments |
| Durable sessions | Agent dies mid-task, loses all progress |
| Crash loop protection | Agent keeps crashing and restarting forever |
| REST control plane | No way to pause/resume/stop agents remotely |
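The circuit-breaker row, for instance, captures a simple idea: after N consecutive failures, stop calling the tool entirely until something resets it. A toy version (not Clawboss's implementation) looks like this:

```python
class CircuitBreaker:
    """Toy circuit breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, fn, *args, **kwargs):
        if self.open:
            raise RuntimeError("circuit open: tool disabled after repeated failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1  # another strike against the tool
            raise
        self.failures = 0  # a success resets the count
        return result

breaker = CircuitBreaker(threshold=2)

def flaky():
    raise ConnectionError("API down")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.open)  # True — a third call is short-circuited; the API is never hit
```

This is what turns "47 calls to a flaky API at 3am" into two calls and a clean stop.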
Works with any agent framework
Clawboss doesn't care what framework you use. It supervises tool calls — any async or sync callable. If your agent calls tools, Clawboss can wrap them.
```python
# LangChain? Wrap your tools.
# CrewAI? Wrap your tools.
# AutoGen? Wrap your tools.
# Custom loop? Wrap your tools.
# OpenClaw? There's a built-in bridge (see below).
result = await supervisor.call("my_tool", my_tool_fn, **kwargs)
```
Tool scoping
Scopes are policies that validate tool arguments before execution. Instead of just "can this agent call write_file?" you control "can this agent call write_file with this path?"
```python
policy = Policy.from_dict({
    "tool_scopes": [
        {
            "tool_name": "write_file",
            "rules": [
                {"param": "path", "constraint": "allow", "values": ["/tmp/*", "/home/user/output/*"]},
            ],
        },
        {
            "tool_name": "send_email",
            "rules": [
                {"param": "recipient", "constraint": "allow", "values": ["*@mycompany.com"]},
            ],
        },
        {
            "tool_name": "web_search",
            "rules": [
                {"param": "query", "constraint": "block", "values": ["internal", "confidential"]},
            ],
            "max_calls_per_minute": 10,
        },
    ],
})
```
Scopes are just another type of policy. Assign them to agents the same way — a scope-only policy has zero supervision fields and one or more tool scope rules. Stack them with budget and rate-limit policies on the same agent.
Constraint types:
- `allow` — parameter must match at least one pattern (glob with `*` and `?`)
- `block` — parameter must NOT match any pattern
- `match` — parameter must match at least one regex
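The glob patterns behave like standard shell wildcards, so an allow/block check can be sketched with Python's stdlib `fnmatch` (illustrative only, not Clawboss's internal validator):

```python
from fnmatch import fnmatch

def check_allow(value: str, patterns: list[str]) -> bool:
    """Allow: value must match at least one glob pattern."""
    return any(fnmatch(value, p) for p in patterns)

def check_block(value: str, patterns: list[str]) -> bool:
    """Block: value must match none of the patterns."""
    return not any(fnmatch(value, p) for p in patterns)

print(check_allow("/tmp/report.txt", ["/tmp/*", "/home/user/output/*"]))  # True
print(check_allow("/etc/passwd", ["/tmp/*", "/home/user/output/*"]))      # False
print(check_allow("eve@attacker.com", ["*@mycompany.com"]))               # False
```

Validating arguments before execution means a compromised or confused agent can still *ask* for `/etc/passwd`, but the call never reaches the tool.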
Context compression
Long-running agents drift. They forget their original instructions, blow past constraints, and hallucinate prior context. Clawboss solves this with supervision-anchored compression — a novel approach that only works because you have a supervision layer.
The key insight: supervised agents can compress more aggressively than unsupervised ones. Safety-critical state (policies, budgets, circuit breakers) is enforced by the supervisor, not by the LLM's memory. So you never need to keep that in context — it's reconstructed fresh every turn.
```python
from clawboss import Supervisor, Policy
from clawboss.context import ContextWindow

supervisor = Supervisor(Policy(max_iterations=10, token_budget=10000))
ctx = ContextWindow(supervisor, max_recent_turns=10, skill_name="research")

# Add turns as the conversation progresses
ctx.add_turn("user", "Search for quantum computing breakthroughs")
ctx.add_turn("assistant", "Searching...", tool_calls=[...])

# Get the full context for your LLM prompt
prompt = ctx.to_prompt()

# When context gets long, compress older turns
result = await ctx.compress()
prompt = result.to_prompt()
```
The context has three zones:
| Zone | What it contains | Fidelity |
|------|------------------|----------|
| Anchored state | Budget, circuit breakers, policies, confirmed tools | Always fresh from supervisor |
| Compressed history | Older turns summarized by tool calls and snippets | Lossy but safe |
| Recent turns | Last N turns | Full fidelity |
The anchored state is never compressed — it's rebuilt from the supervisor's live state every turn. Even if the LLM "forgets" its budget limit, the supervisor still enforces it. Bring your own LLM summarizer for richer compression, or use the built-in audit-based extraction.
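The three-zone layout can be illustrated with a self-contained sketch. The `build_prompt` helper below is hypothetical — the real `ContextWindow` pulls the anchored zone from the live supervisor — but the assembly order is the same:

```python
def build_prompt(anchored: dict, summary: str, recent: list[tuple[str, str]]) -> str:
    """Assemble a three-zone prompt: fresh state, lossy summary, verbatim tail."""
    state = "\n".join(f"{k}: {v}" for k, v in anchored.items())
    tail = "\n".join(f"{role}: {text}" for role, text in recent)
    return f"[STATE (rebuilt every turn)]\n{state}\n\n[SUMMARY]\n{summary}\n\n[RECENT]\n{tail}"

prompt = build_prompt(
    # Zone 1: regenerated from supervisor state, never summarized
    {"token_budget_remaining": 8500, "iterations_left": 7},
    # Zone 2: lossy compression of older turns
    "Searched 3 sources on quantum error correction; none peer-reviewed.",
    # Zone 3: last N turns, verbatim
    [("user", "Narrow to 2024 results"), ("assistant", "Searching...")],
)
print(prompt.splitlines()[0])  # "[STATE (rebuilt every turn)]"
```

Because zone 1 is regenerated rather than remembered, compression can be as aggressive as you like without ever losing the safety-critical numbers.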
Durable sessions
Long-running agents survive process restarts. Clawboss checkpoints supervisor state (iterations, token usage, circuit breaker states) to a pluggable store after every operation.
```python
from clawboss import SessionManager, SqliteStore

store = SqliteStore("sessions.db")  # or MemoryStore() for testing
mgr = SessionManager(store)

# Start a session
session_id = mgr.start("my-agent", {
    "max_iterations": 20,
    "tool_timeout": 30,
    "token_budget": 50000,
})

# Get the supervisor and use it in your agent loop
sv = mgr.get_supervisor(session_id)
result = await sv.call("web_search", search_fn, query="python async")
sv.record_tokens(1500)

# Pause — the supervisor raises AgentPaused on next call()
mgr.pause(session_id)

# Resume later (even after a crash / restart)
sv = mgr.resume(session_id)  # budget, iterations, circuit breakers all restored
result = await sv.call("web_search", search_fn, query="continue research")

# Stop when done
mgr.stop(session_id)
```
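The checkpoint pattern underneath is straightforward: serialize the whole session state after every operation, and rehydrate it on resume. A minimal sketch with the stdlib `sqlite3` and `json` modules (not Clawboss's actual schema):

```python
import json
import sqlite3

def save_checkpoint(db: sqlite3.Connection, session_id: str, state: dict) -> None:
    """Upsert the full session state as a JSON blob after every operation."""
    db.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)")
    db.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
        (session_id, json.dumps(state)),
    )
    db.commit()

def load_checkpoint(db: sqlite3.Connection, session_id: str) -> dict:
    """Rehydrate session state, e.g. after a crash or restart."""
    row = db.execute("SELECT state FROM sessions WHERE id = ?", (session_id,)).fetchone()
    return json.loads(row[0])

db = sqlite3.connect(":memory:")  # a file path here would survive real restarts
save_checkpoint(db, "s1", {"iterations": 4, "tokens_used": 1500, "breakers": {}})
# ...process "restarts"; the counters come back intact...
print(load_checkpoint(db, "s1")["tokens_used"])  # 1500
```

Checkpointing after every operation (rather than on shutdown) is what makes the crash case safe: the most you can lose is the operation that was in flight.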
