Agentops

The missing DevOps layer for coding agents. Flow, feedback, and memory that compounds between sessions.

Install / Use

/learn @boshu2/Agentops
About this skill

Quality Score

0/100

Supported Platforms

Claude Code
Claude Desktop
Cursor
OpenAI Codex

README

<div align="center">

AgentOps

Every session starts where the last one left off.

Validation, memory, lifecycle gates, and briefing-first control for coding agents. AgentOps acts as a software-factory control plane: bounded context goes in, validated code and durable learning come out.

Start Here · Install · See It Work · Skills · CLI · FAQ · Newcomer Guide

</div>

<p align="center">
<img src="docs/assets/swarm-6-rpi.png" alt="Agents running full development cycles in parallel with validation gates and a coordinating team leader" width="800">
</p>

What AgentOps Gives You

Session 1, your agent spends 2 hours debugging a timeout bug. Session 15, a new agent finds the answer in 10 seconds — because /retro captured the lesson and the flywheel promoted it. Three capabilities make this work:

  1. Judgment validation — agents get risk context that challenges the plan and the code before shipping.
  2. Durable learning — solved problems stay solved. Your repo accumulates institutional knowledge across sessions, agents, and runtimes.
  3. Loop closure — completed work produces better next work, stronger rules, and richer future context.

Every skill, hook, and CLI command exists to deliver one of these three. They form a single lifecycle contract, not separate features.

Operationally, that means AgentOps behaves like a software factory:

  • briefings and startup context prepare the work order
  • RPI runs the delivery line
  • validation gates accept or reject output
  • the flywheel turns completed work into future advantage

See Software Factory Surface for the explicit operator lane.

| Capability | What you get |
|------------|--------------|
| Judgment validation | /pre-mortem challenges your plan before build; /vibe + /council validate code before commit |
| Durable learning | Repo-native memory via .agents/: lessons compound across sessions, agents, and runtimes |
| Loop closure | Every cycle produces artifacts, issues, and next-work suggestions the next session acts on |


Install

# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI (0.110.0+ native plugin)
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

# Other Skills-compatible agents (agent-specific, install only what you need)
# Example (Cursor):
npx skills@latest add boshu2/agentops --cursor -g

On Linux, also install system bubblewrap so Codex uses it directly:

sudo apt-get install -y bubblewrap

Install ao CLI (optional)

Skills work standalone. The ao CLI unlocks the full repo-native layer: knowledge extraction, retrieval and injection, maturity scoring, goals, and control-plane workflows.

Homebrew (recommended)

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops
brew install agentops
which ao
ao version

Or install via release binaries or build from source.

Then type /quickstart in your agent chat.

In Claude Code, CLAUDE.md is the startup surface. Installed hooks stay silent: SessionStart prepares runtime state and can stage factory goal or briefing files, while UserPromptSubmit can capture first-prompt intake without injecting additional context into the session.


Start Here

Three commands, zero methodology. Pick one and go:

/council validate this PR          # Multi-model code review — immediate value
/research "how does auth work"     # Codebase exploration with memory
/implement "fix the login bug"     # Full lifecycle for one task

When you're ready for more:

/plan → /crank                     # Decompose into issues, parallel-execute
/rpi "add retry backoff"           # Full pipeline: research → plan → build → validate → learn
/evolve                            # Fitness-scored improvement loop — walk away, come back to better code

Every skill works alone. Compose them however you want. Full catalog: Skills.

If you want the explicit operator lane instead of individual primitives:

ao factory start --goal "fix auth startup"
/rpi "fix auth startup"           # or: ao rpi phased "fix auth startup"
ao codex stop

That path keeps briefing, runtime startup, delivery, and loop closure on one surface. See Software Factory Surface.


How It Works

Each phase delivers one or more of the three capabilities — judgment, learning, loop closure:

| Phase | Primary skills | What you get |
|-------|----------------|--------------|
| Discovery | /brainstorm -> /research -> /plan -> /pre-mortem | Repo context, scoped work, known risks, execution packet |
| Implementation | /crank -> /swarm -> /implement | Closed issues, validated wave outputs, ratchet checkpoints |
| Validation + learning | /validation -> /vibe -> /post-mortem -> /retro -> /forge | Findings, learnings, next work, stronger prevention artifacts |

/rpi orchestrates all three phases. /evolve keeps running /rpi against GOALS.md so the worst fitness gap gets addressed next. The output is code + state + memory + gates.

The explicit CLI operator surface around that line is:

  • ao factory start for briefing-first startup
  • /rpi or ao rpi phased for delivery
  • ao codex stop for explicit loop closure

| Pattern | Chain | When |
|---------|-------|------|
| Quick fix | /implement | One issue, clear scope |
| Validated fix | /implement -> /vibe | One issue, want confidence |
| Planned epic | /plan -> /pre-mortem -> /crank -> /post-mortem | Multi-issue, structured |
| Full pipeline | /rpi (chains all above) | End-to-end, autonomous |
| Evolve loop | /evolve (chains /rpi repeatedly) | Fitness-scored improvement |
| PR contribution | /pr-research -> /pr-plan -> /pr-implement -> /pr-validate -> /pr-prep | External repo |
| Knowledge query | ao search -> /research (if gaps) | Understanding before building |
| Standalone review | /council validate <target> | Ad-hoc multi-judge review |
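
To make "the worst fitness gap gets addressed next" concrete, here is a minimal Python sketch of the selection step in an evolve-style loop. The `Goal` record, the 0-to-1 scores, and the goal names are all hypothetical; the real GOALS.md format and the scoring /evolve uses are not described in this README.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    """Hypothetical goal record; not the real GOALS.md schema."""
    name: str
    target: float   # desired fitness, assumed to be in [0, 1]
    current: float  # measured fitness, assumed to be in [0, 1]

    @property
    def gap(self) -> float:
        # A goal that already meets its target has no gap.
        return max(0.0, self.target - self.current)


def next_goal(goals: List[Goal]) -> Optional[Goal]:
    """Return the goal with the largest unmet gap, or None if all are met."""
    open_goals = [g for g in goals if g.gap > 0]
    return max(open_goals, key=lambda g: g.gap, default=None)


goals = [
    Goal("test coverage", target=0.80, current=0.70),
    Goal("lint clean", target=1.00, current=1.00),
    Goal("docs complete", target=0.90, current=0.50),
]
print(next_goal(goals).name)  # docs complete
```

Under these assumptions, each pass of the loop would hand the selected goal to one /rpi run, re-measure, and select again.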

Primitive chains underneath it

  • Mission and fitness: GOALS.md, ao goals, /evolve
  • Discovery chain: /brainstorm -> ao search / ao lookup -> /research -> /plan -> /pre-mortem
  • Execution chain: /crank -> /swarm -> /implement -> /vibe -> ratchet checkpoints
  • Compiled prevention chain: findings registry -> planning rules / pre-mortem checks / constraints -> later planning and validation
  • Continuity chain: session hooks + phased manifests + /handoff + /recover

Each cycle adds new rules, learnings, and constraints — without anyone shipping new code. See Primitive Chains for the audited map.
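
The compiled prevention chain can be pictured as turning a recorded finding into a check that later plans must pass. The sketch below is illustrative only: `Finding`, `compile_rule`, and the keyword match are hypothetical stand-ins, not the AgentOps registry format or its rule engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """Hypothetical findings-registry entry."""
    id: str
    summary: str
    required_keyword: str  # what later plans are expected to mention


# A rule takes plan text and returns an error message, or "" if the plan passes.
Rule = Callable[[str], str]


def compile_rule(finding: Finding) -> Rule:
    """Turn one finding into a check that later plans must pass."""
    def rule(plan_text: str) -> str:
        if finding.required_keyword.lower() in plan_text.lower():
            return ""
        return (
            f"{finding.id}: plan does not mention "
            f"'{finding.required_keyword}' ({finding.summary})"
        )
    return rule


def check_plan(plan_text: str, rules: List[Rule]) -> List[str]:
    """Run every compiled rule and collect the violations."""
    return [msg for msg in (rule(plan_text) for rule in rules) if msg]


registry = [Finding("F-001", "HTTP client hung without a timeout", "timeout")]
rules = [compile_rule(f) for f in registry]

print(check_plan("Add retry logic to the payment client.", rules))
print(check_plan("Add retry logic with a 5s timeout per attempt.", rules))
```

The first plan produces one violation and the second produces none. The point of the sketch is that new rules arrive as data in the registry, so the set of checks grows without changes to the checking code.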

How Agent Memory Works

Session 50 starts with 50 sessions of accumulated wisdom.

.agents/ is a directory in your repo that stores what your agents learned — as plain files. Grep replaces RAG. Plain text you can diff, review in PRs, and open in Obsidian.

┌──────────────────────────────────────────────────────────────────────────┐
│   Traditional Cache          .agents/ Knowledge Store                    │
│  ┌────────────────────┐    ┌──────────────────────────────────────────┐  │
│  │ Stores results     │    │ Stores extracted lessons                 │  │
│  │ Hit = skip compute │    │ Hit = skip the 2-hour debugging          │  │
│  │ Flat key-value     │    │ Hierarchical: learning → pattern → rule  │  │
│  │ Static after write │    │ Promotes through tiers over time         │  │
│  │ One consumer       │    │ Any agent, any runtime, any session      │  │
│  └────────────────────┘    └──────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
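
Because the store is plain text, retrieval can be as simple as a substring search over files. The following is a minimal sketch of that idea, assuming lessons are kept as Markdown files; the `learnings/` subdirectory and the file names are invented for the demo and may not match the real .agents/ layout.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def grep_lessons(root: Path, term: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive search over Markdown files under root.

    Returns (relative path, line number, line text) for each matching line.
    """
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if term.lower() in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


# Demo against a throwaway directory; this layout is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / ".agents" / "learnings"
    store.mkdir(parents=True)
    (store / "http-client.md").write_text(
        "# HTTP client hangs\nSet an explicit timeout on every outbound request.\n",
        encoding="utf-8",
    )
    (store / "ci.md").write_text(
        "# CI cache\nPin the cache key to the lockfile.\n", encoding="utf-8"
    )

    for rel_path, lineno, text in grep_lessons(Path(tmp) / ".agents", "timeout"):
        print(f"{rel_path}:{lineno}: {text}")
```

No index or embedding step is involved, so the same files stay readable in diffs and pull requests.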

How it compounds: Session 1, your agent hits a timeout bug and spends 2 hours debugging. /retro captures the lesson. /athena promotes it to a pattern. Session 15, a new agent greps "timeout" and finds the answer in 2 operations — turning a 2-hour investigation into a 10-second lookup. Session 20, a planning rule gates plans that omit timeout checks. That's institutional knowledge that survives agent death.
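
The learning → pattern → rule progression can be sketched as a small state machine. The tier names come from the comparison above; the reuse-count thresholds and the `Insight` record are hypothetical, since the README does not state the actual promotion criteria.

```python
from dataclasses import dataclass

TIERS = ["learning", "pattern", "rule"]

# Hypothetical thresholds: reuse count needed to enter each tier.
PROMOTE_AT = {"pattern": 3, "rule": 10}


@dataclass
class Insight:
    text: str
    tier: str = "learning"
    reuse_count: int = 0  # how often a later session found this useful


def record_reuse(insight: Insight) -> Insight:
    """Count one reuse and promote by at most one tier if the threshold is met."""
    insight.reuse_count += 1
    idx = TIERS.index(insight.tier)
    if idx + 1 < len(TIERS):
        next_tier = TIERS[idx + 1]
        if insight.reuse_count >= PROMOTE_AT[next_tier]:
            insight.tier = next_tier
    return insight


lesson = Insight("Set an explicit timeout on every outbound request.")
for _ in range(3):
    record_reuse(lesson)
print(lesson.tier)  # pattern
```

With these invented thresholds, a lesson reused three times becomes a pattern and one reused ten times becomes a rule.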

┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│  1. WORK   │─>│  2. FORGE  │─>│  3. POOL   │─>│ 4. PROMOTE │
│  Session   │  │  Extract   │  │  Score &   │  │  Graduate  │
└────────────┘  └────────────┘  └────────────┘  └────────────┘
     ^                                                │
     │         ┌────────────┐  ┌────────────┐         │
     └─────────│  6. INJECT │<─│5. LEARNINGS│<────────┘
               │  Surface   │  │  Permanent │
               └────────────┘  └────────────┘

> /research "retry backoff strategies"

[lookup] 3 prior learnings found (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + retrieved context
           Recommends: exponential backoff with jitter, reuse existing Redis client

Stale insights decay automatically. Useful ones compound.
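
One common way to model "freshness-weighted" retrieval with automatic decay is an exponential half-life applied to a confidence value. The sketch below assumes that model; the 30-day half-life, the `Learning` record, and the numeric confidences are invented for illustration and are not taken from AgentOps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Learning:
    text: str
    confidence: float  # assumed to be in [0, 1]
    age_days: float    # days since the learning was last confirmed useful


def freshness_score(item: Learning, half_life_days: float = 30.0) -> float:
    """Confidence discounted by age; the weight halves every half_life_days."""
    return item.confidence * 0.5 ** (item.age_days / half_life_days)


def rank(items: List[Learning], half_life_days: float = 30.0) -> List[Learning]:
    """Order learnings from highest to lowest score under the decay model."""
    return sorted(
        items, key=lambda i: freshness_score(i, half_life_days), reverse=True
    )


items = [
    Learning("Token bucket with Redis", confidence=0.9, age_days=60),
    Learning("Rate limit at middleware layer", confidence=0.7, age_days=5),
]
for item in rank(items):
    print(f"{freshness_score(item):.3f}  {item.text}")
```

In this model an older, high-confidence entry can rank below a recent one. Resetting `age_days` whenever an entry is reused would be one way to let useful entries keep their weight.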

GitHub Stars: 237
Category: Development
Updated: 43m ago
Forks: 22

Languages

Go

Security Score

85/100

Audited on Apr 2, 2026

No findings