
Install / Use

/learn @xwtro0tk1t-cloud/Harness

Harness — AI Agent Development Guardrail System

Harness is an AI Agent development guardrail Meta-Skill that establishes four layers of defense for any project in one command: knowledge management, architecture constraints, feedback loops, and entropy management.

Optimized for Claude Code: Harness leverages Claude Code's unique Hook system (SessionStart / PreToolUse / PostToolUse / Stop) for system-level behavior enforcement — access controls the AI cannot bypass, not just "please follow the rules." Combined with the experimental Agent Teams feature, you can spin up multi-role collaboration (Architect / Engineer / Tester) in one prompt. All three enforcement layers (Hooks + instruction file + Skill psychological defense) are fully active on Claude Code, delivering the most complete guardrail experience.
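The enforcement rides on ordinary executable hooks. As a rough sketch (the stdin-JSON input and the exit-code-2 blocking convention follow Claude Code's hook protocol as we understand it; `BLOCKED_SUFFIXES` and `decide` are illustrative names, not Harness's actual code), a PreToolUse hook might veto sensitive file access like this:

```python
import json
import sys

# Hypothetical PreToolUse hook: block edits to sensitive files.
BLOCKED_SUFFIXES = (".env", ".key", ".pem")

def decide(tool_call: dict) -> bool:
    """Return True if the pending tool call may proceed."""
    path = tool_call.get("tool_input", {}).get("file_path", "")
    return not path.endswith(BLOCKED_SUFFIXES)

def main() -> None:
    # Claude Code passes the pending tool call as JSON on stdin;
    # exiting with code 2 blocks the call and surfaces stderr to the model.
    call = json.load(sys.stdin)
    if not decide(call):
        print("Blocked: sensitive file access", file=sys.stderr)
        sys.exit(2)
```

Because the veto happens in the hook process, not in the prompt, the model cannot talk its way past it.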

Compatible with 9 AI coding tools, including Cursor, Windsurf, Cline, GitHub Copilot, Aider, Continue, and Devin, plus any tool that supports project-level instruction files (via AGENT.md as a generic fallback). Layer 2 (instruction-file rules) and Layer 3 (docs/ documentation) work universally across all tools, so the core guardrails remain effective regardless of your IDE.


Why Do You Need Harness?

AI Agents write code fast, but "fast" brings four core problems:

| Problem | Symptom | Consequence |
|---------|---------|-------------|
| Knowledge gaps | Every new session starts from scratch with no project context | Repeated mistakes, violated conventions |
| No constraints | Bad code exists in the codebase, AI copies and produces more bad code | Security vulnerabilities, architecture decay |
| No feedback | "Confidently declares mission accomplished" when it's actually a mess | Production incidents, rework |
| Entropy increase | Writing fast = garbage piles up fast | Technical debt explosion, outdated documentation |

Harness's solution: Establish four layers of guardrails with a single command at project initialization, automatically effective in every subsequent development session.

But it's not just these 4 — we've identified 24 pain points across 7 categories. Here's how Harness addresses each one ↓


Roadmap: AI Development Pain Points & Harness Solutions

24 common pain points in AI-assisted development, organized by category. Each entry lists Harness's current solution, a strength rating, and planned future enhancements. Detailed version with full problem descriptions and solution architecture →

Thinking & Planning

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 1 | AI codes before thinking — jumps to implementation without understanding requirements | superpowers brainstorming HARD-GATE: no code until design approved | ★★★★★ | — |
| 2 | Plans collapse mid-task — AI forgets the plan halfway through | planning-with-files 4 Hooks: re-read task_plan.md before every tool call | ★★★★☆ | Auto-detect plan drift (compare actions vs plan) |
| 3 | One-shot answers — AI gives a single solution without exploring alternatives | superpowers brainstorming forces 2+ approaches with trade-offs | ★★★★☆ | — |
| 4 | No adversarial review — nobody challenges the AI's design | Challenger (C) agent role: CLAIM/CHALLENGE/VERIFICATION/VERDICT | ★★★☆☆ | Auto-invoke Challenger after Architect produces a plan |
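Pain point #2's "re-read task_plan.md before every tool call" has a simple core: a hook helper that re-emits the current plan so it re-enters the context window. The sketch below is illustrative, not planning-with-files' actual implementation; the function name and truncation limit are invented:

```python
from pathlib import Path

def plan_reminder(plan_path: str = "task_plan.md", max_lines: int = 40) -> str:
    """Return the current plan (truncated) so a PreToolUse hook can
    re-inject it into the conversation before each tool call."""
    path = Path(plan_path)
    if not path.exists():
        return "No task_plan.md found -- create one before coding."
    lines = path.read_text(encoding="utf-8").splitlines()[:max_lines]
    return "CURRENT PLAN:\n" + "\n".join(lines)
```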

Memory & Context

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 5 | Context loss after /compact — AI forgets decisions and progress | Compact checkpoint rule: must update progress.md + task_plan.md before compact | ★★★★☆ | Auto-checkpoint hook before compact |
| 6 | New session cold start — AI doesn't know the project | CLAUDE.md (≤150 lines) auto-loaded + docs/ B-tree index for on-demand deep reading | ★★★★★ | — |
| 7 | Repeated mistakes across sessions — same pitfall hit multiple times | claudeception extracts pitfalls into reusable Skills; docs/pitfalls/ records | ★★★★☆ | Auto-match pitfall Skills before coding starts |
| 8 | Context window quality degradation — output quality drops as context grows | Context Recovery 4-step protocol + Token Budget rules (offset+limit, structured output) | ★★★☆☆ | Token pressure monitoring + auto-compact suggestion |
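The "≤150 lines" budget for CLAUDE.md (#6) is the kind of rule an audit pass can verify mechanically. A minimal sketch, where the function name and return shape are hypothetical rather than harness-audit's actual interface:

```python
def check_claude_md(text: str, max_lines: int = 150) -> dict:
    """Lean-file check: CLAUDE.md should stay short enough to be
    auto-loaded at session start without crowding the context window."""
    lines = text.splitlines()
    return {
        "line_count": len(lines),
        "within_budget": len(lines) <= max_lines,
        "over_by": max(0, len(lines) - max_lines),
    }
```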

Quality Control

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 9 | "Done" without verification — AI claims completion without evidence | superpowers verification Iron Law + Stop hook checks Phase completion | ★★★★★ | — |
| 10 | No tests — code ships without test coverage | superpowers TDD Iron Law: "NO CODE WITHOUT FAILING TEST FIRST" | ★★★★★ | — |
| 11 | Skips code review — AI produces code nobody reviews | superpowers code-review dispatches reviewer subagent | ★★★★☆ | Auto-trigger review on PR creation |
| 12 | Security vulnerabilities introduced — AI writes insecure code | 3-layer security: CWE defense in CLAUDE.md + secure-coding.md + security-review Skills | ★★★★☆ | Auto-security-scan on every commit (Enterprise hook) |
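The Stop-hook completion check (#9) boils down to refusing a "done" claim while the task list still has open items. A minimal sketch, assuming progress is tracked as a standard Markdown checklist (the function is illustrative, not superpowers' actual code):

```python
import re

def unfinished_tasks(progress_md: str) -> list[str]:
    """Scan a Markdown task list and return items still unchecked,
    so a Stop hook can reject 'done' claims until every phase is
    ticked off."""
    return re.findall(r"^- \[ \] (.+)$", progress_md, flags=re.MULTILINE)
```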

Code Hygiene & Documentation

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 13 | Dead code accumulation — commented-out code, unused imports pile up | CLAUDE.md 5 MUST NOT hygiene rules + quality gate Check #5 | ★★★★☆ | Lint integration in quality gate |
| 14 | Documentation goes stale — docs don't match code after changes | Three-tier doc sync: Lite (self-check) → Standard (dynamic grep) → Full (quality gate) | ★★★★☆ | PostToolUse hook for real-time doc sync reminder |
| 15 | Root directory pollution — test scripts, debug files accumulate | CLAUDE.md rule: temp files go in tests/, harness-audit checks root cleanliness | ★★★☆☆ | Auto-move detected temp files |
| 16 | FIXME/HACK debt — temporary fixes become permanent | CLAUDE.md rule: resolve within 1 week; harness-audit flags stale FIXMEs | ★★★☆☆ | Track FIXME age in quality gate |
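Flagging stale FIXMEs (#16) is straightforward if each comment carries a date. The date-in-comment convention below is an assumption made for illustration; the README does not describe harness-audit's actual detection method:

```python
import re
from datetime import date

def stale_fixmes(source: str, today: date, max_age_days: int = 7) -> list[str]:
    """Flag FIXMEs older than the allowed age, assuming the
    (hypothetical) convention:  # FIXME(2026-04-01): reason"""
    stale = []
    for match in re.finditer(r"FIXME\((\d{4})-(\d{2})-(\d{2})\):\s*(.+)", source):
        y, m, d, reason = match.groups()
        age_days = (today - date(int(y), int(m), int(d))).days
        if age_days > max_age_days:
            stale.append(reason.strip())
    return stale
```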

Hallucination & Reliability

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 17 | API hallucination — AI invents non-existent APIs or library functions | Challenger role verifies claims against source/docs; superpowers Red Flags table | ★★★☆☆ | Auto-verify imports against installed packages |
| 18 | Confident but wrong — AI states incorrect facts with high confidence | Challenger VERDICT system (CONFIRMED/REFUTED/UNVERIFIED) + evidence requirement | ★★★☆☆ | Mandatory citation for architectural claims |
| 19 | Blind copy-paste — AI copies existing bad patterns in the codebase | CLAUDE.md MUST NOT rules + security standards block known bad patterns | ★★★☆☆ | Anti-pattern database from pitfall records |
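The planned "auto-verify imports against installed packages" enhancement (#17) has a simple core: parse the code and look each top-level module up in the current environment. An illustrative sketch (the function name is invented):

```python
import ast
import importlib.util

def unresolved_imports(source: str) -> list[str]:
    """Parse Python source and report top-level imported modules
    that cannot be found in the current environment -- a cheap
    guard against hallucinated packages."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```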

Collaboration & Workflow

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 20 | No role separation — same AI does design, coding, testing, review | Agent Team: Architect / Challenger / Engineer / Tester with strict constraints | ★★★★☆ | Workflow orchestration (auto role transitions) |
| 21 | Experience not captured — hard-won knowledge lost after session ends | claudeception continuous learning + UserPromptSubmit hook evaluation | ★★★★☆ | Auto-extract on session end (not just /claudeception) |
| 22 | No project health visibility — don't know if guardrails are working | harness-audit: scan completeness of CLAUDE.md/docs/hooks/skills, output score | ★★★☆☆ | Trend tracking across audits |
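A completeness audit like harness-audit (#22) reduces to checking for the presence of each guardrail artifact and weighting the results into a score. The paths and weights below are hypothetical; the README does not publish the actual rubric:

```python
from pathlib import Path

# Hypothetical weighting -- harness-audit's real scoring rubric is
# not described in this README.
CHECKS = {
    "CLAUDE.md": 40,
    "docs": 30,
    ".claude/hooks": 15,
    ".claude/skills": 15,
}

def audit_score(project_root: str) -> int:
    """Award points for each guardrail artifact present, yielding a
    0-100 completeness score."""
    root = Path(project_root)
    return sum(weight for rel, weight in CHECKS.items() if (root / rel).exists())
```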

Security & Compliance

| # | Pain Point | Current Solution | Strength | Future Enhancement |
|---|-----------|-----------------|----------|-------------------|
| 23 | Secret leaks in commits — API keys, credentials committed to git | Enterprise Hook: pre-commit secret scan + CLAUDE.md MUST NOT .env/.key/.pem | ★★★★☆ | Default-on secret scanning (not Enterprise-only) |
| 24 | Supply chain attacks — malicious dependencies slip in | supply-chain-audit Skill (8 languages) + sca-ai-denoise for vulnerability triage | ★★★★☆ | Auto-audit on dependency changes |
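A pre-commit secret scan (#23) is pattern matching over the staged diff. The patterns below are a tiny illustrative subset (production scanners use hundreds of rules), and `scan_for_secrets` is not the Enterprise Hook's actual code:

```python
import re

# Illustrative subset of secret patterns; a real scanner uses many more.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),  # PEM private key
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
]

def scan_for_secrets(diff_text: str) -> list[str]:
    """Return the secret-like strings found in a commit diff; a
    pre-commit hook would abort the commit if this is non-empty."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(diff_text))
    return hits
```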

Overall: 4 fully solved (★★★★★), 12 strong (★★★★☆), 8 partial (★★★☆☆), 0 unsolved. ★★★★★ = system-level enforcement, ★★★★☆ = strong with minor gaps, ★★★☆☆ = partial, enhancement planned.


Four-Layer Guardrail Architecture

┌─────────────────────────────────────────────────────────┐
│                  Harness Guardrail System                │
├──────────────┬──────────────┬──────────────┬────────────┤
│  Guardrail 1 │  Guardrail 2 │  Guardrail 3 │ Guardrail 4│
│  Knowledge   │  Architecture│  Feedback    │  Entropy   │
│  Mgmt 📋    │  Constraints 🚧│  Loops 🔄  │  Mgmt 🧹  │
│              │              │              │            │
│  CLAUDE.md   │  Hook-based  │  TDD         │  Code      │
│  docs/ tree  │  enforcement │  Code Review │  hygiene   │
│  Agent Team  │  Security    │  Verification│  Doc sync  │
│  Skill       │  standards   │  gates       │  Pitfall   │
│  ecosystem   │  CWE defense │  Security    │  records   │
│              │  Behavior    │  review      │  Knowledge │
│              │  red lines   │              │  extraction│
└──────────────┴──────────────┴──────────────┴────────────┘

Guardrail 1: Knowledge Management 📋

Problem: The AI Agent doesn't know your project's background, conventions, or habits.

Harness's solution:

1. CLAUDE.md — The AI's Onboarding Manual

After automatically analyzing the project, Harness generates a lean CLAUDE.md (≤150 lines) that serves as the AI's first reading material at the start of every session:

```markdown
# MyProject
One-line description

## Documentation
```
