
Aiwg

Cognitive architecture for AI-augmented software development. Specialized agents, structured workflows, and multi-platform deployment. Claude Code · Codex · Copilot · Cursor · Factory · Warp · Windsurf.

Install / Use

/learn @jmagly/Aiwg

<div align="center">

AIWG

Multi-agent AI framework for Claude Code, Copilot, Cursor, Warp, and 4 more platforms

188 agents, 50 CLI commands, 128 skills, 6 frameworks, 21 addons. SDLC workflows, digital forensics, research management, marketing operations, media curation, and ops infrastructure — all deployable with one command.

npm i -g aiwg        # install globally
aiwg use sdlc        # deploy SDLC framework


Get Started · Features · Agents · CLI Reference · Documentation · Community

Discord Telegram

</div>

What AIWG Is

AIWG is a cognitive architecture that gives AI coding assistants structured memory, multi-agent ensemble validation, and closed-loop self-correction. It deploys specialized agents, workflow commands, enforcement rules, and artifact templates to any of 8 AI platforms with a single CLI command.

If you have used AI coding assistants and thought "this is amazing for small tasks but falls apart on anything complex," AIWG is the missing infrastructure layer that scales AI assistance to multi-week projects.

Unlike prompt libraries or ad-hoc workflows, AIWG implements research-backed patterns from cognitive science (Miller 1956, Sweller 1988), multi-agent systems (Jacobs et al. 1991, MetaGPT, AutoGen), and software engineering (Cooper's stage-gate, FAIR Principles, W3C PROV). The system addresses the hard problems in AI-augmented development: recovering from failures, maintaining context across sessions, preventing hallucinated citations, and ensuring reproducible workflows.


What Problems Does AIWG Solve?

Base AI assistants (Claude, GPT-4, Copilot without frameworks) have three fundamental limitations:

1. No Memory Across Sessions

Each conversation starts fresh. The assistant has no idea what happened yesterday, what requirements you documented, or what decisions you made last week. You re-explain context every morning.

Without AIWG: Projects stall as context rebuilding eats time. A three-month project requires continuity, not fresh starts every session.

With AIWG: The .aiwg/ directory maintains 50-100+ interconnected artifacts across days, weeks, and months. Later phases build on earlier ones automatically because memory persists. Agents read prior work via @-mentions instead of regenerating from scratch.

The segmented structure also makes large projects tractable. As code files grow, the project doesn't become harder to reason about — agents load only the slice of memory relevant to the current task (@requirements/UC-001.md, @architecture/sad.md, @testing/test-plan.md) rather than the entire codebase. Each subdirectory is a focused knowledge domain that fits comfortably in context, while cross-references keep everything connected.

The artifact index (aiwg index) takes this further. Without any tooling, agents often need to browse 3-6 documents before finding what they need. AIWG's structured artifacts reduce this to 2-3. With the index enabled, agents resolve artifact lookups in one query more often than not — a direct hit on the right requirement, architecture decision, or test case without browsing.
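The one-query lookup can be pictured with a small sketch. The index schema and matching logic here are invented for illustration (see `aiwg index` for the real tool): instead of browsing several documents, an agent queries a flat index once and gets the artifact path directly.

```typescript
// Hypothetical artifact-index lookup. Entry shape and relevance rule are
// illustrative assumptions, not AIWG's actual index format.
interface IndexEntry {
  id: string;      // e.g. a requirement or decision identifier
  path: string;    // repo-relative artifact path
  summary: string; // short description used for keyword matching
}

/** Resolve a query to an artifact in one pass over the index. */
function lookup(index: IndexEntry[], query: string): IndexEntry | undefined {
  const q = query.toLowerCase();
  // Naive relevance: exact id match, else keyword match on the summary.
  return index.find(
    (e) => e.id.toLowerCase() === q || e.summary.toLowerCase().includes(q)
  );
}
```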

2. No Recovery Patterns

When AI generates broken code or flawed designs, you manually intervene, explain the problem, and hope the next attempt works. There is no systematic learning from failures, no structured retry, no checkpoint-and-resume.

Without AIWG: Research shows 47% of AI workflows produce inconsistent outputs without reproducibility constraints (R-LAM, Sureshkumar et al. 2026). Debugging is trial-and-error.

With AIWG: The agent loop implements closed-loop self-correction — execute, verify, learn from failure, adapt strategy, retry. External Ralph survives crashes and runs for 6-8+ hours autonomously. Debug memory accumulates failure patterns so the agent doesn't repeat mistakes.

3. No Quality Gates

Base assistants optimize for "sounds plausible" not "actually works." A general assistant critiques security, performance, and maintainability simultaneously — poorly. No domain specialization, no multi-perspective review, no human approval checkpoints.

Without AIWG: Production code ships without architectural review, security validation, or operational feasibility assessment.

With AIWG: 162 specialized agents provide domain expertise — Security Auditor reviews security, Test Architect reviews testability, Performance Engineer reviews scalability. Multi-agent review panels feed a synthesis step, and human-in-the-loop gates sit at every phase transition. Research shows an 84% cost reduction when humans stay on high-stakes decisions versus fully autonomous systems (Agent Laboratory, Schmidgall et al. 2025).


The Six Core Components

1. Memory — Structured Semantic Memory

The .aiwg/ directory is a persistent artifact repository storing requirements, architecture decisions, test strategies, risk registers, and deployment plans across sessions. This implements Retrieval-Augmented Generation patterns (Lewis et al., 2020) — agents retrieve from an evolving knowledge base rather than regenerating from scratch.

Each artifact is discoverable via @-mentions (e.g., @.aiwg/requirements/UC-001-login.md). Context sharing between agents happens through artifacts: the requirements analyst writes use cases, the architecture designer reads them.
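A minimal sketch of how an @-mention might resolve to an artifact, assuming mentions are simply repo-relative paths inlined into the prompt (the regex and expansion format are illustrative, not AIWG's actual implementation):

```typescript
// Matches mentions like "@.aiwg/requirements/UC-001-login.md".
const MENTION_RE = /@(\.aiwg\/[\w\-./]+\.md)/g;

/** Extract artifact paths referenced in a prompt. */
function extractMentions(prompt: string): string[] {
  return [...prompt.matchAll(MENTION_RE)].map((m) => m[1]);
}

/** Expand mentions by inlining artifact bodies (loader injected for testing). */
function expandMentions(prompt: string, load: (path: string) => string): string {
  return prompt.replace(
    MENTION_RE,
    (_: string, path: string) => `<artifact path="${path}">\n${load(path)}\n</artifact>`
  );
}
```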

2. Reasoning — Multi-Agent Deliberation with Synthesis

Instead of a single general-purpose assistant, AIWG provides 162 specialized agents organized by domain. Complex artifacts go through multi-agent review panels:

Architecture Document Creation:
  1. Architecture Designer drafts SAD
  2. Review Panel (3-5 agents run in parallel):
     - Security Auditor    → threat perspective
     - Performance Engineer → scalability perspective
     - Test Architect       → testability perspective
     - Technical Writer     → clarity and consistency
  3. Documentation Synthesizer merges all feedback
  4. Human approval gate → accept, iterate, or escalate

Research shows 17.9% accuracy improvement with multi-path review on complex tasks (Wang et al., GSM8K benchmarks, 2023). Agent specialization means security review is done by a security specialist, not a generalist.
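The panel pattern above can be sketched in a few lines. The `Reviewer` signature and agent names are illustrative stand-ins, not AIWG's real API: reviewers run concurrently, then a synthesis step merges their findings into one report.

```typescript
interface Review {
  agent: string;      // e.g. "Security Auditor"
  findings: string[]; // that agent's perspective on the draft
}

type Reviewer = (draft: string) => Promise<Review>;

/** Fan the draft out to all reviewers in parallel, then merge feedback. */
async function reviewPanel(draft: string, reviewers: Reviewer[]): Promise<string> {
  const reviews = await Promise.all(reviewers.map((r) => r(draft)));
  // Synthesis: group findings under each reviewer's perspective.
  return reviews
    .map((r) => `## ${r.agent}\n${r.findings.map((f) => `- ${f}`).join("\n")}`)
    .join("\n\n");
}
```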

3. Learning — Closed-Loop Self-Correction (Ralph)

Ralph executes tasks iteratively, learns from failures, and adapts strategy based on error patterns. Research from Roig (2025) shows recovery capability — not initial correctness — predicts agentic task success.

Ralph Iteration:
  1. Execute task with current strategy
  2. Verify results (tests pass, lint clean, types check)
  3. If failure: analyze root cause → extract structured learning → adapt strategy
  4. Log iteration state (checkpoint for resume)
  5. Repeat until success or escalate to human after 3 failed attempts

External Ralph adds crash resilience: PID file tracking, automatic restart, cross-session persistence. Tasks run for 6-8+ hours surviving terminal disconnects and system reboots.
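The execute-verify-learn-retry loop can be sketched as follows. `Task`, `Verdict`, and the strategy-adaptation step are simplified stand-ins for AIWG's actual Ralph implementation:

```typescript
interface Verdict { ok: boolean; error?: string }

interface Task {
  run(strategy: string): Verdict; // execute with the current strategy
  verify(): Verdict;              // tests pass, lint clean, types check
}

interface LoopResult { ok: boolean; attempts: number; learnings: string[] }

/** Execute, verify, learn from failure, adapt, retry; escalate after maxAttempts. */
function ralphLoop(task: Task, maxAttempts = 3): LoopResult {
  const learnings: string[] = []; // debug memory: accumulated failure patterns
  let strategy = "default";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    task.run(strategy);
    const verdict = task.verify();
    if (verdict.ok) return { ok: true, attempts: attempt, learnings };
    // Failure: record the root cause and adapt strategy for the next pass.
    const cause = verdict.error ?? "unknown failure";
    learnings.push(cause);
    strategy = `avoid: ${cause}`;
  }
  return { ok: false, attempts: maxAttempts, learnings }; // escalate to human
}
```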

4. Verification — Bidirectional Traceability

AIWG maintains links between documentation and code to ensure artifacts stay synchronized:

// src/auth/login.ts
/**
 * @implements @.aiwg/requirements/UC-001-login.md
 * @architecture @.aiwg/architecture/SAD.md#section-4.2
 * @tests @test/unit/auth/login.test.ts
 */
export function authenticateUser(credentials: Credentials): Promise<AuthResult> {
  // ... implementation traceable to the artifacts above
}

Verification types: Doc → Code, Code → Doc, Code → Tests, Citations → Sources. The retrieval-first citation architecture reduces citation hallucination from 56% to 0% (LitLLM benchmarks, ServiceNow 2025).
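A Doc → Code check can be sketched as a scan for the annotation tags shown above. The checker itself is illustrative (not the aiwg CLI): it collects referenced paths and reports any that a lookup function cannot find.

```typescript
// Matches tags like "@implements @.aiwg/requirements/UC-001-login.md".
const TAG_RE = /@(?:implements|architecture|tests)\s+@?([^\s*]+)/g;

/** Referenced artifact paths, with fragment anchors like #section-4.2 stripped. */
function referencedArtifacts(source: string): string[] {
  return [...source.matchAll(TAG_RE)].map((m) => m[1].split("#")[0]);
}

/** Paths referenced in `source` that `exists` cannot find (injected for testing). */
function brokenLinks(source: string, exists: (path: string) => boolean): string[] {
  return referencedArtifacts(source).filter((p) => !exists(p));
}
```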

5. Planning — Phase Gates with Cognitive Load Management

AIWG structures work using Cooper's Stage-Gate methodology (1990), breaking multi-month projects into bounded phases with explicit quality criteria and human approval:

Inception → Elaboration → Construction → Transition → Production
   LOM          ABM            IOC            PR

Cognitive load optimization follows Miller's 7±2 limits (1956) and Sweller's worked examples approach (1988):

  • 4 phases (not 12)
  • 3-5 artifacts per phase (not 20)
  • 5-7 section headings per template (not 15)
  • 3-5 reviewers per panel (not 10)
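The stage-gate sequence above can be modeled as a simple transition function. The gate-criteria checks and approval hook are placeholders; AIWG's real gates live in its workflow commands:

```typescript
const PHASES = ["Inception", "Elaboration", "Construction", "Transition", "Production"] as const;
type Phase = (typeof PHASES)[number];

interface Gate {
  criteriaMet(): boolean;   // explicit quality criteria for this phase
  humanApproved(): boolean; // human-in-the-loop approval checkpoint
}

/** Advance to the next phase only if the gate passes; otherwise stay put. */
function advance(current: Phase, gate: Gate): Phase {
  const i = PHASES.indexOf(current);
  if (i === PHASES.length - 1) return current; // already at Production
  return gate.criteriaMet() && gate.humanApproved() ? PHASES[i + 1] : current;
}
```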

6. Style — Controllable Voice Generation

Voice profiles provide continuous control over AI writing style using 12 parameters (formality, technical depth, sentence variety, jargon density, personal tone, humor, directness, examples ratio, uncertainty acknowledgment, opinion strength, transition style, authenticity markers).

Built-in voices: technical-authority (docs, RFCs), friendly-explainer (tutorials), executive-brief (summaries), casual-conversational (blogs, social media).
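A voice profile can be pictured as a typed parameter set over the 12 dimensions listed above. The 0-1 scale and the sample values are illustrative assumptions; AIWG's actual profile format may differ:

```typescript
interface VoiceProfile {
  formality: number; // 0 = casual, 1 = formal (same scale for all fields)
  technicalDepth: number;
  sentenceVariety: number;
  jargonDensity: number;
  personalTone: number;
  humor: number;
  directness: number;
  examplesRatio: number;
  uncertaintyAcknowledgment: number;
  opinionStrength: number;
  transitionStyle: number;
  authenticityMarkers: number;
}

// Hypothetical values in the spirit of the built-in "technical-authority" voice.
const technicalAuthority: VoiceProfile = {
  formality: 0.9, technicalDepth: 0.9, sentenceVariety: 0.5, jargonDensity: 0.7,
  personalTone: 0.2, humor: 0.1, directness: 0.8, examplesRatio: 0.5,
  uncertaintyAcknowledgment: 0.4, opinionStrength: 0.6, transitionStyle: 0.5,
  authenticityMarkers: 0.2,
};
```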
