DevClaw — Development Plugin for OpenClaw

Turn any group chat into a dev team that ships.

DevClaw is a plugin for OpenClaw that turns your orchestrator agent into a development manager. It hires developers, assigns tasks, reviews code, and keeps the pipeline moving — across as many projects as you have group chats.

Prerequisites: OpenClaw must be installed and running.

openclaw plugins install @laurentenhoor/devclaw

Then start onboarding by chatting with your agent in any channel:

"Hey, can you help me set up DevClaw?"

What it looks like

You have two projects in two Telegram groups. You go to bed. You wake up:

── Group: "Dev - My Webapp" ──────────────────────────────

Agent:  "⚡ Sending DEV (medior) for #42: Add login page"
Agent:  "✅ DEV DONE #42 — Login page with OAuth. PR opened for review."
Agent:  "🔀 PR approved for #42 — auto-merged. Issue closed."
Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"
Agent:  "✅ DEV DONE #43 — Updated to brand blue. PR opened for review."
Agent:  "🔄 PR changes requested for #43 — Back to DEV."
Agent:  "⚡ Sending DEV (junior) for #43: Fix button color on /settings"

  You:  "Create an issue for refactoring the profile page, pick it up."

Agent:  created #44 "Refactor user profile page" on GitHub — To Do
Agent:  "⚡ Sending DEV (medior) for #44: Refactor user profile page"

Agent:  "✅ DEV DONE #43 — Fixed dark-mode color. PR opened for review."
Agent:  "🔀 PR approved for #43 — auto-merged. Issue closed."

── Group: "Dev - My API" ─────────────────────────────────

Agent:  "🧠 Spawning DEV (senior) for #18: Migrate auth to OAuth2"
Agent:  "✅ DEV DONE #18 — OAuth2 provider with refresh tokens. PR opened for review."
Agent:  "🔀 PR approved for #18 — auto-merged. Issue closed."
Agent:  "⚡ Sending DEV (medior) for #19: Add rate limiting to /api/search"

Multiple issues shipped, a PR review round-trip automatically handled, and a second project's migration completed — all while you slept. When you dropped in mid-stream to create an issue, the scheduler kept going before, during, and after.

Why DevClaw

Autonomous multi-project development

Each project is fully isolated — own queue, workers, sessions, and state. Workers execute in parallel within each project, and multiple projects run simultaneously. A token-free scheduling engine drives it all autonomously:

Scheduling engine — work_heartbeat continuously scans queues, dispatches workers, and drives DEV → review → DEV feedback loops
Project isolation — parallel workers per project, parallel projects across the system
Role instructions — per-project, per-role prompts injected at dispatch time
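
As a rough illustration of what a scheduler tick with per-project isolation could look like, here is a minimal sketch. The `Project` and `Dispatch` shapes and the `tick` function are hypothetical names chosen for this example and are not taken from DevClaw's code; the real `work_heartbeat` may work differently.

```typescript
// Hypothetical shapes: none of these names come from DevClaw itself.
interface Project {
  name: string;
  queue: number[];      // issue numbers waiting to be picked up
  idleWorkers: number;  // workers free to take a task
}

interface Dispatch {
  project: string;
  issue: number;
}

// One scheduler tick. Each project is handled on its own, so a long
// queue in one project never delays dispatch in another.
function tick(projects: Project[]): Dispatch[] {
  const dispatched: Dispatch[] = [];
  for (const project of projects) {
    while (project.idleWorkers > 0 && project.queue.length > 0) {
      const issue = project.queue.shift() as number;
      project.idleWorkers -= 1;
      dispatched.push({ project: project.name, issue });
    }
  }
  return dispatched;
}
```

Nothing in this loop calls a model, which is the sense in which scheduling can be token-free: it is plain bookkeeping over queues and worker slots.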

Process enforcement

GitHub/GitLab issues are the single source of truth — not an internal database. Every tool call wraps the full operation into deterministic code with rollback on failure:

External task state — labels, transitions, and status queries go through your issue tracker
Atomic operations — label transition + state update + session dispatch + audit log in one call
Tool-based guardrails — 23 tools enforce the process; the agent provides intent, the plugin handles mechanics

~60-80% token savings

Three mechanisms compound to cut token usage by roughly 60-80% compared with running one large model with fresh context each time:

Tier selection — Haiku for typos, Sonnet for features, Opus for architecture (~30-50% on simple tasks)
Session reuse — workers accumulate codebase knowledge across tasks (~40-60% per task)
Token-free scheduling — work_heartbeat runs on pure CLI calls, zero LLM tokens for orchestration
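
One way to read how the per-mechanism figures relate to the headline range, assuming the tier-selection and session-reuse savings are independent and multiply (the source does not say how the headline number was derived):

```typescript
// Combine fractional savings: each one applies to the tokens that
// remain after the previous one. Independence is an assumption here.
function combinedSavings(savings: number[]): number {
  let remaining = 1;
  for (const s of savings) {
    remaining *= 1 - s;
  }
  return 1 - remaining;
}

const low = combinedSavings([0.3, 0.4]);  // lower ends of both ranges
const high = combinedSavings([0.5, 0.6]); // upper ends of both ranges
console.log(low.toFixed(2), high.toFixed(2)); // prints "0.58 0.80"
```

Under that assumption the combined saving lands between about 58% and 80%. The tier-selection figure is quoted for simple tasks, so the low end applies mainly to those.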

The problem DevClaw solves

OpenClaw is a great multi-agent runtime. It handles sessions, tools, channels, gateway RPC — everything you need to run AI agents. But it's a general-purpose platform. It has no opinion about how software gets built.

Without DevClaw, your orchestrator agent has to figure out on its own how to:

Pick the right model for the task complexity
Create or reuse the right worker session
Transition issue labels in the right order
Track which worker is doing what across projects
Schedule QA after DEV completes, and re-schedule DEV after QA fails
Detect crashed workers and recover
Log everything for auditability

That's a lot of reasoning per task. LLMs do it imperfectly — they forget steps, corrupt state, pick the wrong model, lose session references. You end up babysitting the thing you built to avoid babysitting.

DevClaw moves all of that into deterministic plugin code. The agent says "pick up issue #42." The plugin handles the other 10 steps atomically. Every time, the same way, zero reasoning tokens spent on orchestration.

Meet your team

DevClaw doesn't think in model IDs. It thinks in people.

When a task comes in, you don't configure anthropic/claude-sonnet-4-5 — you assign a medior developer. The orchestrator evaluates task complexity and picks the right person for the job:

Developers

| Level  | Assigns to                                        | Model  |
| ------ | ------------------------------------------------- | ------ |
| Junior | Typos, CSS fixes, renames, single-file changes    | Haiku  |
| Medior | Features, bug fixes, multi-file changes           | Sonnet |
| Senior | Architecture, migrations, system-wide refactoring | Opus   |

Reviewers

| Level  | Assigns to                                   | Model  |
| ------ | -------------------------------------------- | ------ |
| Junior | Standard code review, PR inspection          | Sonnet |
| Senior | Thorough security review, complex edge cases | Opus   |

Testers (optional — enable in workflow.yaml)

| Level  | Assigns to                      | Model  |
| ------ | ------------------------------- | ------ |
| Junior | Quick smoke tests, basic checks | Haiku  |
| Medior | Standard test validation        | Sonnet |
| Senior | Thorough QA, complex edge cases | Opus   |

Architects

| Level  | Assigns to                     | Model  |
| ------ | ------------------------------ | ------ |
| Junior | Standard design investigation  | Sonnet |
| Senior | Complex architecture decisions | Opus   |

A CSS typo goes to a junior developer. A database migration goes to a senior. You're not burning Opus tokens on a color change, and you're not sending Haiku to redesign your auth system.

Every mapping is configurable — swap in any model you want per level.
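
To make the role and level lookup concrete, here is a minimal sketch of a mapping with per-level overrides. The defaults are copied from the tables above; the object shape, the `resolveModel` helper, and the override mechanism are assumptions for illustration and do not reflect DevClaw's actual configuration schema.

```typescript
type Role = "developer" | "reviewer" | "tester" | "architect";
type Level = "junior" | "medior" | "senior";
type ModelMap = Partial<Record<Level, string>>;

// Defaults copied from the tables above. The shape is hypothetical.
const defaultModels: Record<Role, ModelMap> = {
  developer: { junior: "Haiku", medior: "Sonnet", senior: "Opus" },
  reviewer: { junior: "Sonnet", senior: "Opus" },
  tester: { junior: "Haiku", medior: "Sonnet", senior: "Opus" },
  architect: { junior: "Sonnet", senior: "Opus" },
};

// Look up the model for a role and level, preferring a user override.
function resolveModel(
  role: Role,
  level: Level,
  overrides: Partial<Record<Role, ModelMap>> = {}
): string {
  const roleOverrides = overrides[role];
  const override = roleOverrides ? roleOverrides[level] : undefined;
  const model = override !== undefined ? override : defaultModels[role][level];
  if (model === undefined) {
    throw new Error(`No ${level} level defined for role ${role}`);
  }
  return model;
}
```

With no overrides, `resolveModel("developer", "junior")` returns `"Haiku"`. Passing `{ developer: { junior: "my-model" } }` as the third argument swaps in a placeholder model name for that level only.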

How a task moves through the pipeline

Every issue follows the same path, no exceptions. DevClaw enforces it:

Planning → To Do → Doing → To Review → PR approved → Done (auto-merge + close)
Planning → To Research → Researching → Planning (architect findings)

stateDiagram-v2
    [*] --> Planning
    Planning --> ToDo: Ready for development
    Planning --> ToResearch: Needs investigation

    ToResearch --> Researching: Architect picks up
    Researching --> Planning: Architect done (findings posted)
    Researching --> Refining: Architect blocked

    ToDo --> Doing: DEV picks up
    Doing --> ToReview: DEV done (opens PR)
    Doing --> Refining: DEV blocked
    Refining --> ToDo: Human decides

    ToReview --> Done: PR approved (auto-merge + close)
    ToReview --> ToImprove: Changes requested / merge conflict
    ToImprove --> Doing: Scheduler picks up DEV fix

    Done --> [*]
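
The diagram can also be read as a transition table. The sketch below encodes the same edges and rejects any move that is not listed. The `State` type and the helper names are illustrative only and are not DevClaw's internal API.

```typescript
type State =
  | "Planning" | "To Research" | "Researching" | "Refining"
  | "To Do" | "Doing" | "To Review" | "To Improve" | "Done";

// Allowed moves, one entry per arrow in the state diagram.
const transitions: Record<State, State[]> = {
  "Planning": ["To Do", "To Research"],
  "To Research": ["Researching"],
  "Researching": ["Planning", "Refining"],
  "Refining": ["To Do"],
  "To Do": ["Doing"],
  "Doing": ["To Review", "Refining"],
  "To Review": ["Done", "To Improve"],
  "To Improve": ["Doing"],
  "Done": [],
};

function canTransition(from: State, to: State): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// Return the new state, or throw if the move is not in the table.
function transition(from: State, to: State): State {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because every move is checked against one table, an issue cannot jump from To Do straight to Done: `canTransition("To Do", "Done")` is `false`.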

By default, PRs go through human review on GitHub/GitLab. The heartbeat polls for approvals and auto-merges. You can switch to agent review or enable an optional test phase in workflow.yaml.

These labels live on your actual GitHub/GitLab issues. Not in some internal database — in the tool you already use. Filter by Doing in GitHub to see what's in progress. Set up a webhook on Done to trigger deploys. The issue tracker is the source of truth.
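
For the deploy-on-Done idea, a webhook receiver could check GitHub's `issues` event for a `labeled` action. The sketch below covers only that check. The `IssuesEvent` interface lists just the payload fields it reads, and `shouldDeploy` is a hypothetical helper; the HTTP server and the deploy step are left out, and GitLab payloads are shaped differently.

```typescript
// Subset of the GitHub "issues" webhook payload that this check reads.
interface IssuesEvent {
  action: string;
  label?: { name: string };
  issue: { number: number };
}

// True only when the event says the "Done" label was just added.
function shouldDeploy(event: IssuesEvent): boolean {
  return (
    event.action === "labeled" &&
    event.label !== undefined &&
    event.label.name === "Done"
  );
}
```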

What "atomic" means here

When you say "pick up #42 for DEV", the plugin does all of this in one operation:

1. Verifies the issue is in the right state
2. Picks the developer level (or uses what you specified)
3. Transitions the label (To Do → Doing)
4. Creates or reuses the right worker session
5. Dispatches the task with project-specific instructions
6. Updates internal state
7. Logs an audit entry

If step 4 fails, step 3 is rolled back. No half-states, no orphaned labels, no "the issue says Doing but nobody's working on it."
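
The rollback behavior described above follows a common pattern: run the steps in order, remember which ones finished, and undo them in reverse if a later one fails. This is a generic sketch of that pattern, not DevClaw's implementation; `Step` and `runAtomically` are names invented for the example.

```typescript
interface Step {
  name: string;
  run: () => void;   // throws on failure
  undo?: () => void; // reverses run(), for steps with side effects
}

// Run every step in order. On failure, undo the completed steps in
// reverse order and rethrow, so the caller never sees a half-state.
function runAtomically(steps: Step[]): void {
  const completed: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
    } catch (err) {
      for (let i = completed.length - 1; i >= 0; i--) {
        const undo = completed[i].undo;
        if (undo) undo();
      }
      throw err;
    }
  }
}
```

If the label transition has already run and session creation then throws, the label's `undo` puts it back to To Do before the error reaches the caller.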

What happens behind the scenes

Workers report back themselves

When a developer finishes, they call work_finish directly — no orchestrator involved:

DEV "done" → label moves to To Review, PR goes through human review
DEV "blocked" → label moves to Refining, where a human decides; the task then returns to To Do
PR approved → heartbeat auto-merges, label moves to Done, issue closes
PR changes requested → label moves to To Improve, scheduler picks up DEV on next tick
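
The two PR outcomes above can be expressed as a small decision function that the heartbeat might apply each time it polls a pull request. This is a sketch under assumptions: the `ReviewStatus` values and the `ReviewOutcome` shape are invented for the example and are not DevClaw's types.

```typescript
type ReviewStatus = "approved" | "changes_requested" | "pending";
type ReviewLabel = "To Review" | "To Improve" | "Done";

interface ReviewOutcome {
  label: ReviewLabel;  // label the issue should carry afterwards
  merge: boolean;      // whether to merge the PR
  closeIssue: boolean; // whether to close the issue
}

// Decide what happens to an issue in To Review given its PR status.
function applyReview(status: ReviewStatus): ReviewOutcome {
  switch (status) {
    case "approved":
      return { label: "Done", merge: true, closeIssue: true };
    case "changes_requested":
      return { label: "To Improve", merge: false, closeIssue: false };
    case "pending":
      return { label: "To Review", merge: false, closeIssue: false };
  }
}
```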

With the optional test phase enabled, an additional QA cycle runs before closing:

**TESTER
