CodeClaw

CodeClaw converts Claude Code and Codex sessions into privacy-safe training datasets with review gates, background sync, and MCP memory retrieval.

TL;DR

Run codeclaw setup once.
Run codeclaw export --no-push to produce a local reviewed dataset.
Run codeclaw confirm ... to pass review gates.
Run codeclaw export --publish-attestation "..." only after explicit approval.

Default UX

Running plain codeclaw opens the full-screen TUI.
Running codeclaw export ... keeps the scripted CLI flow.

Why CodeClaw

Turn day-to-day coding sessions into structured, reusable training data.
Keep privacy controls first-class with redaction and manual review gates.
Preserve historical problem-solving context through MCP-accessible session memory.

Core Capabilities

Multi-source ingestion:
- Claude Code and Codex session discovery and parsing.
- Experimental adapter routing for Cursor, Windsurf, Aider, Continue.dev, Antigravity, VS Code, Zed, and Xcode beta logs.
Privacy-aware export:
- Secret and PII redaction, username anonymization, and project-level exclusions.
- Layered privacy engine: regex baseline + optional ML NER (codeclaw[pii-ml]).
Controlled publishing workflow:
- Local export, user review attestations, confirm gate, then push.
- Immutable dataset version snapshots + dedupe index on publish.
Continuous mode:
- Background watch daemon for incremental sync.
Memory tooling:
- MCP server with search, project patterns, trajectory stats, session lookup, graph similarity retrieval, and index refresh.

Install

pip install codeclaw

Optional extras:

pip install "codeclaw[pii-ml]"    # Presidio + spaCy detection layer
pip install "codeclaw[mcp]"       # MCP server runtime
pip install "codeclaw[finetune]"  # Experimental local fine-tune scaffolding

From source:

git clone https://github.com/ychampion/codeclaw.git
cd codeclaw
pip install -e ".[dev]"

Check installed version:

codeclaw --version

Quick Start

# Guided onboarding (HF auth help, repo setup, project scope, MCP, watcher)
codeclaw setup

# Reset local setup for a clean re-test (config/state/MCP entry)
codeclaw reset --all --yes

# Verify environment and connected scope
codeclaw doctor
codeclaw projects --source both
codeclaw stats
codeclaw diff --format json
codeclaw config --encryption status

# Export locally first
codeclaw export --no-push

# Review and confirm
codeclaw confirm \
  --full-name "YOUR FULL NAME" \
  --attest-full-name "Asked for full name and scanned export." \
  --attest-sensitive "Reviewed for company/client/private identifiers." \
  --attest-manual-scan "Manually reviewed representative sessions."

# Publish only after explicit approval
codeclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."

# Optional one-command sharing flow
codeclaw share --publish --publish-attestation "User explicitly approved publishing to Hugging Face."

Commands

| Command | Description | |---------|-------------| | codeclaw status | Show current stage and next steps (JSON) | | codeclaw prep | Discover projects and auth state | | codeclaw setup | Guided onboarding (HF, dataset repo, projects, MCP, watcher) | | codeclaw doctor | Verify logs, HF auth, MCP registration, and runtime PATH/version diagnostics | | codeclaw stats | Show session, token, redaction, and export metrics | | codeclaw stats --skill | Include trajectory-based growth metrics | | codeclaw diff | Preview exactly what would be redacted before confirm | | codeclaw projects | Manage connected project scope | | codeclaw reset [--all|--config|--state|--mcp] | Reset local setup files for clean re-onboarding/re-testing | | codeclaw list | List projects with source, size, and exclusion state | | codeclaw config ... | Configure repo, sources, exclusions, and redactions | | codeclaw config --encryption on|off|status | Manage encryption-at-rest mode | | codeclaw export --no-push | Export locally for review | | codeclaw export --dry-run | Preview what would be exported/published without writing files | | codeclaw confirm ... | Run checks and unlock push gate | | codeclaw export --publish-attestation "..." | Push dataset after approval | | codeclaw share [--publish] | Fast export flow with optional publish + dataset card update | | codeclaw watch --start|--stop|--status|--now|--pause|--resume | Manage background sync daemon lifecycle | | codeclaw watch --logs [--follow] | View daemon logs with optional streaming | | codeclaw watch --monitor [--follow] | Live watch monitor (status + recent activity) | | codeclaw watch --switch-project "<name>" | Quickly scope watcher to one project | | codeclaw watch --set-projects "a,b" | Set connected project scope directly | | codeclaw console | Interactive slash-command terminal (/status, /logs, /scope, /run) | | codeclaw tui | Full-screen TUI with activity feed, slash commands, jobs, and plugins | | codeclaw serve | Start MCP server over stdio | | codeclaw install-mcp | Register MCP server in Claude config | | codeclaw finetune --experimental | Preview fine-tune scaffold for local experimentation | | codeclaw synthesize --project <name> | Generate CODECLAW.md from synced sessions | | codeclaw update-skill claude | Install/update local CodeClaw skill |

Additional source filters are available for adapter-backed ingestion:

cursor, windsurf, aider, continue, antigravity, vscode, zed, xcode-beta

Watch transparency examples:

codeclaw watch --status
codeclaw watch --monitor --follow
codeclaw watch --logs --follow
codeclaw watch --pause
codeclaw watch --resume
codeclaw watch --switch-project "codex:codeclaw"

Interactive console mode:

codeclaw console --source codex
# Then inside the prompt:
/status
/projects
/scope codex:codeclaw
/logs 80
/run export --no-push

Full-screen TUI mode:

codeclaw
# equivalent explicit command:
codeclaw tui --source both

If plain codeclaw prints export/confirm JSON instead of opening TUI, you are on an older build. Upgrade and verify:

pip install --upgrade codeclaw
codeclaw --version

Inside the TUI:

/help
/status
/source codex
/watch on
/logs 80
/projects
/scope codex:codeclaw
/export --dry-run
/jobs
/plugins list

Minimal local plugin example (./plugins/echo):

plugins/
  echo/
    plugin.json
    plugin.py

plugin.json:

{
  "name": "echo",
  "version": "0.1.0",
  "entrypoint": "plugin.py",
  "description": "Simple echo command"
}

plugin.py:

from codeclaw.tui.types import CommandResult


def register(ctx):
    def _echo(_app, args):
        return CommandResult(ok=True, message=" ".join(args) if args else "echo")

    ctx.register_command("echo", _echo, "Echo input text", usage="/echo <text>")

MCP Memory Server

Install optional MCP dependency:

pip install "codeclaw[mcp]"
codeclaw install-mcp

Available MCP tools:

search_past_solutions(query, max_results=5)
get_project_patterns(project=None)
get_trajectory_stats()
get_session(session_id)
find_similar_sessions(context, max_results=5)
refresh_index()

Privacy and Safety

CodeClaw is designed for private-by-default workflows:

path and username anonymization
secret and high-entropy token detection
custom redaction lists
manual confirmation and attestation gates before publish
encryption-at-rest support for local artifacts with keyring-backed key management

Automated redaction is not perfect. Always review local exports before publishing.

Package Distribution

Primary: PyPI (pip install codeclaw)
Release artifacts: wheel/sdist are attached to each GitHub tag release.
Optional: publish to a secondary PyPI-compatible index by setting repository secrets:
- SECONDARY_PYPI_REPOSITORY_URL
- SECONDARY_PYPI_USERNAME
- SECONDARY_PYPI_PASSWORD

README Sync Policy

README command docs are enforced in CI:

tests/test_docs_consistency.py validates command naming and branding markers.
A CLI help parity test ensures README command rows stay aligned with the real CLI surface.

If commands change, CI fails until README is updated.

Community

Contribution guide: CONTRIBUTING.md
Security policy: SECURITY.md
Support channels: SUPPORT.md
Code of conduct: CODE_OF_CONDUCT.md
Release process: RELEASE.md

License

MIT - see LICENSE.

Codeclaw

Install / Use

README