
Braindump

Extract coding rules from PR review comments and generate AGENTS.md files for any GitHub repository.

Install / Use

/learn @pydantic/Braindump

[!NOTE] This project is 100% vibecoded, and the pipeline and thresholds have been optimized for the pydantic-ai repo primarily.

If the pipeline generates unexpected rules for your repo (invalid, duplicate, or otherwise unhelpful), you're encouraged to tell Claude (or your coding agent of choice) to investigate the issue (by referring to the generated rule IDs) and make changes to the pipeline until it does what you want.

It's expected that some teams will use their own fork of braindump that evolves over time to meet their needs: there's no expectation that the version in this repo will work for absolutely everyone, so you don't need to upstream changes unless you believe they are strictly better for every user than what came before.

How it works

GitHub PR reviews → download → extract → synthesize → dedupe → place → group → generate → AGENTS.md
  1. Download — fetch PR data (reviews, comments, diffs) via gh CLI
  2. Extract — use Claude to identify actionable changes and generalizable rules from each comment (bot comments are filtered out automatically)
  3. Synthesize — embed generalizations, cluster by similarity, extract validated rules. Each rule is scored based on the LLM's confidence multiplied by a factor for how many unique PRs the rule's evidence spans (1 PR = 0.6×, 2 = 0.85×, 3 = 0.95×, 4+ = 1.0×), so rules that came up across more reviews score higher.
  4. Dedupe — three-pass deduplication (embedding clusters + category review + post-consolidation review)
  5. Place — determine where each rule belongs (root, directory, file, cross-file), filtering by min_score floor (default 0.5)
  6. Group — filter by min_score threshold (default 0.5) and organize by topic for progressive disclosure
  7. Generate — rephrase rules and write final AGENTS.md files
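The scoring rule in the synthesize step can be sketched as follows. This is an illustrative reimplementation, not braindump's actual code; the function and constant names are hypothetical, but the multipliers are the ones quoted above.

```python
# Illustrative sketch of rule scoring (names are hypothetical, not
# braindump's real API): the LLM's confidence is discounted by a factor
# for how many unique PRs the rule's evidence spans.

PR_SPAN_FACTORS = {1: 0.6, 2: 0.85, 3: 0.95}  # 4 or more PRs -> 1.0

def rule_score(llm_confidence: float, evidence_prs: set[int]) -> float:
    """Confidence multiplied by a cross-PR support factor."""
    factor = PR_SPAN_FACTORS.get(len(evidence_prs), 1.0)
    return llm_confidence * factor
```

A rule backed by a single PR is discounted to 60% of its confidence, while one seen across four or more PRs keeps its full confidence.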

Each stage is resumable — if interrupted, it picks up from where it left off. Pass --fresh to any stage or run to wipe previous outputs and start clean.

Example

As described in the "Fighting Fire With Fire: How We're Scaling Open Source Code Review at Pydantic With AI" blog post, we used braindump to turn the 4,668 PR review comments @DouweM made on pydantic/pydantic-ai between October 2025 and February 2026 into 149 rules at a cost of just over $60:

$ uv run braindump --repo pydantic/pydantic-ai run --since 2025-10-01 --authors DouweM --max-rules=150

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃Stage         ┃ Status    ┃ Details                                ┃ Updated   ┃       Cost┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│download      │ done      │ 883 PRs | 10,020 review comments, 883  │ 11d ago   │           │
│              │           │ diffs                                  │           │           │
│extract       │ done      │ 4,668/10,020 comments → 3,851          │ 10d ago   │     $40.17│
│              │           │ actionable, 817 rejected → 5,320       │           │           │
│              │           │ generalizations                        │           │           │
│synthesize    │ done      │ 5,320 generalizations → 3,004 in 1,054 │ 10d ago   │     $14.22│
│              │           │ clusters, 2,316 unclustered → 1,238    │           │           │
│              │           │ rules                                  │           │           │
│              │           │               (similarity ≥ 0.65, min  │           │           │
│              │           │ cluster size 2, coherence: 0.87)       │           │           │
│dedupe        │ done      │ 1,238 → 1,014 rules (224 merged)       │ 4m ago    │      $5.12│
│place         │ done      │ 1,014 → 197 rules placed (score ≥ 0.8) │ 1m ago    │      $2.14│
│              │           │ | agents_md_root: 106, agents_md_dir:  │           │           │
│              │           │ 85, cross_file: 5, file: 1             │           │           │
│group         │ done      │ 150/197 rules (score ≥ 0.8) → 6        │ 0m ago    │      $0.09│
│              │           │ locations | 109 inline, 40 in topics   │           │           │
│generate      │ done      │ 6 AGENTS.md files, 3 topic docs (46    │ 0m ago    │      $0.83│
│              │           │ KB) | root, docs, pydantic_ai_slim,    │           │           │
│              │           │ pydantic_ai_slim/pydantic_ai,          │           │           │
│              │           │ pydantic_ai_slim/pydantic_ai/models,   │           │           │
│              │           │ tests                                  │           │           │
└──────────────┴───────────┴────────────────────────────────────────┴───────────┴───────────┘

Total cost: $62.57

Pipeline complete!

Generated files:
  data/pydantic/pydantic-ai/7-generate/AGENTS.md
  data/pydantic/pydantic-ai/7-generate/agent_docs/api-design.md
  data/pydantic/pydantic-ai/7-generate/agent_docs/code-simplification.md
  data/pydantic/pydantic-ai/7-generate/agent_docs/documentation.md
  data/pydantic/pydantic-ai/7-generate/docs/AGENTS.md
  data/pydantic/pydantic-ai/7-generate/pydantic_ai_slim/AGENTS.md
  data/pydantic/pydantic-ai/7-generate/pydantic_ai_slim/pydantic_ai/AGENTS.md
  data/pydantic/pydantic-ai/7-generate/pydantic_ai_slim/pydantic_ai/models/AGENTS.md
  data/pydantic/pydantic-ai/7-generate/tests/AGENTS.md

Prerequisites
  • uv, used to install dependencies and run the braindump CLI
  • The GitHub CLI (gh), used to fetch PR data
  • A Pydantic AI Gateway API key

Setup

The braindump CLI is not currently published on PyPI, so the first step is to clone this repo locally. Then run:

# Install dependencies
uv sync

# Add your Pydantic AI Gateway token to .env
echo "PYDANTIC_AI_GATEWAY_API_KEY=your-token" > .env

# Authenticate GitHub CLI (if not already)
gh auth login

Quick start

Run the full pipeline:

uv run braindump --repo pydantic/pydantic-ai run --since 2025-10-01

This will include review comments by all non-bot authors; use --authors to limit this.

This writes every rule with a score of at least 0.5 to AGENTS.md, which may end up being too many depending on how many source comments you have. To limit the output to the best rules, use the --min-score option. To pick a value that balances not missing important rules against not overloading the agent's context window: run the full pipeline, use group --preview to show a table of rule counts and marginal examples at different score thresholds, then re-run from the group stage with run --from group --fresh --min-score=<score>.
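As a rough mental model of what the preview reports (the real command also shows marginal examples; this sketch only counts survivors per threshold, and the data is made up):

```python
# Hypothetical sketch of a score-threshold preview: for each candidate
# --min-score value, how many rules would survive the filter.
def preview_counts(scores: list[float], thresholds: list[float]) -> dict[float, int]:
    """Count rules with score >= t for each candidate threshold t."""
    return {t: sum(s >= t for s in scores) for t in thresholds}
```

For example, with scores [0.45, 0.55, 0.62, 0.7, 0.8, 0.9], thresholds 0.5, 0.6, and 0.7 would keep 5, 4, and 3 rules respectively.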

Commands

All stages support --concurrency N to control parallel LLM/API requests and --fresh to wipe previous outputs before running.
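The effect of --concurrency N can be pictured as a semaphore that caps in-flight requests. A generic asyncio sketch, not the tool's own code:

```python
# Generic concurrency cap: run awaitables with at most `concurrency`
# of them in flight at once, preserving result order.
import asyncio

async def run_bounded(coros, concurrency: int):
    sem = asyncio.Semaphore(concurrency)

    async def bounded(coro):
        async with sem:       # blocks while `concurrency` others are running
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))
```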

run — Full pipeline

uv run braindump --repo owner/repo run [--from STAGE] [--skip STAGE ...] [--since DATE] [--authors USER] [--min-score 0.5] [--max-rules N] [--fresh]
  • --from: Start from a specific stage (e.g. --from synthesize)
  • --skip: Skip stages (repeatable, e.g. --skip download --skip extract)
  • --since: Date filter for download (YYYY-MM-DD)
  • --authors: Author filter for extract (default: all)
  • --min-score: Override rule score threshold (default: 0.5)
  • --max-rules N: Cap the number of rules kept in the group stage (top-scored)
  • --fresh: Wipe all stage outputs and start from scratch

download — Fetch PR data

uv run braindump --repo owner/repo download [--since YYYY-MM-DD] [--concurrency 5]

extract — Extract rules from comments

uv run braindump --repo owner/repo extract [--authors USER] [--limit N] [--random] [--prs 1,2,3] [--concurrency 10]
  • --prs: Filter to specific PR numbers (comma-separated)
  • --limit N: Limit number of comments to process
  • --random: Randomly sample comments (with --seed for reproducibility)

synthesize — Cluster and validate rules

uv run braindump --repo owner/repo synthesize [--similarity-threshold 0.65] [--min-cluster-size 3] [--concurrency 10]
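A toy version of the clustering shape this stage uses: embed generalizations, greedily group them at a similarity threshold, and treat clusters below the minimum size as unclustered. Illustrative only; the real embedding model, thresholds, and algorithm may differ.

```python
# Greedy single-pass clustering over embedding vectors (toy sketch).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.65, min_cluster_size=2):
    """Join each item to the first cluster whose seed it is similar
    enough to; clusters below min_cluster_size are left unclustered."""
    clusters: list[list[int]] = []
    for i, emb in enumerate(embeddings):
        for group in clusters:
            if cosine(emb, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    kept = [g for g in clusters if len(g) >= min_cluster_size]
    unclustered = [i for g in clusters if len(g) < min_cluster_size for i in g]
    return kept, unclustered
```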

dedupe — Deduplicate similar rules

uv run braindump --repo owner/repo dedupe [--similarity-threshold 0.75] [--concurrency 10]

place — Determine rule placement

uv run braindump --repo owner/repo place [--min-score 0.5] [--concurrency 10]

group — Organize by topic

uv run braindump --repo owner/repo group [--min-score 0.5] [--max-rules N] [--preview]
  • --min-score: Minimum rule score to include (default: 0.5)
  • --max-rules N: Cap the number of rules included (top-scored). Applied after --min-score filtering.
  • --preview: Show a table of rule counts and marginal examples at different score thresholds, then exit without running the LLM grouping. Useful for picking an appropriate --min-score or --max-rules.

Re-run from group with a different threshold to adjust how many rules end up in the output — no need to re-run place:

uv run braindump --repo owner/repo run --from group --min-score 0.6
uv run braindump --repo owner/repo run --from group --max-rules 80

generate — Write AGENTS.md files

uv run braindump --repo owner/repo generate [--dry-run] [--concurrency 10]

status — Pipeline status

uv run braindump --repo owner/repo status

Shows what data exists per stage, key metrics (including cost), and suggested next steps.
