# 🧬 AutoPrompt

Natural selection for prompts, code, and text – powered by LLMs.

Feed it a seed file and fitness criteria. It breeds better versions through intelligent mutation, scores them, and keeps the winners. Repeat until it plateaus or hits your target score.

Works on anything text-based – prompts, code, configs, copy, schemas – if an LLM can judge it, AutoPrompt can evolve it.
```
GEN 0 (seed):       3.2/10         – generic and vague
GEN 1/10  ●··  5.8/10 (+2.6) [42s] – added structure and constraints
GEN 2/10  ·●·  7.1/10 (+1.3) [38s] – defined tone and examples
GEN 3/10  ●··  8.4/10 (+1.3) [45s] – added edge case handling
GEN 4/10  ···  8.4/10 (=)    [41s]
GEN 5/10  ·●·  9.2/10 (+0.8) [39s] – refined voice constraints

STOP: target score 9.0 reached (9.2)
```
## 🚀 Quick Start

### Prerequisites

You need one of these CLI tools installed:

- **Claude Code** – `claude` CLI
- **Codex** – `codex` CLI
- **Ollama** – run local models (Qwen, Llama, Mistral, etc.)

No API keys needed. No pip install. Just Python 3.10+ and an LLM.
### Run it

```bash
git clone https://github.com/usmanmughalji/AutoPrompt.git
cd AutoPrompt

# evolve a prompt
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  --target 9.0

# evolve code (with benchmark)
python3 autoprompt.py examples/code-optimizer/seed.py \
  examples/code-optimizer/criteria.md \
  -b "python3 examples/code-optimizer/bench.py {file}"
```

That's it. Output lands in `seed_evolved.txt` (or `seed_evolved.py`).
## 🦙 Run with local models (Ollama)

```bash
# use the default model (qwen3.5:9b)
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  -e ollama --target 9.0

# pick a specific model
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  -e ollama -m qwen3.5:27b

# works with any ollama model
python3 autoprompt.py seed.txt criteria.md -e ollama -m llama3.2:3b
python3 autoprompt.py seed.txt criteria.md -e ollama -m qwen2.5-coder:14b
```

Fully offline. No API keys. No tokens. Just your GPU.
## 🎯 How It Works

```
┌────────────────────────────────────────────────────┐
│                                                    │
│  seed file ──► mutate (LLM) ──► N variants         │
│                                     │              │
│                            benchmark (optional)    │
│                                     │              │
│                            judge (LLM) ──► scores  │
│                                     │              │
│                            keep best ──► repeat    │
│                                                    │
└────────────────────────────────────────────────────┘
```
1. **Seed** – your starting file (prompt, code, whatever)
2. **Criteria** – a markdown file describing what "better" means
3. **Mutate** – the LLM generates N variations, each trying a different strategy
4. **Benchmark** (optional) – run a script to test each mutation (for code)
5. **Judge** – the LLM scores each mutation against your criteria (0-10)
6. **Select** – keep the highest scorer, feed it back into step 3
7. **Stop** – when the target score is hit, patience runs out, or the generation budget is spent

The LLM learns from history – it sees what worked and what flopped in previous generations, so mutations get smarter over time.
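The loop above fits in a few lines of Python. In this sketch, `mutate` and `judge` are injected functions standing in for the LLM calls, so the shape of the algorithm is visible without any model in the loop:

```python
def evolve(seed, criteria, mutate, judge, generations=10, population=3,
           target=None, patience=None):
    """Minimal select-the-best loop. `mutate(best, criteria, history)` returns
    a variant; `judge(text, criteria)` returns a 0-10 score. Both are stubs
    here, standing in for LLM calls."""
    best = seed
    best_score = judge(seed, criteria)
    stale = 0                # generations since the last improvement
    history = []             # fed back so later mutations learn from earlier ones
    for gen in range(1, generations + 1):
        variants = [mutate(best, criteria, history) for _ in range(population)]
        scored = [(judge(v, criteria), v) for v in variants]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best, best_score, stale = top, top_score, 0
        else:
            stale += 1
        history.append((gen, best_score))
        if target is not None and best_score >= target:
            break            # good enough
        if patience is not None and stale >= patience:
            break            # plateaued
    return best, best_score
```

A toy run with `mutate = lambda b, c, h: b + "!"` and `judge = lambda t, c: min(len(t), 10)` climbs one point per generation until it hits the target, mirroring the sample output above.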
## 📦 What You Can Evolve

### 🤖 Prompts

Optimize system prompts, few-shot examples, chain-of-thought templates.

```bash
python3 autoprompt.py my-prompt.txt criteria.md --target 9.0 --patience 3
```

### 💻 Code

Evolve algorithms, functions, or scripts with optional benchmarks.

```bash
python3 autoprompt.py solver.py criteria.md -b "python3 bench.py {file}"
```

### 📣 Copy & Content

Marketing copy, email templates, documentation – anything with quality criteria.

```bash
python3 autoprompt.py landing-page.md criteria.md -g 5
```

### ⚙️ Configs

YAML configs, SQL queries, regex patterns – if it's text and has a "better", evolve it.

```bash
python3 autoprompt.py config.yaml criteria.md -e codex
```
## 🛠️ Options

| Flag | Description | Default |
|------|-------------|---------|
| `-g, --generations` | Max generations to run | `10` |
| `-n, --population` | Mutations per generation | `3` |
| `-b, --bench` | Benchmark command (`{file}` = candidate path) | None |
| `-e, --engine` | LLM backend: `claude`, `codex`, or `ollama` | `claude` |
| `-m, --model` | Ollama model name (ignored for claude/codex) | `qwen3.5:9b` |
| `--target` | Stop when the score reaches this value | None |
| `--patience` | Stop after N generations with no improvement | None |
| `--timeout` | Stop after N seconds total | None |
| `--reasoning` | Codex reasoning effort: `low`, `medium`, or `high` | `medium` |
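The table maps directly onto a stdlib `argparse` spec. This sketch of an equivalent parser is written from the table, not copied from autoprompt.py, so details like help strings are illustrative:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """Build a CLI parser matching the flag table (illustrative sketch)."""
    p = argparse.ArgumentParser(prog="autoprompt.py")
    p.add_argument("seed", help="file to evolve")
    p.add_argument("criteria", help="markdown file defining what 'better' means")
    p.add_argument("-g", "--generations", type=int, default=10)
    p.add_argument("-n", "--population", type=int, default=3)
    p.add_argument("-b", "--bench",
                   help="benchmark command; {file} is replaced by the candidate path")
    p.add_argument("-e", "--engine", choices=["claude", "codex", "ollama"],
                   default="claude")
    p.add_argument("-m", "--model", default="qwen3.5:9b")
    p.add_argument("--target", type=float)
    p.add_argument("--patience", type=int)
    p.add_argument("--timeout", type=float)
    p.add_argument("--reasoning", choices=["low", "medium", "high"],
                   default="medium")
    return p
```

Because the two positional arguments come first, every example in this README parses cleanly against this spec.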
### Smart stopping

AutoPrompt stops early when it makes sense:

```bash
# stop when good enough
python3 autoprompt.py seed.txt criteria.md --target 8.5

# stop when stuck
python3 autoprompt.py seed.txt criteria.md --patience 3

# stop after 5 minutes
python3 autoprompt.py seed.txt criteria.md --timeout 300

# combine them
python3 autoprompt.py seed.txt criteria.md --target 9.0 --patience 3 --timeout 600
```
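The three conditions compose: whichever fires first ends the run. A sketch of a combined stop check (the function name and the exact reason strings are illustrative):

```python
def should_stop(score, gens_without_gain, elapsed,
                target=None, patience=None, timeout=None):
    """Return a human-readable stop reason if any condition fires, else None.

    score             -- best score so far (0-10)
    gens_without_gain -- consecutive generations with no improvement
    elapsed           -- seconds since the run started
    """
    if target is not None and score >= target:
        return f"target score {target} reached ({score})"
    if patience is not None and gens_without_gain >= patience:
        return f"no improvement for {patience} generations"
    if timeout is not None and elapsed >= timeout:
        return f"timeout after {elapsed:.0f}s"
    return None
```

Checking the target first means a run that hits the goal on its last allowed second still reports success rather than a timeout.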
## 📝 Writing Criteria Files

The criteria file is a markdown file that tells the LLM what "better" means. This is the most important part – good criteria = good evolution.

### Template

```markdown
# Fitness Criteria: [What You're Evolving]

## Goal
One sentence describing the ideal output.

## Constraints
- Hard rules that must be followed
- Things that are NOT allowed
- Format requirements

## What "better" means (in priority order)
1. **Most important thing** – why it matters
2. **Second priority** – why it matters
3. **Third priority** – why it matters

## Scoring Guide
- 0-2: terrible (describe what this looks like)
- 3-4: below average
- 5-6: decent
- 7-8: good (describe what this looks like)
- 9-10: exceptional (describe what this looks like)
```

The scoring guide is key – it anchors the LLM's judgment so scores stay consistent across generations.
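For a concrete instance, here is how the template might be filled in for evolving a code-review system prompt. This is a made-up example to show the level of specificity that works, not a file shipped with the repo:

```markdown
# Fitness Criteria: Code Review System Prompt

## Goal
A system prompt that makes the LLM produce concise, actionable code reviews.

## Constraints
- Must fit in under 400 words
- No praise padding ("great job!") – every sentence must carry information
- Output format: one bullet per issue, each with file/line and a suggested fix

## What "better" means (in priority order)
1. **Actionability** – every comment tells the author exactly what to change
2. **Signal density** – no filler, no restating the diff
3. **Tone** – direct but not harsh

## Scoring Guide
- 0-2: vague advice, no structure, no examples
- 3-4: some structure, but comments are generic
- 5-6: decent format, but priorities are missing
- 7-8: concrete comments with fixes, some verbosity
- 9-10: every bullet is specific, prioritized, and immediately usable
```

Notice that each scoring band describes observable behavior, which gives the judge something concrete to match against.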
## 📚 Examples

### `examples/prompt-optimizer/`

Evolves a generic blog post prompt into a production-quality system prompt. No benchmark needed – the LLM judges prompt quality directly.

```bash
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  --target 9.0 --patience 3
```

### `examples/code-optimizer/`

Evolves a bubble sort into a fast hybrid sorting algorithm. Uses bench.py to verify correctness and measure speed.

```bash
python3 autoprompt.py examples/code-optimizer/seed.py \
  examples/code-optimizer/criteria.md \
  -b "python3 examples/code-optimizer/bench.py {file}" \
  --target 8.0
```
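A benchmark script receives the candidate path wherever `{file}` appears in the `-b` command, and signals a bad candidate with a nonzero exit code. A minimal harness in that spirit (this assumes the candidate file defines a `sort_list` function; the shipped bench.py may use different names and sizes):

```python
import importlib.util
import random
import sys
import time

def run_bench(candidate_path: str) -> float:
    """Load a candidate file, verify its sort is correct, and time it.

    Assumes the candidate defines sort_list(xs); adjust for your seed.
    """
    spec = importlib.util.spec_from_file_location("candidate", candidate_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

    rng = random.Random(0)                      # fixed seed: runs stay comparable
    data = [rng.randint(0, 10_000) for _ in range(5_000)]

    start = time.perf_counter()
    result = mod.sort_list(list(data))
    elapsed = time.perf_counter() - start

    if result != sorted(data):
        print("FAIL: output not sorted")
        sys.exit(1)                             # nonzero exit rejects the candidate
    print(f"OK: {elapsed * 1000:.2f} ms")
    return elapsed

# entry point when run as `python3 bench.py {file}`:
# run_bench(sys.argv[1])
```

Correctness gates speed here: a mutation that returns wrong output is rejected outright, no matter how fast it runs.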
## 🧠 Tips

- **Start with a bad seed** – the worse the starting point, the more dramatic the improvement. Makes for better demos too.
- **Be specific in criteria** – "write well" is useless. "Use active voice, keep sentences under 20 words, include one concrete example per paragraph" is useful.
- **Use benchmarks for code** – LLM-as-judge works for subjective quality, but for code you want deterministic correctness checks.
- **Set patience** – `--patience 3` prevents wasting tokens when the LLM has plateaued.
- **More population = more exploration** – `-n 5` tries more strategies per generation but costs more tokens.
- **Check the history** – the LLM learns from previous generations. If it keeps trying the same thing, your criteria might be ambiguous.
## 🏗️ Architecture

```
AutoPrompt/
├── autoprompt.py              # the entire engine (~300 lines)
├── examples/
│   ├── prompt-optimizer/      # evolve prompts
│   │   ├── seed.txt           # starting prompt
│   │   └── criteria.md        # what makes a good prompt
│   └── code-optimizer/        # evolve code
│       ├── seed.py            # starting code (bubble sort)
│       ├── criteria.md        # what makes good sorting code
│       └── bench.py           # correctness + speed benchmark
├── LICENSE
└── README.md
```

One file. Zero dependencies. Stdlib only.
## 🤝 Contributing

Found a bug? Have a cool criteria file? PRs welcome.

## 📄 License

MIT – do whatever you want with it.