
🧬 AutoPrompt

Natural selection for prompts, code, and text, powered by LLMs.

Python 3.10+ · License: MIT · Zero Dependencies

Feed it a seed file and fitness criteria. It breeds better versions through intelligent mutation, scores them, and keeps the winners. Repeat until it plateaus or hits your target score.

Works on anything text-based: prompts, code, configs, copy, schemas. If an LLM can judge it, AutoPrompt can evolve it.

  GEN 0 (seed): 3.2/10 - generic and vague
  GEN 1/10 ↑·· 5.8/10 (+2.6) [42s] - added structure and constraints
  GEN 2/10 ·↑· 7.1/10 (+1.3) [38s] - defined tone and examples
  GEN 3/10 ↑·· 8.4/10 (+1.3) [45s] - added edge case handling
  GEN 4/10 ··· 8.4/10 (=) [41s]
  GEN 5/10 ·↑· 9.2/10 (+0.8) [39s] - refined voice constraints

  STOP: target score 9.0 reached (9.2)

🚀 Quick Start

Prerequisites

You need one of these CLI tools installed:

  • Claude Code - claude CLI
  • Codex - codex CLI
  • Ollama - run local models (Qwen, Llama, Mistral, etc.)

No API keys needed. No pip install. Just Python 3.10+ and an LLM.

Run it

git clone https://github.com/usmanmughalji/AutoPrompt.git
cd AutoPrompt

# evolve a prompt
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  --target 9.0

# evolve code (with benchmark)
python3 autoprompt.py examples/code-optimizer/seed.py \
  examples/code-optimizer/criteria.md \
  -b "python3 examples/code-optimizer/bench.py {file}"

That's it. Output lands in seed_evolved.txt (or seed_evolved.py).

🏠 Run with local models (Ollama)

# use qwen3.5 (default: 9b)
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  -e ollama --target 9.0

# pick a specific model
python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  -e ollama -m qwen3.5:27b

# works with any ollama model
python3 autoprompt.py seed.txt criteria.md -e ollama -m llama3.2:3b
python3 autoprompt.py seed.txt criteria.md -e ollama -m qwen2.5-coder:14b

Fully offline. No API keys. No tokens. Just your GPU.


🎯 How It Works

┌─────────────────────────────────────────────────────┐
│                                                     │
│   seed file ──► mutate (LLM) ──► N variants         │
│                                      │              │
│                               benchmark (optional)  │
│                                      │              │
│                               judge (LLM) ──► scores│
│                                      │              │
│                               keep best ──► repeat  │
│                                                     │
└─────────────────────────────────────────────────────┘
  1. Seed - your starting file (prompt, code, whatever)
  2. Criteria - a markdown file describing what "better" means
  3. Mutate - the LLM generates N variations, each trying a different strategy
  4. Benchmark (optional) - run a script to test the mutation (for code)
  5. Judge - the LLM scores each mutation against your criteria (0-10)
  6. Select - keep the highest scorer, feed it back into step 3
  7. Stop - when target score is hit, patience runs out, or generations are done

The LLM learns from history: it sees what worked and what flopped in previous generations, so mutations get smarter over time.
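The selection loop above can be sketched in plain Python. This is a hypothetical sketch, not the actual autoprompt.py: `mutate` and `judge` stand in for the LLM calls.

```python
def evolve(seed_text, criteria, mutate, judge,
           generations=10, population=3, target=None, patience=None):
    """Minimal selection loop: mutate -> score -> keep best -> repeat.

    mutate(best, criteria, history) and judge(candidate, criteria)
    stand in for the LLM calls; any callables with those shapes work.
    """
    best, best_score = seed_text, judge(seed_text, criteria)
    history, stalled = [], 0
    for gen in range(generations):
        # breed a population of variants from the current champion
        variants = [mutate(best, criteria, history) for _ in range(population)]
        scored = [(judge(v, criteria), v) for v in variants]
        top_score, top = max(scored, key=lambda pair: pair[0])
        history.append((gen, top_score))
        if top_score > best_score:
            best, best_score, stalled = top, top_score, 0
        else:
            stalled += 1
        if target is not None and best_score >= target:
            break  # good enough
        if patience is not None and stalled >= patience:
            break  # plateaued
    return best, best_score
```

Because the mutator and judge are plain callables here, you can dry-run the stopping logic with cheap stand-ins before spending any tokens.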


📦 What You Can Evolve

🔤 Prompts

Optimize system prompts, few-shot examples, chain-of-thought templates.

python3 autoprompt.py my-prompt.txt criteria.md --target 9.0 --patience 3

💻 Code

Evolve algorithms, functions, or scripts with optional benchmarks.

python3 autoprompt.py solver.py criteria.md -b "python3 bench.py {file}"
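The benchmark command receives the candidate's path via {file}, so a benchmark is just a script that loads that file, tests it, and prints a result for the judge to read. A minimal sketch of such a script (illustrative, not the shipped bench.py; it assumes the candidate defines a `sort(items)` function):

```python
import importlib.util
import random
import sys
import time

def load_candidate(path):
    """Import the candidate file that AutoPrompt substitutes for {file}."""
    spec = importlib.util.spec_from_file_location("candidate", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

def run_bench(path):
    module = load_candidate(path)
    data = [random.randrange(10_000) for _ in range(5_000)]
    start = time.perf_counter()
    result = module.sort(list(data))
    elapsed = time.perf_counter() - start
    # correctness first: a fast-but-wrong mutation must score as a failure
    if result != sorted(data):
        print("FAIL: incorrect output")
        return 1
    print(f"OK: sorted 5000 items in {elapsed * 1000:.1f} ms")
    return 0

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_bench(sys.argv[1]))
```

The judge sees the printed line alongside your criteria, so printing both a pass/fail verdict and a timing gives it something concrete to score.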

πŸ“ Copy & Content

Marketing copy, email templates, documentation β€” anything with quality criteria.

python3 autoprompt.py landing-page.md criteria.md -g 5

βš™οΈ Configs

YAML configs, SQL queries, regex patterns β€” if it's text and has a "better", evolve it.

python3 autoprompt.py config.yaml criteria.md -e codex

πŸ› οΈ Options

| Flag | Description | Default |
|------|-------------|---------|
| -g, --generations | Max generations to run | 10 |
| -n, --population | Mutations per generation | 3 |
| -b, --bench | Benchmark command ({file} = candidate path) | None |
| -e, --engine | LLM backend: claude, codex, or ollama | claude |
| -m, --model | Ollama model name (ignored for claude/codex) | qwen3.5:9b |
| --target | Stop when score reaches this value | None |
| --patience | Stop after N gens with no improvement | None |
| --timeout | Stop after N seconds total | None |
| --reasoning | Codex reasoning effort: low, medium, high | medium |

Smart stopping

AutoPrompt stops early when it makes sense:

# stop when good enough
python3 autoprompt.py seed.txt criteria.md --target 8.5

# stop when stuck
python3 autoprompt.py seed.txt criteria.md --patience 3

# stop after 5 minutes
python3 autoprompt.py seed.txt criteria.md --timeout 300

# combine them
python3 autoprompt.py seed.txt criteria.md --target 9.0 --patience 3 --timeout 600

πŸ“ Writing Criteria Files

The criteria file is a markdown file that tells the LLM what "better" means. This is the most important part β€” good criteria = good evolution.

Template

# Fitness Criteria: [What You're Evolving]

## Goal
One sentence describing the ideal output.

## Constraints
- Hard rules that must be followed
- Things that are NOT allowed
- Format requirements

## What "better" means (in priority order)
1. **Most important thing** - why it matters
2. **Second priority** - why it matters
3. **Third priority** - why it matters

## Scoring Guide
- 0-2: terrible (describe what this looks like)
- 3-4: below average
- 5-6: decent
- 7-8: good (describe what this looks like)
- 9-10: exceptional (describe what this looks like)

The scoring guide is key: it anchors the LLM's judgment so scores stay consistent across generations.
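As a filled-in illustration (not a file shipped in examples/), a criteria file for a blog-post system prompt might read:

```markdown
# Fitness Criteria: Blog Post System Prompt

## Goal
A system prompt that reliably produces clear, well-structured technical blog posts.

## Constraints
- Must fit in under 400 words
- No filler phrases ("in today's world", "delve into")
- Must specify output format (headings, intro, conclusion)

## What "better" means (in priority order)
1. **Specificity** - vague instructions produce vague posts
2. **Structure** - the prompt should dictate a repeatable outline
3. **Tone control** - explicit voice and audience guidance

## Scoring Guide
- 0-2: a one-line generic instruction with no constraints
- 3-4: mentions the topic but not structure or tone
- 5-6: has structure, but tone and audience are vague
- 7-8: structure, tone, and audience all pinned down
- 9-10: also anchors quality with concrete examples
```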


📂 Examples

examples/prompt-optimizer/

Evolves a generic blog post prompt into a production-quality system prompt. No benchmark needed: the LLM judges prompt quality directly.

python3 autoprompt.py examples/prompt-optimizer/seed.txt \
  examples/prompt-optimizer/criteria.md \
  --target 9.0 --patience 3

examples/code-optimizer/

Evolves a bubble sort into a fast hybrid sorting algorithm. Uses bench.py to verify correctness and measure speed.

python3 autoprompt.py examples/code-optimizer/seed.py \
  examples/code-optimizer/criteria.md \
  -b "python3 examples/code-optimizer/bench.py {file}" \
  --target 8.0

🧠 Tips

  • Start with a bad seed - the worse the starting point, the more dramatic the improvement. Makes for better demos too.
  • Be specific in criteria - "write well" is useless. "Use active voice, keep sentences under 20 words, include one concrete example per paragraph" is useful.
  • Use benchmarks for code - LLM-as-judge works for subjective quality, but for code you want deterministic correctness checks.
  • Set patience - --patience 3 prevents wasting tokens when the LLM has plateaued.
  • More population = more exploration - -n 5 tries more strategies per generation but costs more tokens.
  • Check the history - the LLM learns from previous generations. If it keeps trying the same thing, your criteria might be ambiguous.

πŸ—οΈ Architecture

AutoPrompt/
├── autoprompt.py          # the entire engine (~300 lines)
├── examples/
│   ├── prompt-optimizer/  # evolve prompts
│   │   ├── seed.txt       # starting prompt
│   │   └── criteria.md    # what makes a good prompt
│   └── code-optimizer/    # evolve code
│       ├── seed.py        # starting code (bubble sort)
│       ├── criteria.md    # what makes good sorting code
│       └── bench.py       # correctness + speed benchmark
├── LICENSE
└── README.md

One file. Zero dependencies. Stdlib only.


🤝 Contributing

Found a bug? Have a cool criteria file? PRs welcome.


📄 License

MIT - do whatever you want with it.
