Gepa

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

<p align="center"> <img src="https://raw.githubusercontent.com/gepa-ai/gepa/refs/heads/main/assets/gepa_logo_with_text.svg" alt="GEPA Logo" width="450"> </p> <p align="center"> <strong>Optimize any text parameter — prompts, code, agent architectures, configurations — using LLM-based reflection and Pareto-efficient evolutionary search.</strong> </p> <p align="center"> <a href="https://gepa-ai.github.io/gepa/"><strong>Website</strong></a> &ensp;|&ensp; <a href="https://gepa-ai.github.io/gepa/guides/quickstart/"><strong>Quick Start</strong></a> &ensp;|&ensp; <a href="https://arxiv.org/abs/2507.19457"><strong>Paper</strong></a> &ensp;|&ensp; <a href="https://gepa-ai.github.io/gepa/blog/"><strong>Blog</strong></a> &ensp;|&ensp; <a href="https://discord.gg/WXFSeVGdbW"><strong>Discord</strong></a> </p> <p align="center"> <a href="https://pypi.org/project/gepa/"><img src="https://img.shields.io/pypi/v/gepa?logo=python&logoColor=white&color=3776ab" alt="PyPI"></a> <a href="https://pepy.tech/projects/gepa"><img src="https://static.pepy.tech/badge/gepa" alt="Downloads"></a> <a href="https://github.com/gepa-ai/gepa"><img src="https://img.shields.io/github/stars/gepa-ai/gepa?style=flat&logo=github&color=181717" alt="GitHub stars"></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green?style=flat" alt="License"></a> </p> <p align="center"> <a href="https://join.slack.com/t/gepa-ai/shared_invite/zt-3o352xhyf-QZDfwmMpiQjsvoSYo7M1_w"><img src="https://badgen.net/badge/icon/Slack?icon=slack&label&color=4A154B" alt="Slack"></a> <a href="https://discord.gg/WXFSeVGdbW"><img src="https://dcbadge.limes.pink/api/server/https://discord.gg/WXFSeVGdbW?style=flat" alt="Discord"></a> </p>

What is GEPA?

GEPA (Genetic-Pareto) is a framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations.

If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.

Key Results

| | |
|---|---|
| 90x cheaper | Open-source models + GEPA beat Claude Opus 4.1 at Databricks |
| 35x faster than RL | 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper) |
| 32% → 89% | ARC-AGI agent accuracy via architecture discovery |
| 40.2% cost savings | Cloud scheduling policy discovered by GEPA, beating expert heuristics |
| 55% → 82% | Coding agent resolve rate on Jinja via auto-learned skills |
| 50+ production uses | Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, Comet ML, and more |

"Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world" - Tobi Lutke, CEO, Shopify


Installation

pip install gepa

To install the latest from main:

pip install git+https://github.com/gepa-ai/gepa.git

Quick Start

Simple Prompt Optimization

Optimize a system prompt for math problems from the AIME benchmark in a few lines of code (full tutorial):

import gepa

trainset, valset, _ = gepa.examples.aime.init_dataset()

seed_prompt = {
    "system_prompt": "You are a helpful assistant. Answer the question. "
                     "Put your final answer in the format '### <answer>'"
}

result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4.1-mini",
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)

print("Optimized prompt:", result.best_candidate['system_prompt'])

Result: GPT-4.1 Mini goes from 46.6% → 56.6% on AIME 2025 (+10 percentage points).
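
The seed prompt above asks the model to finish with `### <answer>`. As a rough illustration of how such a format can be scored, here is a minimal sketch. The helper names `extract_answer` and `score_response` are hypothetical and are not part of GEPA; the actual AIME evaluator may differ. The function returns a textual note alongside the score, since that kind of feedback is what a reflection step can read.

```python
import re
from typing import Optional, Tuple

def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last '###' marker, or None if the marker is missing."""
    matches = re.findall(r"###\s*(.+)", response)
    return matches[-1].strip() if matches else None

def score_response(response: str, gold: str) -> Tuple[float, str]:
    """Hypothetical scorer: exact match on the extracted answer, plus a feedback note."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0, "No '### <answer>' line found in the response."
    if answer == gold:
        return 1.0, "Correct."
    return 0.0, f"Expected {gold}, got {answer}."

print(score_response("Adding the cases gives 204.\n### 204", "204"))
print(score_response("The answer is 204.", "204"))
```

A response that reasons correctly but omits the marker scores zero here, which is exactly the kind of failure a feedback string can explain to the reflection model.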

With DSPy (Recommended for AI Pipelines)

The most powerful way to use GEPA for prompt optimization is within DSPy, where it's available as dspy.GEPA. See dspy.GEPA tutorials for executable notebooks.

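import dspy

# your_metric, MyProgram, trainset, and valset are placeholders for your own code.
optimizer = dspy.GEPA(
    metric=your_metric,
    max_metric_calls=150,
    # dspy.GEPA takes an LM instance for reflection rather than a model-name string
    reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
)
optimized_program = optimizer.compile(student=MyProgram(), trainset=trainset, valset=valset)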

optimize_anything: Beyond Prompts

The optimize_anything API optimizes any text artifact — code, agent architectures, configurations, SVGs — not just prompts. You provide an evaluator; the system handles the search.

import gepa.optimize_anything as oa
from gepa.optimize_anything import optimize_anything, GEPAConfig, EngineConfig

def evaluate(candidate: str) -> float:
    result = run_my_system(candidate)
    oa.log(f"Output: {result.output}")      # Actionable Side Information
    oa.log(f"Error: {result.error}")         # feeds back into reflection
    return result.score

result = optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    objective="Describe what you want to optimize for.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=100)),
)
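
To make the evaluator contract more concrete, the sketch below scores a candidate that is Python source code and reports each failure as text. It is an illustration under assumptions: the function name `evaluate_sort_candidate`, the test cases, and the `log` parameter are all hypothetical, and `log` merely stands in for the `oa.log` calls shown above so the example runs without GEPA installed.

```python
import traceback
from typing import Callable

def evaluate_sort_candidate(candidate: str, log: Callable[[str], None] = print) -> float:
    """Score source code that should define sort_numbers(xs); describe failures via log."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([2, 2, 1], [1, 2, 2])]
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # only run untrusted candidates inside a sandbox
        fn = namespace["sort_numbers"]
    except Exception:
        log("Candidate failed to load:\n" + traceback.format_exc())
        return 0.0
    passed = 0
    for xs, expected in cases:
        try:
            got = fn(list(xs))
        except Exception as exc:
            log(f"sort_numbers({xs}) raised {exc!r}")
            continue
        if got == expected:
            passed += 1
        else:
            log(f"sort_numbers({xs}) returned {got}, expected {expected}")
    return passed / len(cases)

messages: list = []
buggy = "def sort_numbers(xs):\n    return xs"
print(evaluate_sort_candidate(buggy, log=messages.append), messages)
```

The score alone says the buggy candidate passes one case in three; the logged messages say which inputs failed and how, which is the part a reflection step can act on.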

How It Works

Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:

  1. Select a candidate from the Pareto frontier (candidates excelling on different task subsets)
  2. Execute on a minibatch, capturing full execution traces
  3. Reflect — an LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
  4. Mutate — generate an improved candidate informed by accumulated lessons from all ancestors
  5. Accept — add to the pool if improved, update the Pareto front
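
The five steps above can be sketched as a toy loop. This is not GEPA's implementation: the tasks are keyword checks, the reflection step is a hard-coded stub instead of an LLM, and every name here (`run_task`, `reflect_and_mutate`, `pareto_frontier`, `optimize`) is hypothetical. It only shows how the pieces fit together.

```python
import random
from typing import Dict, List, Tuple

# Toy tasks: each one checks that the candidate text mentions a keyword.
TASKS = ["units", "format", "steps", "check"]

def run_task(candidate: str, task: str) -> Tuple[float, str]:
    """Return (score, trace); the trace is the text a reflector would read."""
    if task in candidate:
        return 1.0, f"ok: prompt mentions '{task}'"
    return 0.0, f"failed: prompt never mentions '{task}'"

def reflect_and_mutate(candidate: str, traces: List[str]) -> str:
    """Stub for LLM reflection: read the traces and patch the first failure."""
    for trace in traces:
        if trace.startswith("failed"):
            missing = trace.split("'")[1]
            return f"{candidate} Mention {missing}."
    return candidate

def dominates(a: List[float], b: List[float]) -> bool:
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores: Dict[str, List[float]]) -> List[str]:
    """Candidates whose per-task scores are not dominated by any other candidate."""
    return [c for c, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())]

def optimize(seed: str, budget: int = 20, batch_size: int = 2, rng_seed: int = 0) -> str:
    rng = random.Random(rng_seed)
    scores = {seed: [run_task(seed, t)[0] for t in TASKS]}
    for _ in range(budget):
        parent = rng.choice(pareto_frontier(scores))             # 1. select
        batch = rng.sample(TASKS, batch_size)
        results = [run_task(parent, t) for t in batch]           # 2. execute
        traces = [trace for _, trace in results]
        child = reflect_and_mutate(parent, traces)               # 3-4. reflect, mutate
        before = sum(score for score, _ in results)
        after = sum(run_task(child, t)[0] for t in batch)
        if after > before:                                       # 5. accept
            scores[child] = [run_task(child, t)[0] for t in TASKS]
    return max(scores, key=lambda c: sum(scores[c]))

best = optimize("Solve the problem.")
print(best)
```

In a real run the stub would be replaced by a call to the reflection model, and the traces would be the error messages, profiler output, or reasoning logs mentioned in step 3.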

GEPA also supports system-aware merge — combining strengths of two Pareto-optimal candidates excelling on different tasks. The key concept is Actionable Side Information (ASI): diagnostic feedback returned by evaluators that serves as the text-optimization analogue of a gradient.
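
The text above does not spell out the merge rule, so the following is only one plausible reading, written as an assumption: treat each candidate as a dictionary of named text components and perform a three-way merge against a common ancestor, keeping whichever side rewrote each component. The function name `merge_candidates` and the component names are hypothetical.

```python
from typing import Dict

def merge_candidates(ancestor: Dict[str, str], a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Three-way merge of two descendants of a shared ancestor, component by component."""
    merged = {}
    for name, base in ancestor.items():
        if a[name] != base and b[name] == base:
            merged[name] = a[name]   # only lineage A rewrote this component
        elif b[name] != base and a[name] == base:
            merged[name] = b[name]   # only lineage B rewrote this component
        else:
            merged[name] = a[name]   # unchanged in both, or a conflict: keep A's text
    return merged

ancestor = {"retrieve": "Find documents.", "answer": "Answer briefly."}
a = {"retrieve": "Find documents and rank them by date.", "answer": "Answer briefly."}
b = {"retrieve": "Find documents.", "answer": "Answer briefly and cite sources."}
print(merge_candidates(ancestor, a, b))
```

Here lineage A improved retrieval and lineage B improved answering, so the merged candidate carries both rewrites. Resolving conflicts by always keeping A's text is an arbitrary choice for the sketch; a real system would more likely pick by score.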

For details, see the paper and the documentation.


Adapters: Plug GEPA into Any System

GEPA connects to your system via the GEPAAdapter interface — implement evaluate and make_reflective_dataset, and GEPA handles the rest.

Built-in adapters:

| Adapter | Description |
|---|---|
| DefaultAdapter | System prompt optimization for single-turn LLM tasks |
| DSPy Full Program | Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH. |
| Generic RAG | Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone) |
| MCP Adapter | Optimize MCP tool descriptions and system prompts |
| TerminalBench | Optimize the Terminus terminal-use agent |
| AnyMaths | Mathematical problem-solving and reasoning tasks |

See the adapters guide for how to build your own, and DSPy's adapter as a reference.
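
As a rough picture of the two methods an adapter provides, here is a self-contained toy. It does not import GEPA and deliberately simplifies the signatures: the `EvalResult` dataclass, the `KeywordAdapter` class, and the record fields are hypothetical stand-ins, so consult the adapters guide for the real `GEPAAdapter` interface before implementing one.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class EvalResult:  # hypothetical stand-in, not GEPA's own result type
    outputs: List[str]
    scores: List[float]
    traces: List[Dict[str, Any]]

class KeywordAdapter:
    """Toy adapter: an example passes when the prompt mentions its required keyword."""

    def evaluate(self, batch: List[Dict[str, str]], candidate: Dict[str, str]) -> EvalResult:
        prompt = candidate["system_prompt"]
        outputs, scores, traces = [], [], []
        for example in batch:
            ok = example["required"] in prompt
            outputs.append(prompt)
            scores.append(1.0 if ok else 0.0)
            traces.append({"input": example["question"], "required": example["required"], "ok": ok})
        return EvalResult(outputs, scores, traces)

    def make_reflective_dataset(self, candidate: Dict[str, str], result: EvalResult) -> Dict[str, List[Dict[str, str]]]:
        """Turn traces into per-component feedback records for the reflection step."""
        records = [
            {"Inputs": t["input"],
             "Feedback": "ok" if t["ok"] else f"Prompt should mention {t['required']}"}
            for t in result.traces
        ]
        return {"system_prompt": records}

adapter = KeywordAdapter()
batch = [{"question": "How far is 3 km in miles?", "required": "units"}]
candidate = {"system_prompt": "Answer the question."}
result = adapter.evaluate(batch, candidate)
print(result.scores, adapter.make_reflective_dataset(candidate, result))
```

The split matters: `evaluate` produces scores and raw traces, while `make_reflective_dataset` decides what the reflection model actually gets to read for each component being updated.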


Integrations

GEPA is integrated into several major frameworks:

  • DSPy: dspy.GEPA for optimizing DSPy programs. Tutorials.
  • MLflow: mlflow.genai.optimize_prompts() for automatic prompt improvement.
  • Comet ML Opik: Core optimization algorithm in Opik Agent Optimizer.
  • Pydantic: Prompt optimization for Pydantic AI.
  • OpenAI Cookbook: Self-evolving agents with GEPA.
  • HuggingFace Cookbook: Prompt optimization guide.
  • Google ADK: Optimizing Google Agent Development Kit agents.

Example Optimized Prompts

GEPA can be thought of as precomputing reasoning during optimization to produce a plan for future task instances. Here are examples of the detailed prompts GEPA discovers:

<table> <tr> <td colspan="2" align="center">Example GEPA Prompts</td> </tr> <tr> <td align="center">HotpotQA (multi-hop QA) Prompt</td> <td align="center">AIME Prompt</td> </tr> <tr> <td width="52%" valign="top"> <img src="https://raw.githubusercontent.com/gepa-ai/gepa/refs/heads/main/assets/gepa_prompt_hotpotqa.png" alt="HotpotQA Prompt" width="1400"> </td> </tr> </table>
