# 📝 AutoPrompt

Auto Prompt is a prompt optimization framework designed to enhance and perfect your prompts for real-world use cases.
The framework automatically generates high-quality, detailed prompts tailored to user intentions. It employs a refinement (calibration) process, where it iteratively builds a dataset of challenging edge cases and optimizes the prompt accordingly. This approach not only reduces manual effort in prompt engineering but also effectively addresses common issues such as prompt sensitivity and inherent prompt ambiguity issues.
Our mission: Empower users to produce high-quality robust prompts using the power of large language models (LLMs).
## Why Auto Prompt?
- Prompt Engineering Challenges. The quality of LLMs greatly depends on the prompts used. Even minor changes can significantly affect their performance.
- Benchmarking Challenges. Creating a benchmark for production-grade prompts is often labour-intensive and time-consuming.
- Reliable Prompts. Auto Prompt generates robust, high-quality prompts, offering measured accuracy and performance enhancement using minimal data and annotation steps.
- Modularity and Adaptability. With modularity at its core, Auto Prompt integrates seamlessly with popular open-source tools such as LangChain, Wandb, and Argilla, and can be adapted for a variety of tasks, including data synthesis and prompt migration.
## System Overview

The system is designed for real-world scenarios, such as moderation tasks, which are often challenged by imbalanced data distributions. The system implements the Intent-based Prompt Calibration method. The process begins with a user-provided initial prompt and task description, optionally including user examples. The refinement process iteratively generates diverse samples, annotates them via user/LLM, and evaluates prompt performance, after which an LLM suggests an improved prompt.
The optimization process can be extended to content generation tasks by first devising a ranker prompt and then performing the prompt optimization with this learned ranker. The optimization concludes upon reaching the budget or iteration limit.
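The refinement loop described above can be sketched in simplified Python. This is an illustration only: every function here is a toy stand-in for an LLM-driven component, and none of the names correspond to the framework's actual API.

```python
import random

# Toy stand-ins for the LLM-driven components (illustration only).
def generate_samples(prompt):       # synthesizer LLM: propose challenging edge cases
    return [f"sample-{random.randint(0, 99)}" for _ in range(5)]

def annotate(samples):              # human (e.g. Argilla) or LLM annotator
    return {s: random.choice(["Yes", "No"]) for s in samples}

def evaluate(prompt, annotations):  # predictor LLM scored against the annotations
    return random.random()

def refine(prompt, score):          # analyzer LLM: suggest an improved prompt
    return prompt + " (refined)"

def calibrate(prompt, num_steps=3):
    """Iteratively generate, annotate, evaluate, and refine."""
    history = []
    for _ in range(num_steps):
        samples = generate_samples(prompt)
        annotations = annotate(samples)
        history.append((prompt, evaluate(prompt, annotations)))
        prompt = refine(prompt, history[-1][1])
    return max(history, key=lambda h: h[1])[0]  # best-scoring prompt seen
```

In the real system the loop also stops when the configured budget is exhausted, not only after a fixed number of steps.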
This joint synthetic data generation and prompt optimization approach outperforms traditional methods while requiring minimal data and iterations. Learn more in our paper Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases by E. Levi et al. (2024).
Using GPT-4 Turbo, this optimization typically completes in just a few minutes at a cost of under $1. To manage the costs of LLM token usage, the framework lets users set a budget limit for the optimization, in USD or token count, configured as illustrated here.
## Demo

## 📖 Documentation
- How to install (Setup instructions)
- Prompt optimization examples (Use cases: movie review classification, generation, and chat moderation)
- How it works (Explanation of pipelines)
- Architecture guide (Overview of main components)
## Features
- 📝 Boosts prompt quality with a minimal amount of data and annotation steps.
- 🛬 Designed for production use cases like moderation, multi-label classification, and content generation.
- ⚙️ Enables seamless migration of prompts across model versions or LLM providers.
- 🎓 Supports prompt squeezing: combine multiple rules into a single, efficient prompt.
## QuickStart

AutoPrompt requires Python <= 3.10.
<br />
### Step 1 - Download the project

```bash
git clone git@github.com:Eladlev/AutoPrompt.git
cd AutoPrompt
```
<br />
### Step 2 - Install dependencies

Use Conda, pip, or pipenv, depending on your preference.

Using Conda:

```bash
conda env create -f environment_dev.yml
conda activate AutoPrompt
```

Using pip:

```bash
pip install -r requirements.txt
```

Using pipenv:

```bash
pip install pipenv
pipenv sync
```
<br />
### Step 3 - Configure your LLM

Set your OpenAI API key by updating the configuration file `config/llm_env.yml`.

- If you need help locating your API key, visit this link.
- We recommend using OpenAI's GPT-4 as the LLM. Our framework also supports other providers and open-source models, as discussed here.
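As a reference, the file typically maps a provider to its credentials. The fragment below is an illustrative sketch only; the field names are assumptions and may differ from the actual `config/llm_env.yml` shipped with the repo:

```yaml
# Illustrative only: check config/llm_env.yml in the repo for the real keys.
openai:
  OPENAI_API_KEY: 'sk-...'   # your OpenAI API key
```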
### Step 4 - Configure your Annotator

Select an annotation approach for your project:

- We recommend beginning with a human-in-the-loop method, utilizing Argilla. Note that AutoPrompt is compatible with Argilla V1, not with the latest V2. Follow the Argilla setup instructions, with the following modifications:
  - If you are using local Docker, use the `v1.29.0` tag instead of the `latest` tag.
  - For a quick setup using HF, duplicate the following space.
- Alternatively, you can set up an LLM as your annotator by following these configuration steps.
- The default predictor LLM, GPT-3.5, used for estimating prompt performance, is configured in the `predictor` section of `config/config_default.yml`.
- Define your budget in the input config YAML file using the `max_usage` parameter. For OpenAI models, `max_usage` sets the maximum spend in USD. For other LLMs, it limits the maximum token count.
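For illustration, a budget entry could look like the fragment below. The enclosing section name is an assumption; check your config file for the exact layout:

```yaml
# Illustrative fragment: the section name may differ in your config version.
stop_criteria:
  max_usage: 0.5   # USD cap for OpenAI models; token cap for other LLMs
```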
### Step 5 - Run the pipeline

First, configure your labels by editing `config/config_default.yml`:

```yaml
dataset:
  label_schema: ["Yes", "No"]
```
For a classification pipeline, use the following command from your terminal within the appropriate working directory:

```bash
python run_pipeline.py
```

If the initial prompt and task description are not provided directly as input, you will be guided to provide these details. Alternatively, specify them as command-line arguments:

```bash
python run_pipeline.py \
    --prompt "Does this movie review contain a spoiler? answer Yes or No" \
    --task_description "Assistant is an expert classifier that will classify a movie review, and let the user know if it contains a spoiler for the reviewed movie or not." \
    --num_steps 30
```
You can track the optimization progress using the W&B dashboard, with setup instructions available here.
If you are using pipenv, be sure to activate the environment:

```bash
pipenv shell
python run_pipeline.py
```

or alternatively prefix your command with `pipenv run`:

```bash
pipenv run python run_pipeline.py
```
### Generation pipeline

To run the generation pipeline, use the following example command:

```bash
python run_generation_pipeline.py \
    --prompt "Write a good and comprehensive movie review about a specific movie." \
    --task_description "Assistant is a large language model that is tasked with writing movie reviews."
```
For more information, refer to our generation task example.
<br />

### Benchmark optimization (optimize-only mode)

If you already have an annotated dataset and want to skip sample generation and annotation, use the benchmark optimization script. This mode runs a pure optimization loop: predict → evaluate → refine.
Your dataset should be a CSV file with `text` and `annotation` columns:

```csv
text,annotation
"The movie was absolutely fantastic!",Yes
"Waste of time and money.",No
```
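Before running, you may want to sanity-check that your CSV has the expected shape. The helper below is an illustrative sketch (it is not part of the framework); it verifies the two required columns and that every annotation is in the label schema:

```python
import csv

def validate_dataset(path, labels=("Yes", "No")):
    """Check that a CSV has text/annotation columns and only known labels."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {"text", "annotation"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        rows = list(reader)
    bad = {r["annotation"] for r in rows if r["annotation"] not in labels}
    if bad:
        raise ValueError(f"unexpected labels: {sorted(bad)}")
    return len(rows)  # number of annotated samples
```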
Run the optimization:

```bash
python run_benchmark_optimization.py \
    --dataset path/to/your_data.csv \
    --prompt "Is this movie review positive? Answer Yes or No." \
    --task_description "Classify movie reviews as positive or negative." \
    --labels Yes No \
    --num_steps 10 \
    --output results.json
```
Arguments:

- `--dataset` (required): Path to a CSV with `text` and `annotation` columns
- `--prompt`: Initial prompt to optimize (interactive if omitted)
- `--task_description`: Task description (interactive if omitted)
- `--labels`: Label schema (default: `Yes No`)
- `--num_steps`: Number of optimization iterations (default: 10)
- `--output`: Output JSON file for results (default: `benchmark_results.json`)
- `--config`: Configuration file (default: `config/config_benchmark.yml`)
This is useful when:
- You already have labeled benchmark data
- You want faster iteration without sample generation
- You're fine-tuning a prompt for a specific dataset
Enjoy the results! Completing these steps yields a refined (calibrated) prompt tailored to your task, alongside a benchmark featuring challenging samples, stored in the default dump path.
## Tips

- Prompt accuracy may fluctuate during optimization. To identify the best prompts, we recommend continuous refinement after the initial benchmark is generated. Set the number of optimization iterations with `--num_steps`, and control sample generation by specifying `max_samples`.