PromptWizard
Task-Aware Agent-driven Prompt Optimization Framework
PromptWizard 🧙
<p align="left"> <a href='https://arxiv.org/abs/2405.18369'> <img src=https://img.shields.io/badge/arXiv-2409.10566-b31b1b.svg> </a> <a href='https://www.microsoft.com/en-us/research/blog/promptwizard-the-future-of-prompt-optimization-through-feedback-driven-self-evolving-prompts/'> <img src=images/msr_blog.png width="16"> Blog Post </a> <a href='https://microsoft.github.io/PromptWizard/'> <img src=images/github.png width="16"> Project Website </a> </p>

PromptWizard: Task-Aware Prompt Optimization Framework<br>
Eshaan Agarwal, Joykirat Singh, Vivek Dani, Raghav Magazine, Tanuja Ganu, Akshay Nambi <br>
Overview 🌟
<p align="center">Overview of the PromptWizard framework</p>
<img src="./images/overview.png" >

PromptWizard is a discrete prompt optimization framework that employs a self-evolving mechanism where the LLM generates, critiques, and refines its own prompts and examples, continuously improving through iterative feedback and synthesis. This self-adaptive approach ensures holistic optimization by evolving both the instructions and in-context learning examples for better task performance.
The three key components of PromptWizard are:
- Feedback-driven refinement: the LLM generates, critiques, and refines its own prompts and examples, continuously improving through iterative feedback and synthesis
- Critique and synthesis of diverse examples: generates synthetic examples that are robust, diverse, and task-aware, and optimizes both the prompt and the examples in tandem
- Self-generated Chain of Thought (CoT) steps built from a combination of positive, negative, and synthetic examples
Installation ⬇️
Follow these steps to set up the development environment and install the package:
- Clone the repository

      git clone https://github.com/microsoft/PromptWizard
      cd PromptWizard

- Create and activate a virtual environment

  On Windows:

      python -m venv venv
      venv\Scripts\activate

  On macOS/Linux:

      python -m venv venv
      source venv/bin/activate

- Install the package in development mode:

      pip install -e .
Quickstart 🏃
There are three main ways to use PromptWizard:
- Scenario 1 : Optimizing prompts without examples
- Scenario 2 : Generating synthetic examples and using them to optimize prompts
- Scenario 3 : Optimizing prompts with training data
NOTE: Refer to this notebook for a detailed understanding of how to use each of the scenarios. It serves as a starting point for using PromptWizard.
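The three scenarios above line up with the three global flags described in the custom-dataset configuration notes (run_without_train_examples, generate_synthetic_examples, use_examples). A minimal sketch of that mapping follows. The helper name `flags_for_scenario` and the one-flag-per-scenario rule are assumptions made for illustration; the notebook may combine steps differently, for example by generating synthetic examples first and then optimizing with them.

```python
# Hypothetical helper (not part of PromptWizard): maps each Quickstart
# scenario number to the three global flags named in the README.
# Assumption: exactly one flag is enabled per scenario.
SCENARIO_FLAGS = {
    1: {"run_without_train_examples": True,
        "generate_synthetic_examples": False,
        "use_examples": False},
    2: {"run_without_train_examples": False,
        "generate_synthetic_examples": True,
        "use_examples": False},
    3: {"run_without_train_examples": False,
        "generate_synthetic_examples": False,
        "use_examples": True},
}


def flags_for_scenario(scenario: int) -> dict:
    """Return a copy of the flag settings assumed for the given scenario."""
    if scenario not in SCENARIO_FLAGS:
        raise ValueError(f"Unknown scenario {scenario!r}; expected 1, 2 or 3")
    return dict(SCENARIO_FLAGS[scenario])
```

How these flags are passed to the optimizer is shown in the notebook; this sketch only records which flag corresponds to which scenario.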
High level overview of using PromptWizard
- Decide your scenario
- Fix the configuration and environment variables for API calling
  - Use promptopt_config.yaml to set configurations. For example, for GSM8k this file can be used
  - Use .env to set environment variables. For GSM8k this file can be used

        USE_OPENAI_API_KEY="XXXX" # Replace with True/False based on whether or not to use OPENAI API key
        # If the first variable is set to True then fill the following two
        OPENAI_API_KEY="XXXX"
        OPENAI_MODEL_NAME="XXXX"
        # If the first variable is set to False then fill the following three
        AZURE_OPENAI_ENDPOINT="XXXXX" # Replace with your Azure OpenAI Endpoint
        OPENAI_API_VERSION="XXXX" # Replace with the version of your API
        AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="XXXXX" # Create a deployment for the model and place the deployment name here.
- Run the code
- To run PromptWizard on your custom dataset please jump here
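Because the .env file switches between two sets of variables, it can help to check which ones are missing before starting a long optimization run. The following is a minimal sketch, assuming the variables have already been loaded into the process environment. The function name `missing_api_settings` is hypothetical and not part of PromptWizard, and treating any value other than "True" (case-insensitive) as the Azure path is an assumption.

```python
import os

# Variable names taken from the .env template in the README.
OPENAI_VARS = ("OPENAI_API_KEY", "OPENAI_MODEL_NAME")
AZURE_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
)


def missing_api_settings(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: `env` defaults to os.environ, but any mapping
    can be passed in, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: only the string "True" (any casing) selects the OpenAI path.
    use_openai = env.get("USE_OPENAI_API_KEY", "").strip().lower() == "true"
    required = OPENAI_VARS if use_openai else AZURE_VARS
    return [name for name in required if not env.get(name)]
```

Calling `missing_api_settings()` with no arguments inspects the current environment; an empty list means every variable for the selected provider is present. The check does not detect placeholder values such as "XXXX".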
Running PromptWizard with training data (Scenario 3)
- We support GSM8k, SVAMP, AQUARAT and Instruction_Induction(BBII) datasets
- Please note that the time taken for prompt optimization depends on the dataset. In our experiments on the above-mentioned datasets, it took around 20-30 minutes on average.
Running on GSM8k (AQUARAT/SVAMP)
- Please note that this code requires access to LLMs via API calling for which we support AZURE endpoints or OPENAI keys
- Set the AZURE endpoint configurations in .env
- Follow the steps in demo.ipynb to download the data, run the prompt optimization and carry out inference.
Running on BBII
- BBII has many datasets in it; set the configs here based on the dataset
- In the configs, task_description, base_instruction and answer_format need to be changed for different datasets in BBII; the rest of the configs remain the same
- A demo is presented in demo.ipynb
Run on Custom Datasets 🗃️
Create Custom Dataset
- Our code expects the dataset to be in .jsonl file format
- Both the train and test set follow the same format
- Every sample in the .jsonl should have 2 fields:
  - question: It should contain the complete question that is to be asked to the LLM
  - answer: It should contain the ground truth answer, which can be verbose or concise
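The format above can be produced and checked with the standard library alone. This is a minimal sketch: the helper names `write_jsonl` and `read_jsonl` are hypothetical and are not part of the PromptWizard package. It shows one JSON object per line, each carrying the `question` and `answer` fields.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"question", "answer"}


def write_jsonl(samples, path):
    """Write samples as one JSON object per line, keeping only the two fields."""
    with Path(path).open("w", encoding="utf-8") as f:
        for sample in samples:
            record = {"question": sample["question"], "answer": sample["answer"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file and verify every record has both required fields."""
    samples = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            samples.append(record)
    return samples
```

For example, `write_jsonl([{"question": "What is 2 + 3?", "answer": "5"}], "train.jsonl")` writes a one-line training file (the sample is purely illustrative), and `read_jsonl("train.jsonl")` reads it back while validating the fields. The same functions apply to the test set, since both splits share the format.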
Run on Custom Dataset
NOTE: Refer to the demos folder for example folders for four datasets. The .ipynb in each folder shows how to run PromptWizard on that particular dataset. A similar procedure can be followed for a new dataset. Each component of the .ipynb and the dataset-specific folder structure is explained in detail below.
Steps to be followed for custom datasets
- Every new dataset needs to have the following
  - configs folder to store files for defining optimization hyperparameters and setup configs
  - data folder to store train.jsonl and test.jsonl as curated here (this is done in the notebooks)
  - .env file for environment variables to be used for API calling
  - .py/.ipynb script to run the code
- Set the hyperparameters like number of mutations, refine steps, in-context examples etc.
  - Set the following in promptopt_config.yaml:
    - task_description: Description of the task at hand, which will be fed into the prompt
      - For GSM8k a description like the following can be used

            You are a mathematics expert. You will be given a mathematics problem which you need to solve

    - base_instruction: Base instruction in line with the dataset
      - A commonly used base instruction could be

            Lets think step by step.

    - answer_format: Instruction for specifying the answer format
      - It is crucial to set the answer_format properly to ensure correct extraction by def extract_final_answer()
      - Answer format could be:

            At the end, wrap only your final option between <ANS_START> and <ANS_END> tags

        Then in def extract_final_answer() we can simply write code to extract the string between the tags
    - seen_set_size: The number of train samples to be used for prompt optimization
      - In our experiments we set this to 25. In general, any number between 20 and 50 would work
    - few_shot_count: The number of in-context examples needed in the prompt
      - The value can be set to any positive integer based on the requirement
      - For generating zero-shot prompts, set the value to a small number (i.e. between 2 and 5); after the final prompt is generated, the in-context examples can be removed. We suggest using some in-context examples because the instructions in the prompt are refined using in-context examples during optimization, so a small number gives better zero-shot instructions in the prompt
    - generate_reasoning: Whether or not to generate reasoning for the in-context examples
      - In our experiments we found it to improve the prompt overall, as it provides a step-by-step approach to reach the final answer. However, if there is a constraint on the prompt length or number of prompt tokens, it can be turned off to get smaller prompts
    - generate_expert_identity and generate_intent_keywords: Having these helped improve the prompt, as they help make the prompt relevant to the task
  - Refer to the promptopt_config.yaml files in the folders present here for the descriptions used for AQUARAT, SVAMP and GSM8k. For BBII, refer to description.py, which has the meta instructions for each of the datasets
  - The following global parameters can be set based on the availability of training data
    - run_without_train_examples is a global hyperparameter which can be used when there are no training samples and in-context examples are not required in the final prompt
    - generate_synthetic_examples is a global hyperparameter which can be used when there are no training samples and we want to generate synthetic data for training
    - use_examples is a global hyperparameter which can be used to optimize prompts using training data
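The answer_format item above says the final answer can be extracted as the string between the <ANS_START> and <ANS_END> tags. A minimal sketch of that extraction follows. The standalone signature, the choice to take the last tagged span, and the empty-string fallback are all assumptions; the function in the repository may be defined differently, for example as a method on a dataset-specific class.

```python
import re

# Non-greedy match so that each tagged span is captured separately;
# DOTALL lets an answer span multiple lines.
ANSWER_PATTERN = re.compile(r"<ANS_START>(.*?)<ANS_END>", re.DOTALL)


def extract_final_answer(llm_output: str) -> str:
    """Return the text between the answer tags, stripped of whitespace.

    Assumptions for this sketch: if the model emits several tagged spans,
    the last one is treated as the final answer; if no tags are found,
    an empty string is returned.
    """
    matches = ANSWER_PATTERN.findall(llm_output)
    if not matches:
        return ""
    return matches[-1].strip()
```

Returning an empty string on a missing tag keeps evaluation loops simple, since an untagged response then just counts as a wrong answer; raising an exception instead would be a reasonable alternative when malformed outputs should be surfaced.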
-
Create a da
