EvolKit

EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models (LLMs).

Install / Use

/learn @arcee-ai/EvolKit
About this skill

Quality Score: 0/100
Category: Design
Supported Platforms: Universal

README

EvolKit

EvolKit is a framework for automatically enhancing the complexity of instructions used to fine-tune Large Language Models (LLMs). The project aims to revolutionize the instruction-evolution process by leveraging open-source LLMs, moving away from closed-source alternatives.

Key Features

  • Automatic instruction complexity enhancement
  • Integration with open-source LLMs
  • Streamlined fine-tuning process
  • Support for various datasets from Hugging Face
  • Flexible configuration options for optimization

Installation

To set up EvolKit, follow these steps:

  1. Clone the repository:

    git clone https://github.com/arcee-ai/EvolKit.git
    cd EvolKit
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    

Usage

To run the AutoEvol script, use the following command structure:

python run_evol.py --dataset <dataset_name> [options]

Required Parameters:

  • --dataset <dataset_name>: The name of the dataset on Hugging Face to use.
  • --model <model_name>: Model to use for evolving instructions.
  • --generator <generator_type>: Type of generator to use ('openrouter' or 'vllm').
  • --batch_size <int>: Number of instructions to process in each batch.
  • --num_methods <int>: Number of evolution methods to use.
  • --max_concurrent_batches <int>: Maximum number of batches to process concurrently (in our experiments, a cluster of 8xH100 GPUs hosting Qwen2-72B-Instruct-GPTQ-Int8 handled a batch size of 50 concurrently).
  • --evolve_epoch <int>: Maximum number of epochs for evolving each instruction.
  • --output_file <filename>: Name of the output file to save results.

Optional Parameters:

  • --dev_set_size <int>: Number of samples to use in the development set. Use -1 for no dev set (the default). We do not recommend using a dev set, as it makes each round take much longer to finish.
  • --use_reward_model: Flag to use a reward model for evaluation. No value required.

Models

We found two models that work very well with this pipeline:

  • Qwen2-72B-Instruct and DeepSeek-V2.5 (the GPTQ and AWQ quantized versions work fine as well).
  • Other models may also work, but they need to be very good at generating structured content so that the parsing steps can extract results reliably.

VLLM Support

To use VLLM as the backend, set the VLLM_BACKEND environment variable:

export VLLM_BACKEND=http://your-vllm-backend-url:port/v1

If not set, it will default to 'http://localhost:8000/v1'.
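Resolving the backend URL presumably amounts to an environment lookup with a fallback; a minimal sketch (the `VLLM_BACKEND` variable name and default come from the README above, the rest is illustrative):

```python
import os

# Resolve the VLLM backend URL, falling back to the documented default
# when the VLLM_BACKEND environment variable is unset.
backend_url = os.environ.get("VLLM_BACKEND", "http://localhost:8000/v1")
print(backend_url)
```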

Example Usage:

To run AutoEvol on the 'small_tomb' dataset with custom parameters:

python run_evol.py --dataset qnguyen3/small_tomb --model Qwen/Qwen2-72B-Instruct-GPTQ-Int8 --generator vllm --batch_size 100 --num_methods 3 --max_concurrent_batches 10 --evolve_epoch 3 --output_file the_tomb_evolved-3e-batch100.json --dev_set_size 5 --use_reward_model

This command will:

  1. Load the 'qnguyen3/small_tomb' dataset from Hugging Face.
  2. Use the Qwen2-72B-Instruct-GPTQ-Int8 model, served through VLLM, as the generator.
  3. Process samples in batches of 100.
  4. Apply 3 evolution methods for each instruction.
  5. Process 10 batches concurrently.
  6. Evolve each instruction for up to 3 epochs.
  7. Use 5 samples for the development set.
  8. Use the reward model for evaluation.
  9. Output the final evolved instructions to the_tomb_evolved-3e-batch100.json.

After evolving the instructions, you can generate answers using:

python gen_answers.py --model Qwen/Qwen2-72B-Instruct-GPTQ-Int8 --generator vllm --data_path the_tomb_evolved-3e-batch100.json --batch_size 50 --output completed_evol_data.json

The final dataset will be saved to completed_evol_data.json in ShareGPT format.
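For reference, a ShareGPT-format entry pairs each evolved instruction with its generated answer in a "conversations" list; a sketch of one record (the field values here are placeholders, and the exact metadata EvolKit attaches may differ):

```python
import json

# One ShareGPT-style record: a "conversations" list alternating the
# human turn (the evolved instruction) and the gpt turn (the answer).
record = {
    "conversations": [
        {"from": "human", "value": "<evolved instruction>"},
        {"from": "gpt", "value": "<generated answer>"},
    ]
}

# The output file holds a list of such records.
print(json.dumps([record])[:60])
```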

Components

EvolKit consists of several key components:

  • Generator: Uses an LLM for generating initial instructions (OpenRouter or VLLM).
  • Evolver: Employs a recurrent evolution strategy.
  • Analyzer: Utilizes trajectory analysis.
  • Evaluator: Offers two options:
    • Reward Model Evaluator
    • Failure Detector Evaluator (originally from WizardLM's paper)
  • Optimizer: Optimizes the evolution method for the next round.
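Read together, the components form a loop per batch: evolve each instruction, evaluate the candidates, then optimize the methods for the next round. A schematic sketch of that loop (every function and variable name below is hypothetical, not EvolKit's actual API):

```python
# Schematic of one evolution round. All names are illustrative;
# see the EvolKit source for the real component interfaces.
def evolve_round(instructions, methods):
    evolved = []
    for inst in instructions:
        best = inst
        for method in methods:                      # Evolver: apply each method in turn
            candidate = f"{best} [evolved via {method}]"
            if not failed(candidate):               # Evaluator: keep non-failing candidates
                best = candidate
        evolved.append(best)
    methods = optimize(methods, evolved)            # Optimizer: refine methods for next round
    return evolved, methods

def failed(candidate):
    # Placeholder failure check; the real evaluator uses a reward model
    # or WizardLM-style failure detection.
    return candidate == ""

def optimize(methods, evolved):
    # Placeholder: the real optimizer uses trajectory analysis.
    return methods

out, _ = evolve_round(["Write a poem."], ["add constraints", "deepen reasoning"])
print(out[0])
```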

Output

The script saves the results in JSON format to the specified output file. Each entry in the JSON file represents an evolved instruction along with relevant metadata.
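Inspecting such a file is a one-line `json.load`; a round-trip sketch, noting that the entry shape shown is purely hypothetical since the actual keys and metadata are defined by EvolKit:

```python
import json
import os
import tempfile

# Hypothetical entry shape; the real keys are defined by EvolKit.
entries = [{"instruction": "<original>", "evolved": "<evolved>", "epoch": 3}]

path = os.path.join(tempfile.mkdtemp(), "evolved.json")
with open(path, "w") as f:
    json.dump(entries, f, indent=2)   # save results, as the script does

with open(path) as f:
    loaded = json.load(f)             # load them back for inspection
print(len(loaded))
```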

Find a 20k subset of a dataset generated using EvolKit here

Acknowledgement

  • Microsoft's WizardLM team for the inspiration from the AutoEvol paper.

Related Skills

clearshot

Structured screenshot analysis for UI implementation and critique. Analyzes every UI screenshot with a 5×5 spatial grid, full element inventory, and design system extraction — facts and taste together, every time. Escalates to full implementation blueprint when building. Trigger on any digital interface image file (png, jpg, gif, webp — websites, apps, dashboards, mockups, wireframes) or commands like 'analyse this screenshot,' 'rebuild this,' 'match this design,' 'clone this.' Skip for non-UI images (photos, memes, charts) unless the user explicitly wants to build a UI from them. Does NOT trigger on HTML source code, CSS, SVGs, or any code pasted as text.

openpencil

2.1k

The world's first open-source AI-native vector design tool and the first to feature concurrent Agent Teams. Design-as-Code. Turn prompts into UI directly on the live canvas. A modern alternative to Pencil.


ui-ux-designer

Use this agent when you need to design, implement, or improve user interface components and user experience flows. Examples include: creating new pages or components, improving existing UI layouts, implementing responsive designs, optimizing user interactions, building forms or dashboards, analyzing existing UI through browser snapshots, or ensuring UI components follow design system standards and shadcn/ui best practices.

View on GitHub
GitHub Stars: 256
Category: Design
Updated: 4d ago
Forks: 27

Languages: Jupyter Notebook

Security Score: 95/100 (audited on Apr 4, 2026; no findings)