VellumForge2
High-performance synthetic dataset generator for LLM training. Generates DPO, SFT, KTO, and MO-DPO datasets using any OpenAI-compatible API, with a hierarchical generation pipeline, checkpoint/resume support, and optional LLM-as-Judge evaluation.
./bin/vellumforge2 run --config config.toml
Features
Multiple Dataset Formats
- SFT - Simple instruction-output pairs for supervised fine-tuning
- DPO - Standard preference pairs (prompt, chosen, rejected) compatible with HuggingFace TRL
- KTO - Unpaired preferences with binary labels compatible with HuggingFace TRL KTOTrainer
- MO-DPO - Full multi-objective DPO with detailed judge scoring for reward modeling
High Performance
- Concurrent worker pool supporting 1024 or more parallel requests
- Provider-level and per-model rate limiting with configurable burst capacity
- Checkpoint/resume for interrupted sessions
- Asynchronous judge evaluation (non-blocking)
- Smart over-generation strategy achieving 95%+ count accuracy
- Robust 4-strategy JSON parsing with 99%+ success rate
Provider Agnostic
Works with any OpenAI-compatible API: OpenAI, NVIDIA NIM, Anthropic, Together AI, llama.cpp, Ollama, LM Studio, kobold.cpp, vLLM, and more.
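For example, pointing the main model at a local Ollama server only requires changing the endpoint. A minimal sketch (the model name is illustrative; use whatever you have pulled locally):
[models.main]
base_url = "http://localhost:11434/v1" # Ollama's OpenAI-compatible endpoint
model_name = "llama3.1"                # Illustrative; any locally available model works
temperature = 0.6
max_output_tokens = 8192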
Configurable Pipeline
- Hierarchical generation: Main topic → Subtopics → Prompts → Preference pairs
- Custom prompt templates at every stage
- Optional LLM-as-Judge quality filtering (40-60% token savings vs full evaluation)
- Flexible rate limiting strategies
Hugging Face Integration
One-command dataset uploads with automatic repository creation, using the native NDJSON commit API (no external dependencies such as the HF CLI required).
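For example, generating and uploading in a single invocation (the same flags are detailed under CLI Commands below):
./bin/vellumforge2 run --config config.toml --upload-to-hf --hf-repo-id username/my-dataset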
Installation
Prebuilt Binaries
Download from the releases page for Linux, macOS (x86_64/ARM64), and Windows.
From Source
git clone https://github.com/lemon07r/vellumforge2.git
cd vellumforge2
make build
# Binary at ./bin/vellumforge2
Quick Start
# 1. Copy configuration template
cp configs/config.example.toml config.toml
cp configs/.env.example .env
# 2. Edit .env with your API keys
# NVIDIA_API_KEY=nvapi-your-key
# OPENAI_API_KEY=sk-your-key
# 3. Edit config.toml with your settings
# Choose dataset_mode, configure models, customize prompts
# 4. Generate dataset
./bin/vellumforge2 run --config config.toml
# 5. Results in output/session_YYYY-MM-DDTHH-MM-SS/dataset.jsonl
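# Optional: preview the first record with standard shell tools (assumes jq is installed)
head -n 1 output/session_YYYY-MM-DDTHH-MM-SS/dataset.jsonl | jq .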
See GETTING_STARTED.md for a step-by-step tutorial.
Configuration
Minimal configuration:
[generation]
main_topic = "Fantasy Fiction"
num_subtopics = 64
num_prompts_per_subtopic = 2
concurrency = 64
dataset_mode = "dpo" # Options: sft, dpo, kto, mo-dpo
[models.main]
base_url = "https://integrate.api.nvidia.com/v1"
model_name = "moonshotai/kimi-k2-instruct-0905"
temperature = 0.6
max_output_tokens = 8192
rate_limit_per_minute = 40
[models.rejected] # Required for DPO/KTO/MO-DPO
base_url = "http://localhost:8080/v1" # Default URL for llama.cpp local server, but you can use any api of choice
model_name = "phi-4-mini-instruct"
temperature = 0.0
max_output_tokens = 4096
[prompt_templates]
chosen_generation = "Write a compelling story (400-600 words): {{.Prompt}}"
rejected_generation = "Write a simple story (200-300 words): {{.Prompt}}"
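With these settings, the hierarchical pipeline expands the main topic into 64 subtopics with 2 prompts each, so a full run targets 128 preference pairs.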
Complete configuration reference in configs/config.example.toml.
Dataset Modes
Mode Selection
| Mode | Output Format | Models Required | HuggingFace TRL | Use Case |
|------|---------------|-----------------|-----------------|----------|
| sft | instruction → output | Main only | SFTTrainer | Basic fine-tuning |
| dpo | prompt, chosen, rejected | Main + Rejected | DPOTrainer | Preference optimization |
| kto | prompt, completion, label | Main + Rejected | KTOTrainer | Unpaired preferences |
| mo-dpo | Full schema + judge scores | Main + Rejected + Judge | Custom | Multi-objective training |
Example Outputs
SFT Format:
{
"instruction": "Write a fantasy story about dragons",
"output": "In the mountains of Eldoria..."
}
DPO Format:
{
"prompt": "Write a fantasy story about dragons",
"chosen": "In the ancient mountains of Eldoria, where mist...",
"rejected": "There was a dragon. It was big..."
}
KTO Format (2 rows per pair):
{"prompt": "Write about dragons", "completion": "Good story...", "label": true}
{"prompt": "Write about dragons", "completion": "Bad story...", "label": false}
MO-DPO Format:
{
"prompt": "Write a fantasy story about dragons",
"chosen": "In the mountains...",
"rejected": "There was a dragon...",
"chosen_scores": {
"plot_quality": {"score": 5, "reasoning": "Excellent narrative..."},
"creativity": {"score": 4, "reasoning": "Fresh perspective..."}
},
"rejected_scores": {
"plot_quality": {"score": 2, "reasoning": "Minimal development..."},
"creativity": {"score": 2, "reasoning": "Generic treatment..."}
},
"chosen_score_total": 4.5,
"rejected_score_total": 2.0,
"preference_margin": 2.5
}
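As the example shows, preference_margin is chosen_score_total minus rejected_score_total (4.5 − 2.0 = 2.5).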
See DATASET_MODES.md for detailed format specifications and configuration examples.
Optional Judge Filtering
Available for SFT, DPO, and KTO modes. MO-DPO always includes full judge evaluation.
[judge_filtering]
enabled = true
use_explanations = false # Scores only = 40-60% token savings
min_chosen_score = 4.0 # Keep chosen responses >= 4.0
max_rejected_score = 3.0 # Keep rejected responses <= 3.0
[models.judge]
enabled = true
base_url = "https://integrate.api.nvidia.com/v1"
model_name = "meta/llama-3.1-70b-instruct"
temperature = 0.4
max_output_tokens = 2048
Responses are filtered by judge score before being written to the dataset: chosen responses are kept only at or above min_chosen_score, and rejected responses only at or below max_rejected_score. Use this when API budget is limited or training time is expensive.
Rate Limiting
Provider-Level Limits
Global rate limits shared across all models from the same provider:
[provider_rate_limits]
nvidia = 40 # All NVIDIA models share this 40 RPM limit
provider_burst_percent = 15 # 15% burst capacity (default)
Provider limits take precedence over per-model limits and prevent 429 errors when multiple models share one API endpoint.
Per-Model Limits
Individual model rate limiting:
[models.main]
rate_limit_per_minute = 40 # Overridden by provider_rate_limits if set
Optimization
Recommended configuration for high throughput:
[generation]
concurrency = 128 # Or 256 for higher throughput; test with the benchmark scripts
[provider_rate_limits]
nvidia = 40
provider_burst_percent = 20 # Higher burst for better throughput
Conservative configuration for avoiding rate limits:
[generation]
concurrency = 32
[provider_rate_limits]
nvidia = 30
provider_burst_percent = 10 # Lower burst for fewer 429 errors
See BENCHMARK_README.md for a benchmarking guide using the included benchmark scripts.
Checkpoint & Resume
Enable automatic checkpointing:
[generation]
enable_checkpointing = true
checkpoint_interval = 10 # Save every 10 completed jobs
Resume interrupted session:
# List available checkpoints
./bin/vellumforge2 checkpoint list
# Inspect checkpoint
./bin/vellumforge2 checkpoint inspect session_2025-11-05T12-34-56
# Resume generation
./bin/vellumforge2 checkpoint resume session_2025-11-05T12-34-56
# Resume with specific config and env file
./bin/vellumforge2 checkpoint resume session_2025-11-05T12-34-56 \
--config config.sft.toml \
--env-file .env
Graceful shutdown with Ctrl+C automatically saves a checkpoint.
CLI Commands
Generate Dataset
# Basic generation
./bin/vellumforge2 run --config config.toml
# With verbose logging
./bin/vellumforge2 run --config config.toml --verbose
# Upload to Hugging Face
./bin/vellumforge2 run --config config.toml \
--upload-to-hf --hf-repo-id username/my-dataset # --hf-repo-id is optional if set in the config file
Checkpoint Management
# List checkpoints
./bin/vellumforge2 checkpoint list
# Inspect checkpoint
./bin/vellumforge2 checkpoint inspect <session-dir>
# Resume from checkpoint
./bin/vellumforge2 checkpoint resume <session-dir>
# Resume with specific config (important if checkpoint used different config file)
./bin/vellumforge2 checkpoint resume <session-dir> --config config.sft.toml --env-file .env
Dataset Transform (SFT→DPO & Rejected Regeneration)
# Convert an existing SFT dataset into DPO (generates rejected responses)
./bin/vellumforge2 transform \
--config config.dpo.toml \
--mode sft-to-dpo \
--input path/to/sft_dataset.jsonl \
--output path/to/dpo_from_sft.jsonl
# Regenerate rejected responses for an existing DPO dataset
./bin/vellumforge2 transform \
--config config.dpo.toml \
--mode regen-rejected \
--input path/to/dpo_dataset.jsonl \
--output path/to/dpo_dataset.regen.jsonl
# Optional: checkpoint/resume for long transforms
./bin/vellumforge2 transform \
--config config.dpo.toml \
--mode regen-rejected \
--input path/to/dpo_dataset.jsonl \
--output path/to/dpo_dataset.regen.jsonl \
--checkpoint path/to/transform.checkpoint.json \
--resume
# Regenerate both plain and reasoning DPO datasets
./bin/vellumforge2 transform \
--config config.dpo.toml \
--mode regen-rejected \
--input path/to/dpo_dataset.jsonl \
--input-reasoning path/to/dpo_dataset_reasoning.jsonl \
--output path/to/dpo_dataset.regen.jsonl \
--output-reasoning path/to/dpo_dataset_reasoning.regen.jsonl
# Reasoning-only input: rebuild plain + reasoning datasets from reasoning JSONL
./bin/vellumforge2 transform \
--config config.dpo.toml \
--mode regen-rejected \
--input-reasoning path/to/dpo_dataset_reasoning.jsonl \
--output path/to/dpo_dataset_from_reasoning.jsonl \
--output-reasoning path/to/dpo_dataset_reasoning.regen.jsonl
Other
# Show version
./bin/vellumforge2 --version
