SimpleTuner 💹
A general fine-tuning kit geared toward image/video/audio diffusion models.
ℹ️ No data is sent to any third parties except through the opt-in options `report_to`, `push_to_hub`, or webhooks, which must be manually configured.
SimpleTuner is geared towards simplicity, with a focus on making the code easily understood. This codebase serves as a shared academic exercise, and contributions are welcome.
If you'd like to join our community, we can be found on Discord via Terminus Research Group. If you have any questions, please feel free to reach out to us there.
<img width="1944" height="1657" alt="image" src="https://github.com/user-attachments/assets/af3a24ec-7347-4ddf-8edf-99818a246de1" />
Design Philosophy
- Simplicity: Aiming to have good default settings for most use cases, so less tinkering is required.
- Versatility: Designed to handle a wide range of image quantities - from small datasets to extensive collections.
- Cutting-Edge Features: Only incorporates features that have proven efficacy, avoiding the addition of untested options.
Tutorial
Please fully explore this README before embarking on the new web UI tutorial or the classic command-line tutorial, as this document contains vital information you may need to know first.
For a manually configured quick start without reading the full documentation or using any web interfaces, you can use the Quick Start guide.
For memory-constrained systems, see the DeepSpeed document which explains how to use 🤗Accelerate to configure Microsoft's DeepSpeed for optimiser state offload. For DTensor-based sharding and context parallelism, read the FSDP2 guide which covers the new FullyShardedDataParallel v2 workflow inside SimpleTuner.
For multi-node distributed training, this guide will help you tweak the configurations from the INSTALL and Quickstart guides for multi-node training, optimising for image datasets numbering in the billions of samples.
Features
SimpleTuner provides comprehensive training support across multiple diffusion model architectures with consistent feature availability:
Core Training Features
- User-friendly web UI - Manage your entire training lifecycle through a sleek dashboard
- Multi-modal training - Unified pipeline for Image, Video, and Audio generative models
- Multi-GPU training - Distributed training across multiple GPUs with automatic optimization
- Advanced caching - Image, video, audio, and caption embeddings cached to disk for faster training
- Aspect bucketing - Support for varied image/video sizes and aspect ratios
- Concept sliders - Slider-friendly targeting for LoRA/LyCORIS/full (via LyCORIS full) with positive/negative/neutral sampling and per-prompt strength; see the Slider LoRA guide
- Memory optimization - Most models trainable on 24G GPUs, many on 16G with optimizations
- DeepSpeed & FSDP2 integration - Train large models on smaller GPUs with optim/grad/parameter sharding, context parallel attention, gradient checkpointing, and optimizer state offload
- S3 training - Train directly from cloud storage (Cloudflare R2, Wasabi S3)
- EMA support - Exponential moving average weights for improved stability and quality
- Custom experiment trackers - Drop an `accelerate.GeneralTracker` into `simpletuner/custom-trackers` and use `--report_to=custom-tracker --custom_tracker=<name>`
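The EMA support listed above maintains a shadow copy of the weights via the standard exponential-moving-average update. A minimal sketch of that update rule (illustrative only, not SimpleTuner's actual implementation):

```python
def ema_update(ema_params, model_params, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_params, model_params)]

# toy example: the shadow weight drifts toward the live weight over steps
shadow = [0.0]
for _ in range(3):
    shadow = ema_update(shadow, [1.0], decay=0.9)
```

Higher `decay` values make the averaged weights smoother but slower to track the live model, which is why EMA checkpoints often validate better late in training.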
Multi-User & Enterprise Features
SimpleTuner includes a complete multi-user training platform with enterprise-grade features—free and open source, forever.
- Worker Orchestration - Register distributed GPU workers that auto-connect to a central panel and receive job dispatch via SSE; supports ephemeral (cloud-launched) and persistent (always-on) workers; see Worker Orchestration Guide
- SSO Integration - Authenticate with LDAP/Active Directory or OIDC providers (Okta, Azure AD, Keycloak, Google); see External Auth Guide
- Role-Based Access Control - Four default roles (Viewer, Researcher, Lead, Admin) with 17+ granular permissions; define resource rules with glob patterns to restrict configs, hardware, or providers per team
- Organizations & Teams - Hierarchical multi-tenant structure with ceiling-based quotas; org limits enforce absolute maximums, team limits operate within org bounds
- Quotas & Spending Limits - Enforce cost ceilings (daily/monthly), job concurrency limits, and submission rate limits at org, team, or user scope; actions include block, warn, or require approval
- Job Queue with Priorities - Five priority levels (Low → Critical) with fair-share scheduling across teams, starvation prevention for long-waiting jobs, and admin priority overrides
- Approval Workflows - Configurable rules trigger approval for jobs exceeding cost thresholds, first-time users, or specific hardware requests; approve via UI, API, or email reply
- Email Notifications - SMTP/IMAP integration for job status, approval requests, quota warnings, and completion alerts
- API Keys & Scoped Permissions - Generate API keys with expiration and limited scope for CI/CD pipelines
- Audit Logging - Track all user actions with chain verification for compliance; see Audit Guide
For deployment details, see the Enterprise Guide.
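The priority levels described above map naturally onto an ordered queue where lower numbers pop first and ties fall back to submission order. A toy sketch of that dispatch order (illustrative only; level names here are assumptions, and the real scheduler adds fair-share weighting and starvation prevention):

```python
import heapq
from itertools import count

# hypothetical names for the five levels, Low -> Critical
PRIORITY = {"low": 4, "normal": 3, "high": 2, "urgent": 1, "critical": 0}

_seq = count()
_queue = []

def submit(name, level):
    """Enqueue a job; ties within a level are broken by submission order."""
    heapq.heappush(_queue, (PRIORITY[level], next(_seq), name))

def next_job():
    """Dispatch the highest-priority, oldest job."""
    return heapq.heappop(_queue)[2]

submit("job-a", "low")
submit("job-b", "critical")
submit("job-c", "normal")
```

With these submissions, `next_job()` yields `job-b` before `job-c` before `job-a`, regardless of arrival order.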
Model Architecture Support
| Model | Parameters | PEFT LoRA | Lycoris | Full-Rank | ControlNet | Quantization | Flow Matching | Text Encoders |
|-------|------------|-----------|---------|-----------|------------|--------------|---------------|---------------|
| Stable Diffusion XL | 3.5B | ✓ | ✓ | ✓ | ✓ | int8/nf4 | ✗ | CLIP-L/G |
| Stable Diffusion 3 | 2B-8B | ✓ | ✓ | ✓* | ✓ | int8/fp8/nf4 | ✓ | CLIP-L/G + T5-XXL |
| Flux.1 | 12B | ✓ | ✓ | ✓* | ✓ | int8/fp8/nf4 | ✓ | CLIP-L + T5-XXL |
| Flux.2 | 32B | ✓ | ✓ | ✓* | ✗ | int8/fp8/nf4 | ✓ | Mistral-3 Small |
| ACE-Step | 3.5B | ✓ | ✓ | ✓* | ✗ | int8 | ✓ | UMT5 |
| HeartMuLa | 3B | ✓ | ✓ | ✓* | ✗ | int8 | ✗ | None |
| Chroma 1 | 8.9B | ✓ | ✓ | ✓* | ✗ | int8/fp8/nf4 | ✓ | T5-XXL |
| Auraflow | 6.8B | ✓ | ✓ | ✓* | ✓ | int8/fp8/nf4 | ✓ | UMT5-XXL |
| PixArt Sigma | 0.6B-0.9B | ✗ | ✓ | ✓ | ✓ | int8 | ✗ | T5-XXL |
| Sana | 0.6B-4.8B | ✗ | ✓ | ✓ | ✗ | int8 | ✓ | Gemma2-2B |
| Lumina2 | 2B | ✓ | ✓ | ✓ | ✗ | int8 | ✓ | Gemma2 |
| Kwai Kolors | 5B | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ChatGLM-6B |
| LTX Video | 5B | ✓ | ✓ | ✓ | ✗ | int8/fp8 | ✓ | T5-XXL |
| LTX Video 2 | 19B | ✓ | ✓ | ✓* | ✗ | int8/fp8 | ✓ | Gemma3 |
| Wan Video | 1.3B-14B | ✓ | ✓ | ✓* | ✗ | int8 | ✓ | UMT5 |
| HiDream | 17B (8.5B MoE) | ✓ | ✓ | ✓* | ✓ | int8/fp8/nf4 | ✓ | CLIP-L + T5-XXL + Llama |
| Cosmos2 | 2B-14B | ✗ | ✓ | ✓ | ✗ | int8 | ✓ | T5-XXL |
| OmniGen | 3.8B | ✓ | ✓ | ✓ | ✗ | int8/fp8 | ✓ | T5-XXL |
| Qwen Image | 20B | ✓ | ✓ | ✓* | ✗ | int8/nf4 (req.) | ✓ | T5-XXL |
| SD 1.x/2.x (Legacy) | 0.9B | ✓ | ✓ | ✓ | ✓ | int8/nf4 | ✗ | CLIP-L |
✓ = Supported, ✗ = Not supported, * = Requires DeepSpeed for full-rank training
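The quantization column above refers to reduced-precision weight storage. A minimal sketch of symmetric int8 quantization to show the idea (illustrative only; real backends quantize per-tensor or per-channel and handle fp8/nf4 differently):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats onto [-127, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

codes, scale = quantize_int8([-1.0, 0.5, 1.0])
```

Each weight then occupies one byte plus a shared scale, which is where the memory savings that make 24G/16G training feasible come from.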
Advanced Training Techniques
- TREAD - Token-wise dropout for transformer models, including Kontext training
- Masked loss training - Superior convergence with segmentation/depth guidance
- Prior regularization - Enhanced training stability for character consistency
- Gradient checkpointing - Configurable intervals for memory/speed optimization
- Loss functions - L2, Huber, Smooth L1 with scheduling support
- SNR weighting - Min-SNR gamma weighting for improved training dynamics
- Group offloading - Diffusers v0.33+ module-group CPU/disk staging with optional CUDA streams
- Validation adapter sweeps - Temporarily attach LoRA adapters (single or JSON presets) during validation to measure adapter-only or comparison renders without touching the training loop
- External validation hooks - Swap the built-in validation pipeline or post-upload steps for your own scripts, so you can run checks on another GPU or forward artifacts to any cloud provider of your choice (details)
- CREPA regularization - Cross-frame representation alignment for video DiTs (guide)
- LoRA I/O formats - Load/save PEFT LoRAs in standard Diffusers layout or ComfyUI-style `diffusion_model.*` keys (Flux/Flux2/Lumina2/Z-Image auto-detect ComfyUI inputs)
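The Min-SNR gamma weighting listed above caps each timestep's loss weight at a ceiling γ, so low-noise (high-SNR) steps stop dominating the objective. A self-contained sketch of the weighting term, assuming an ε-prediction loss:

```python
def min_snr_weight(snr, gamma=5.0):
    """Min-SNR-gamma: weight = min(SNR, gamma) / SNR.

    High-SNR (nearly clean) timesteps are down-weighted; timesteps with
    SNR <= gamma keep full weight.
    """
    return min(snr, gamma) / snr

# a nearly-clean timestep is down-weighted, a noisy one is not
w_clean = min_snr_weight(20.0)   # 5 / 20
w_noisy = min_snr_weight(2.0)    # 2 / 2
```

In practice this weight multiplies the per-timestep MSE before averaging, flattening the loss landscape across noise levels.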
Model-Specific Features
- Flux Kontext - Edit conditioning and image-to-image training for Flux models
- PixArt two-stage - eDiff training pipeline support for PixArt Sigma
- Flow matching models - Advanced scheduling with beta/uniform distributions
- HiDream MoE - Mixture of Experts gate loss augmentation
- T5 masked training - Enhanced fine details for Flux and compatible models
- QKV fusion - Memory and speed optimizations (Flux, Lumina2)
- TREAD integration - Selective token routing for most models
- Wan 2.x I2V - High/low stage presets plus a 2.1 time-embedding fallback (see Wan quickstart)
- **Classif
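The beta/uniform scheduling mentioned for flow-matching models refers to how training timesteps are drawn. A generic sketch of the two sampling modes (function name and parameters are illustrative, not SimpleTuner's API):

```python
import random

def sample_timestep(mode="uniform", alpha=2.0, beta=2.0, rng=random):
    """Draw a flow-matching training timestep t in [0, 1].

    "uniform" spreads steps evenly across the trajectory; "beta" (here with
    alpha = beta = 2) concentrates sampling around mid-range noise levels.
    """
    if mode == "beta":
        return rng.betavariate(alpha, beta)
    return rng.random()
```

Skewing the beta parameters shifts emphasis toward the noisy or clean end of the trajectory, a common lever when a model under-trains at particular noise levels.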
