Hybridiff

[CVPR 2026] Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Generate Convert Improve

Install / Use

/learn @kaist-dmlab/Hybridiff

About this skill

Quality Score

0/100

README

Hybridiff: Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Official implementation of the CVPR 2026 paper "Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling”

Hybridiff is a high-performance diffusion model inference framework that combines condition-based partitioning with adaptive parallelism switching for accelerated image generation. This repository implements dynamic threshold detection (tau1, tau2) to automatically optimize the trade-off between quality and speed during the diffusion inference.

Installation

1. Create Virtual Environment

# Create and activate conda environment
conda create -n hybridiff python=3.10
conda activate hybridiff

2. Install Dependencies

# Install required packages
pip install -r requirements.txt

Dataset Preparation

MS-COCO Dataset

For large-scale evaluation using MS-COCO dataset:

# The dataset will be automatically downloaded when running generation scripts
# Default cache location: /data/ms-coco
# You can modify this path in the generation scripts if needed

Note: The hardcoded dataset path in the generation scripts (generate_hybridiff_coco.py and generate_sd3_hybridiff_coco.py) may need to be updated based on your environment.

Quick Start: Single Image Generation

Stable Diffusion XL (SDXL)

# Run with 2 GPUs
torchrun --nproc_per_node=2 examples/run_sdxl_hybridiff.py \
  --model_n 2 --stride 1 \
  --prompt "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
  --L_slope 12 --eps_slope 0.0004 --k 5 --tau_cap 15

Stable Diffusion 3 (SD3)

# Run with 2 GPUs
torchrun --nproc_per_node=2 examples/run_sd3_hybridiff.py \
  --model_n 2 --stride 1 \
  --prompt "A cat holding a sign that says hello world" \
  --L_slope 15 --eps_slope 0.0005 --k 5 --tau_cap 40

COCO Dataset Generation

SDXL on COCO

# Generate images for COCO validation set
torchrun --nproc_per_node=2 examples/generate_hybridiff_coco.py \
  --model_n 2 --stride 1 \
  --num_inference_steps 50 \
  --guidance_scale 5.0 \
  --scheduler ddim \
  --L_slope 12 --eps_slope 0.0004 --k 5 --tau_cap 15

# Or use the provided shell script
bash examples/run_hybridiff_coco.sh

SD3 on COCO

# Generate images for COCO validation set
torchrun --nproc_per_node=2 examples/generate_sd3_hybridiff_coco.py \
  --model_n 2 --stride 1 \
  --num_inference_steps 50 \
  --guidance_scale 7.0 \
  --L_slope 15 --eps_slope 0.0005 --k 5 --tau_cap 40

# Or use the provided shell script
bash examples/run_sd3_hybridiff_coco.sh

Metric Evaluation

Compute LPIPS, PSNR, and FID metrics between generated and reference images:

python examples/compute_metric.py \
  --input_root0 path/to/reference/images \
  --input_root1 path/to/generated/images \
  --is_gt \
  --max_length 5000 \

Key Parameters

Model Configuration (Based on asyncdiff)

model_n: Number of GPUs to use for parallelization (2 or 4)
- For world_size=2: set model_n=2, stride=1
stride: Stride for pipeline parallelism (1 or 2) -> Based on asyncdiff
time_shift: Enable time-shifted inference (default: False) -> Based on asyncdiff

Hybridiff Threshold Parameters

L_slope: Window size for computing slope in tau1 detection
- SDXL default: 12
- SD3 default: 15
- Larger values result in smoother detection but slower response
eps_slope: Threshold for slope-based tau1 detection
- SDXL default: 0.0004 (0.4e-3)
- SD3 default: 0.0005 (0.5e-3)
- Lower values trigger tau1 earlier (more aggressive parallelism)
k: Offset for tau2 calculation (tau2 = tau1 + k)
- SDXL default: 5
- SD3 default: 5
- Determines the length of parallel execution phase
tau_cap: Safety fallback value for tau1 if automatic detection fails
- SDXL default: 15
- SD3 default: 40
- Ensures tau1 is set even if slope-based detection doesn't trigger

Generation Parameters

num_inference_steps: Number of denoising steps (default: 50)
guidance_scale: CFG guidance scale
- SDXL default: 5.0
- SD3 default: 7.0
image_size: Output image resolution (default: 1024)
scheduler: Noise scheduler type (for SDXL: "euler", "dpm-solver", or "ddim" -> default: "ddim")

Structure

hybridiff/
├── hybrid_sd.py              # HybriDiff implementation for SDXL
├── hybrid_sd3.py             # HybriDiff implementation for SD3
├── pipelines/
│   ├── __init__.py
│   ├── pipeline_stable_diffusion_xl.py     # Custom SDXL pipeline with CFG-DP
│   └── pipeline_stable_diffusion_3.py      # Custom SD3 pipeline with CFG-DP
├── metrics.py                # Memory and performance measurement utilities
├── tools.py                  # Helper utilities for result caching
└── pipe_config.py            # Model splitting configuration

examples/
├── run_sdxl_hybridiff.py     # Single image generation (SDXL)
├── run_sd3_hybridiff.py      # Single image generation (SD3)
├── generate_hybridiff_coco.py       # COCO dataset generation (SDXL)
├── generate_sd3_hybridiff_coco.py   # COCO dataset generation (SD3)
├── compute_metric.py         # Metric evaluation script
└── *.sh                      # Shell scripts for batch execution

License

This project is licensed under the Apache License 2.0 - see the original Diffusers license for details.

Acknowledgments

Built on top of Hugging Face Diffusers
Inspired by AsyncDiff and other parallel diffusion inference works
Thanks to the Stable Diffusion community for pre-trained models

Citation

If you use this research, please cite our paper:

@inproceedings{jung2026hybridiff,
  title={Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling},
  author={Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho and Jae-Gil Lee},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

Related Skills

node-connect

353.1k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

111.6k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

353.1k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

353.1k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。