# Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (ICML 2025)

arXiv Paper | Project Page | DDIM-InPO Project Page

This repository provides the official implementation, experiment code, and model checkpoints for our paper.
## 📖 News & Updates
- [2025-06-03] 🎉 Preprint paper released on arXiv!
- [2025-06-03] ✅ Initial model checkpoints published
- [2025-06-04] 📊 Project page launched
- [2025-06-29] 🚀 Training code released
## 🔧 Quick Start
### Installation

```bash
conda create -n smpo python=3.10
conda activate smpo
git clone https://github.com/JaydenLyh/SmPO.git
cd SmPO
pip install -r requirements.txt
```
### Preparing the dataset and base models

Arrange the downloaded dataset and base models as follows:

```
SmPO/
├── assets/
│   └── smpo.png
├── checkpoints/
│   ├── CLIP-ViT-H-14-laion2B-s32B-b79K/
│   ├── PickScore_v1/
│   ├── stable-diffusion-v1-5/
│   ├── sdxl-vae-fp16-fix/
│   └── stable-diffusion-xl-base-1.0/
├── datasets/
│   └── pickapic_v2/
├── utils/
│   └── pickscore_utils.py
├── train.py
├── preprocessing.py
├── README.md
├── LICENSE.txt
└── requirements.txt
```
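If you fetch the assets from the Hugging Face Hub, commands along these lines can populate the layout above. The repo ids are assumptions inferred from the folder names, not specified by this repository; adjust them if a model has moved to a different namespace.

```shell
# Hypothetical download commands; repo ids are assumed from the folder names.
huggingface-cli download laion/CLIP-ViT-H-14-laion2B-s32B-b79K --local-dir checkpoints/CLIP-ViT-H-14-laion2B-s32B-b79K
huggingface-cli download yuvalkirstain/PickScore_v1 --local-dir checkpoints/PickScore_v1
huggingface-cli download sd-legacy/stable-diffusion-v1-5 --local-dir checkpoints/stable-diffusion-v1-5
huggingface-cli download madebyollin/sdxl-vae-fp16-fix --local-dir checkpoints/sdxl-vae-fp16-fix
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 --local-dir checkpoints/stable-diffusion-xl-base-1.0
# Pick-a-Pic v2 is a dataset repo, so pass --repo-type dataset
huggingface-cli download yuvalkirstain/pickapic_v2 --repo-type dataset --local-dir datasets/pickapic_v2
```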
#### Step 1: Smooth Pick-a-Pic v2

```bash
python preprocessing.py
```
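The preprocessing step replaces hard 0/1 preference labels with smoothed ones. As a rough, self-contained illustration of turning reward-model scores (e.g. PickScore) into a soft label via a Bradley–Terry-style sigmoid — the function name and temperature `tau` below are hypothetical, not the repository's actual `preprocessing.py`:

```python
import math

def smoothed_label(score_w: float, score_l: float, tau: float = 1.0) -> float:
    """Soft probability that the 'winner' image is preferred, derived from
    reward-model scores via a Bradley-Terry sigmoid. `tau` is a hypothetical
    temperature controlling how soft the label is."""
    return 1.0 / (1.0 + math.exp(-(score_w - score_l) / tau))

# Nearly tied scores yield a label near 0.5; a clear gap pushes it toward 1.
close = smoothed_label(21.3, 21.1)
clear = smoothed_label(23.0, 19.0)
```

Pairs with nearly tied scores thus contribute a near-uniform target instead of a confident 0/1 label.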
#### Step 2: Training for SDXL

```bash
export MODEL_NAME="checkpoints/stable-diffusion-xl-base-1.0"
export VAE="checkpoints/sdxl-vae-fp16-fix"
export DATASET_NAME="pickapic_v2"
PORT=$((20000 + RANDOM % 10000))
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" accelerate launch --main_process_port $PORT --mixed_precision="fp16" --num_processes=8 train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE \
  --dataset_name=$DATASET_NAME \
  --train_batch_size=1 \
  --dataloader_num_workers=16 \
  --gradient_accumulation_steps=128 \
  --max_train_steps=200 \
  --lr_scheduler="constant_with_warmup" --lr_warmup_steps=100 \
  --learning_rate=1e-8 --scale_lr \
  --checkpointing_steps 50 \
  --beta_dpo 5000 \
  --sdxl \
  --output_dir="smpo-sdxl"
```
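Here `--beta_dpo` scales the implicit-reward margin inside a DPO-style logistic loss. A minimal scalar sketch of such an objective with a smoothed preference target follows; this is an assumed form for illustration, not the exact loss in `train.py` (which operates on per-sample denoising errors in batches):

```python
import math

def log_sigmoid(x: float) -> float:
    # numerically stable log(sigmoid(x))
    return x - math.log1p(math.exp(x)) if x < 0 else -math.log1p(math.exp(-x))

def smpo_style_loss(err_w: float, err_l: float,
                    err_w_ref: float, err_l_ref: float,
                    beta: float = 5000.0, p: float = 0.7) -> float:
    """DPO-style objective with a smoothed target p (soft probability that
    the 'winner' is preferred). err_w/err_l are the policy's denoising MSEs
    on the preferred/dispreferred images; err_*_ref are those of the frozen
    reference model. Hypothetical sketch, not the repository's implementation."""
    # Margin: the policy improves over the reference more on the winner
    # than it does on the loser.
    margin = (err_w_ref - err_w) - (err_l_ref - err_l)
    logit = beta * margin
    # Smoothed binary cross-entropy in place of a hard 0/1 preference label.
    return -(p * log_sigmoid(logit) + (1.0 - p) * log_sigmoid(-logit))
```

With `p = 1` this reduces to a standard Diffusion-DPO loss; a `p` closer to 0.5 down-weights ambiguous pairs whose reward scores are nearly tied.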
## Our Models

| Model      | Download Links |
|------------|----------------|
| SmPO-SD1.5 | Hugging Face   |
| SmPO-SDXL  | Hugging Face   |
Citation
@article{lu2025smoothed,
title={Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences},
author={Lu, Yunhong and Wang, Qichao and Cao, Hengyuan and Xu, Xiaoyin and Zhang, Min},
journal={arXiv preprint arXiv:2506.02698},
year={2025}
}
## Acknowledgments

This project's implementation references the DiffusionDPO repository by Salesforce AI Research; we acknowledge and appreciate their open-source contribution.
