# RepLDM
[NeurIPS 2025 Spotlight ★] Official implementation of "RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation".
🔥🔥🔥 RepLDM is a training-free method for higher-resolution image generation, enabling 8K image generation! You can freely adjust the richness of colors and details in generated images through attention guidance.
<!-- here are urls -->
<div align="center">
  <!-- arxiv -->
  <a href="https://arxiv.org/abs/2410.06055"><img src="https://img.shields.io/badge/arXiv-2410.06055-B31B1B.svg" alt="arXiv Paper"></a>
  <!-- project page -->
  <a href='https://kmittle.github.io/project_pages/RepLDM/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
  <!-- hugging face -->
  <a href='#'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
  <!-- authors --><br>Boyuan Cao, Jiaxin Ye, Yujie Wei, and Hongming Shan* <br>(* Corresponding Author) <br>From Fudan University <br>
  <img src="fig/teaser.png" width="100%">
</div>
<!-- authors end -->
<!-- urls end -->

## 📝 TODO List
- SDXL Based
  - Text to Image
    - [x] RepLDM
    - [x] FreeScale + AttentionGuidance
  - Text to Image + ControlNet
    - [x] RepLDM
    - [ ] FreeScale + AttentionGuidance
- FLUX Based
  - Text to Image
    - [ ] RepLDM
- SD3 Based
  - Text to Image
    - [ ] RepLDM
- [ ] Web UI
## ⚙️ Setup

### Install Environment

```shell
conda create -n repldm python=3.9
conda activate repldm
pip install -e .
```
## 🚀 Quick Start

### Quick start with Gradio

TODO

### Text-to-image generation

TODO
## 📖 Overview of RepLDM
RepLDM enables the rapid synthesis of high-quality, high-resolution images without the need for further training.
It consists of two stages:
- Synthesizing high-quality images at the training resolution using Attention Guidance.
- Generating finer high-resolution images through pixel upsampling and a "diffusion-denoising" loop.
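The two stages above can be sketched as follows. This is a minimal NumPy sketch of the control flow only, with a stub `denoise_step` standing in for the actual latent diffusion model; the function names, `renoise_strength` parameter, and nearest-neighbor upsampler are illustrative assumptions, not the repository's API.

```python
import numpy as np

def upsample_nearest(z, scale):
    """Nearest-neighbor pixel upsampling of a (C, H, W) latent."""
    return z.repeat(scale, axis=1).repeat(scale, axis=2)

def two_stage_generate(denoise_step, base_shape, scale, steps, renoise_strength=0.5):
    """Sketch of the two-stage pipeline (stub denoiser, hypothetical names)."""
    rng = np.random.default_rng(0)
    # Stage 1: denoise from pure noise at the training resolution.
    z = rng.standard_normal(base_shape)
    for t in reversed(range(steps)):
        z = denoise_step(z, t)
    # Stage 2: upsample, partially re-noise, then run a shorter
    # "diffusion-denoising" loop to refine high-resolution detail.
    z_hi = upsample_nearest(z, scale)
    z_hi = z_hi + renoise_strength * rng.standard_normal(z_hi.shape)
    for t in reversed(range(int(steps * renoise_strength))):
        z_hi = denoise_step(z_hi, t)
    return z_hi
```

With a toy denoiser such as `lambda z, t: 0.9 * z`, a `(4, 8, 8)` latent and `scale=2` yield a `(4, 16, 16)` output, mirroring how the second stage inherits layout from the first and only refines detail.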
Attention Guidance enables the generation of images with more vivid colors and richer details, as shown in the figure below.
<p align="center"> <img src="fig/ablation_T2I.png" width="100%"> </p>

Attention Guidance can be used in conjunction with plugins such as ControlNet to achieve an enhanced visual experience, as illustrated in the figure below.

<p align="center"> <img src="fig/ablation_controlnet.png" width="100%"> </p>

Attention Guidance allows users to freely adjust the level of detail and color richness in an image according to their preferences, simply by modifying the attention guidance scale, as shown in the figure below.
<p align="center">
<img src="fig/attn_guidance_scale_ablation.png" width="100%">
</p>
<!-- **Attention Guidance**: Enhances the structural consistency of the latent representation using a training-free self-attention mechanism. -->
### How does attention guidance work?
Attention Guidance computes layout-enhanced representations using a training-free self-attention (TFSA) mechanism and leverages them to strengthen layout consistency:
$\tilde{\boldsymbol{z}} = \gamma\mathrm{TFSA}(\boldsymbol{z})+(1-\gamma) \boldsymbol{z}, \quad \mathrm{TFSA}(\boldsymbol{z}) = \mathrm{f}^{-1}\left(\mathrm{Softmax}\left(\frac{\mathrm{f}(\boldsymbol{z}) \mathrm{f}(\boldsymbol{z})^{\mathrm{T}}}{\lambda}\right) \mathrm{f}(\boldsymbol{z})\right),$
where $\boldsymbol{z}$ is the latent representation, $\mathrm{f}$ denotes the reshape operation, and $\gamma$ and $\lambda$ are hyperparameters. Specifically, Attention Guidance moves each denoising step closer to the final state, as illustrated in the figure below.
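The formula above can be sketched directly in NumPy. This is an illustrative reading of the equation, not the repository's implementation: the reshape $\mathrm{f}$ is assumed to flatten a `(C, H, W)` latent into an `(H·W, C)` token matrix, and the default values of `gamma` and `lam` are placeholders, not the paper's settings.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tfsa(z, lam=1.0):
    """Training-free self-attention over a (C, H, W) latent."""
    c, h, w = z.shape
    tokens = z.reshape(c, h * w).T            # f(z): (H*W, C)
    attn = softmax(tokens @ tokens.T / lam)   # (H*W, H*W) attention map
    out = attn @ tokens                       # attend over spatial tokens
    return out.T.reshape(c, h, w)             # f^{-1}: back to (C, H, W)

def attention_guidance(z, gamma=0.1, lam=1.0):
    """Blend the layout-enhanced representation back into the latent."""
    return gamma * tfsa(z, lam) + (1 - gamma) * z
```

The guidance scale $\gamma$ is the knob described earlier: `gamma=0` returns the latent unchanged, while larger values mix in more of the layout-enhanced representation.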
<p align="center"> <img src="fig/attn_guidance_analyze.png" width="100%"> </p>

## 🔬 Research Comparison
The implementation in the `main` branch includes some modifications relative to the original version. If you want to compare against the original method reported in the paper, please refer to the code in the `base` branch.
## 😉 Citation

```bibtex
@inproceedings{caorepldm,
  title={RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation},
  author={Cao, Boyuan and Ye, Jiaxin and Wei, Yujie and Shan, Hongming},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```
