# RepLDM
[NeurIPS 2025 Spotlight ★] Official implementation of "RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation".
🔥🔥🔥 RepLDM is a training-free method for higher-resolution image generation, enabling 8K image generation! You can freely adjust the richness of colors and details in generated images through attention guidance.
<!-- here are urls -->
<div align="center">
  <!-- arxiv -->
  <a href="https://arxiv.org/abs/2410.06055"><img src="https://img.shields.io/badge/arXiv-2410.06055-B31B1B.svg" alt="arXiv Paper"></a>
  <!-- project page -->
  <a href='https://kmittle.github.io/project_pages/RepLDM/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
  <!-- hugging face -->
  <a href='#'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
  <!-- authors --><br>Boyuan Cao, Jiaxin Ye, Yujie Wei, and Hongming Shan* <br>(* Corresponding Author) <br>From Fudan University <br>
  <img src="fig/teaser.png" width="100%">
</div>
<!-- authors end -->
<!-- urls end -->

## 📝 TODO List
- SDXL Based
  - Text to Image
    - [x] RepLDM
    - [x] FreeScale + AttentionGuidance
  - Text to Image + ControlNet
    - [x] RepLDM
    - [ ] FreeScale + AttentionGuidance
- FLUX Based
  - Text to Image
    - [ ] RepLDM
- SD3 Based
  - Text to Image
    - [ ] RepLDM
- [ ] Web UI
## ⚙️ Setup

### Install Environment

```shell
conda create -n repldm python=3.9
conda activate repldm
pip install -e .
```
## 🚀 Quick Start

### Quick start with Gradio

TODO

### Text-to-image generation

TODO
## 📖 Overview of RepLDM
RepLDM enables the rapid synthesis of high-quality, high-resolution images without the need for further training.
It consists of two stages:
- Synthesizing high-quality images at the training resolution using Attention Guidance.
- Generating finer high-resolution images through pixel upsampling and a "diffusion-denoising" loop.
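The two stages above can be sketched as follows. This is a minimal NumPy sketch of the control flow only, with a stub `denoise_step` standing in for the actual latent diffusion model; the function names, `renoise_strength` parameter, and nearest-neighbor upsampler are illustrative assumptions, not the repository's API.

```python
import numpy as np

def upsample_nearest(z, scale):
    """Nearest-neighbor pixel upsampling of a (C, H, W) latent."""
    return z.repeat(scale, axis=1).repeat(scale, axis=2)

def two_stage_generate(denoise_step, base_shape, scale, steps, renoise_strength=0.5):
    """Sketch of the two-stage pipeline (stub denoiser, hypothetical names)."""
    rng = np.random.default_rng(0)
    # Stage 1: denoise from pure noise at the training resolution.
    z = rng.standard_normal(base_shape)
    for t in reversed(range(steps)):
        z = denoise_step(z, t)
    # Stage 2: upsample, partially re-noise, then run a shorter
    # "diffusion-denoising" loop to refine high-resolution detail.
    z_hi = upsample_nearest(z, scale)
    z_hi = z_hi + renoise_strength * rng.standard_normal(z_hi.shape)
    for t in reversed(range(int(steps * renoise_strength))):
        z_hi = denoise_step(z_hi, t)
    return z_hi
```

With a toy denoiser such as `lambda z, t: 0.9 * z`, a `(4, 8, 8)` latent and `scale=2` yield a `(4, 16, 16)` output, mirroring how the second stage inherits layout from the first and only refines detail.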
Attention Guidance enables the generation of images with more vivid colors and richer details, as shown in the figure below.
<p align="center"> <img src="fig/ablation_T2I.png" width="100%"> </p>

Attention Guidance can be used in conjunction with plugins such as ControlNet to achieve an enhanced visual experience, as illustrated in the figure below.

<p align="center"> <img src="fig/ablation_controlnet.png" width="100%"> </p>

Attention Guidance allows users to freely adjust the level of detail and color richness in an image according to their preferences, simply by modifying the attention guidance scale, as shown in the figure below.
<p align="center">
<img src="fig/attn_guidance_scale_ablation.png" width="100%">
</p>
<!-- **Attention Guidance**: Enhances the structural consistency of the latent representation using a training-free self-attention mechanism. -->
### How does attention guidance work?
Attention Guidance computes layout-enhanced representations using a training-free self-attention (TFSA) mechanism and leverages them to strengthen layout consistency:
$\tilde{\boldsymbol{z}} = \gamma\mathrm{TFSA}(\boldsymbol{z})+(1-\gamma) \boldsymbol{z}, \quad \mathrm{TFSA}(\boldsymbol{z}) = \mathrm{f}^{-1}\left(\mathrm{Softmax}\left(\frac{\mathrm{f}(\boldsymbol{z}) \mathrm{f}(\boldsymbol{z})^{\mathrm{T}}}{\lambda}\right) \mathrm{f}(\boldsymbol{z})\right),$
where $\boldsymbol{z}$ is the latent representation, $\mathrm{f}$ denotes the reshape operation, and $\gamma$ and $\lambda$ are hyperparameters. Specifically, Attention Guidance moves each denoising step closer to the final state, as illustrated in the figure below.
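The formula above can be sketched directly in NumPy. This is an illustrative reading of the equation, not the repository's implementation: the reshape $\mathrm{f}$ is assumed to flatten a `(C, H, W)` latent into an `(H·W, C)` token matrix, and the default values of `gamma` and `lam` are placeholders, not the paper's settings.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tfsa(z, lam=1.0):
    """Training-free self-attention over a (C, H, W) latent."""
    c, h, w = z.shape
    tokens = z.reshape(c, h * w).T            # f(z): (H*W, C)
    attn = softmax(tokens @ tokens.T / lam)   # (H*W, H*W) attention map
    out = attn @ tokens                       # attend over spatial tokens
    return out.T.reshape(c, h, w)             # f^{-1}: back to (C, H, W)

def attention_guidance(z, gamma=0.1, lam=1.0):
    """Blend the layout-enhanced representation back into the latent."""
    return gamma * tfsa(z, lam) + (1 - gamma) * z
```

The guidance scale $\gamma$ is the knob described earlier: `gamma=0` returns the latent unchanged, while larger values mix in more of the layout-enhanced representation.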
<p align="center"> <img src="fig/attn_guidance_analyze.png" width="100%"> </p>

## 🔬 Research Comparison
The implementation in the `main` branch includes some modifications relative to the original version. If you want to compare against the original method reported in the paper, please refer to the code in the `base` branch.
## 😉 Citation

```bibtex
@inproceedings{caorepldm,
  title={RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation},
  author={Cao, Boyuan and Ye, Jiaxin and Wei, Yujie and Shan, Hongming},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```
