[ICLR 2025] Codebase for "CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation"
To create your customized ControlNet in an easy and low-cost manner 🎉
<p align="center"> <img src="./assets/banner.jpg" alt="banner" style="width: 100%" /> </p> <p align="center"> <img src="./assets/style-transfer.gif" alt="style-transfer" style="width: 100%" /> </p>

The images are compressed for loading speed.
<h1 align="center"> <a href="https://arxiv.org/abs/2410.09400">CtrLoRA</a> </h1><a href="https://arxiv.org/abs/2410.09400"><img src="https://img.shields.io/badge/ICLR 2025-3A98B9?label=%F0%9F%93%9D&labelColor=FFFDD0" style="height: 28px" /></a> <a href="https://huggingface.co/xyfJASON/ctrlora/tree/main"><img src="https://img.shields.io/badge/Model-3A98B9?label=%F0%9F%A4%97&labelColor=FFFDD0" style="height: 28px" /></a>
<p align="center"> <img src="./assets/overview.jpg" alt="base-conditions" style="width: 100%" /> </p>

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
Yifeng Xu<sup>1,2</sup>, Zhenliang He<sup>1</sup>, Shiguang Shan<sup>1,2</sup>, Xilin Chen<sup>1,2</sup>
<sup>1</sup>Key Lab of AI Safety, Institute of Computing Technology, CAS, China
<sup>2</sup>University of Chinese Academy of Sciences, China
We first train a Base ControlNet along with condition-specific LoRAs on base conditions with a large-scale dataset. Then, our Base ControlNet can be efficiently adapted to novel conditions by new LoRAs with <mark>10% parameters, as few as 1,000 images, and less than 1 hour of training on a single GPU</mark>.
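Concretely, the adaptation follows the standard low-rank (LoRA) scheme: each frozen weight W is augmented with a trainable update B @ A. Below is a minimal numpy sketch with hypothetical dimensions (the layer width and rank are illustrative, not the repo's actual values; the real parameter ratio depends on the architecture):

```python
import numpy as np

# Minimal LoRA sketch: a frozen weight W plus a trainable low-rank update B @ A.
d, r = 320, 16                           # hypothetical layer width and LoRA rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen Base ControlNet weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x):
    # base path + low-rank path; with B zero-initialized, the update starts as a no-op
    return x @ W.T + x @ A.T @ B.T

# trainable-parameter ratio of the LoRA relative to the frozen weight: 2r/d
ratio = (A.size + B.size) / W.size
print(ratio)  # 0.1
```

With these toy dimensions the LoRA holds 10% of the frozen weight's parameters, which is the flavor of saving the banner above refers to.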
📜 Contents
- 🎨 Visual Results
- 🛠️ Installation
- 🤖️ Download Pretrained Models
- 🚀 Gradio Demo
- 🚗 Python API
- 🚢 ComfyUI Workflows
- 🔥 Train a LoRA for Your Custom Condition
- 📚 Detailed Instructions
🎨 Visual Results
🎨 Controllable generation on "base conditions"
|<img src="./assets/base-conditions.jpg" alt="base-conditions" style="width: 100%" />|
|-|
🎨 Controllable generation on "novel conditions"
|<img src="./assets/novel-conditions.jpg" alt="novel-conditions" style="width: 100%" />|
|-|
🎨 Integration into community models & Multi-conditional generation
|<img src="./assets/community-multi.jpg" alt="integration" style="width: 100%" />|
|-|
🎨 Application to style transfer
|<img src="./assets/style-transfer.jpg" alt="style-transfer" style="width: 100%" />|
|-|
🎨 Spatial + style control (integrated with InstantStyle)
|<img src="./assets/instant-style.jpg" alt="style-transfer" style="width: 100%" />|
|-|
🛠️ Installation
Clone this repo:
git clone --depth 1 https://github.com/xyfJASON/ctrlora.git
cd ctrlora
Create and activate a new conda environment:
conda create -n ctrlora python=3.10
conda activate ctrlora
Install pytorch and other dependencies:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
🤖️ Download Pretrained Models
We provide our pretrained models on [Hugging Face](https://huggingface.co/xyfJASON/ctrlora/tree/main). Please put the Base ControlNet (`ctrlora_sd15_basecn700k.ckpt`) into `./ckpts/ctrlora-basecn` and the LoRAs into `./ckpts/ctrlora-loras`.
The naming convention of the LoRAs is `ctrlora_sd15_<basecn>_<condition>.ckpt` for base conditions and `ctrlora_sd15_<basecn>_<condition>_<images>_<steps>.ckpt` for novel conditions.
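For scripting, the convention can be parsed with a small helper. The `parse_lora_name` function below is hypothetical (not part of the codebase), and it treats everything between the base-ControlNet tag and the trailing fields as the condition name, since condition names may themselves contain underscores:

```python
import re

# Hypothetical helper that parses the LoRA naming convention described above.
# Novel conditions carry trailing <images> and <steps> fields; base conditions do not.
NOVEL = re.compile(
    r'^ctrlora_sd15_(?P<basecn>[^_]+)_(?P<condition>.+)'
    r'_(?P<images>\d+k?imgs)_(?P<steps>\d+k?steps)\.ckpt$')
BASE = re.compile(r'^ctrlora_sd15_(?P<basecn>[^_]+)_(?P<condition>.+)\.ckpt$')

def parse_lora_name(filename):
    m = NOVEL.match(filename) or BASE.match(filename)
    return m.groupdict() if m else None

print(parse_lora_name('ctrlora_sd15_basecn700k_canny.ckpt'))
# → {'basecn': 'basecn700k', 'condition': 'canny'}
```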
You also need to download the SD1.5-based Models and put them into ./ckpts/sd15. Models used in our work:
- Stable Diffusion v1.5 (`v1-5-pruned.ckpt`): official / mirror
- Realistic Vision
- Dreamshaper
- Mistoon Anime
- Comic Babes
- Oil Painting
- Inkpunk
- Chinese Ink Comic-strip
- Slate Pencil Mix
- Aziib Pixel Mix
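After downloading, the layout can be sanity-checked before launching the demos. A minimal sketch, assuming only the two fixed paths mentioned above (the stylized checkpoints under `./ckpts/sd15` keep whatever filenames they ship with):

```python
from pathlib import Path

# Checkpoints the demos expect at fixed locations (from the instructions above).
expected = [
    'ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt',
    'ckpts/sd15/v1-5-pruned.ckpt',
]
missing = [p for p in expected if not Path(p).is_file()]
for p in missing:
    print('missing:', p)
```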
🚀 Gradio Demo
🚀 Spatial Control
python app/gradio_ctrlora.py
Generating a batch of one / four 512x512 images requires at least 9 GB / 21 GB of GPU memory.
<details><summary><strong>Single-conditional generation</strong></summary>

- select the Stable Diffusion checkpoint, Base ControlNet checkpoint, and LoRA checkpoint.
- write prompts and negative prompts. We provide several commonly used prompts.
- prepare a condition image:
  - upload an image to the left of the "Condition" panel, select the preprocessor corresponding to the LoRA, and click "Detect".
  - or upload the condition image directly, select the "none" preprocessor, and click "Detect".
- click "Run" to generate images.
- if you upload any new checkpoints, restart gradio or click "Refresh".

</details>

<details><summary><strong>Multi-conditional generation</strong></summary>

- select a stylized Stable Diffusion checkpoint to specify the target style, e.g., Pixel.
- select the Base ControlNet checkpoint.
- select palette for the LoRA1 checkpoint and lineart for the LoRA2 checkpoint.
  - palette + canny or palette + hed also work; there may be more interesting combinations to discover.
- write prompts and negative prompts.
- upload the source image to the "Condition 1" panel, select the "none" preprocessor, and click "Detect".
- upload the source image to the "Condition 2" panel, select the "lineart" preprocessor, and click "Detect".
- adjust the weights for the two conditions in the "Basic options" panel.
- click "Run" to generate images.

</details>
🚀 Spatial + Style Control
Many thanks to Lianchen-li for integrating InstantStyle with CtrLoRA to support "spatial + style" control!
To run the gradio demo, first download IP Adapter from this repo. You need to download the models directory, rename it to ip-adapter, and put it into ./ckpts/.
Then launch the gradio demo:
python app/gradio_ctrlora_style_transfer.py
<details><summary><strong>Instructions</strong></summary>

- select the Stable Diffusion, Base ControlNet, LoRA, and IP Adapter checkpoints.
- write prompts and negative prompts.
- prepare a condition image:
  - upload an image to the "Content" panel in the "Reference images" block, select the preprocessor corresponding to the LoRA, and click "Detect".
  - or upload the condition image directly, select the "none" preprocessor, and click "Detect".
- upload a style image to the "Style" panel in the "Reference images" block.
- optionally, choose a style mode and use a negative content prompt.
- click "Run" to generate images.

</details>
🚗 Python API
Besides the Gradio demo, you can also sample images with the following Python code.
🚗 Single-conditional generation
from api import CtrLoRA
ctrlora = CtrLoRA(num_loras=1)
ctrlora.create_model(
sd_file='ckpts/sd15/v1-5-pruned.ckpt',
basecn_file='ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt',
lora_files='ckpts/ctrlora-loras/novel-conditions/ctrlora_sd15_basecn700k_inpainting_brush_rank128_1kimgs_1ksteps.ckpt',
)
samples = ctrlora.sample(
cond_image_paths='assets/test_images/inpaint_cat.png',
prompt='A cat wearing a brown cowboy hat, best quality',
n_prompt='worst quality',
num_samples=1,
)
samples[0].show()
🚗 Multi-conditional generation
from api import CtrLoRA
ctrlora = CtrLoRA(num_loras=2)
ctrlora.create_model(
sd_file='ckpts/sd15/v1-5-pruned.ckpt',
basecn_file='ckpts/ctrlora-basecn/ctrlora_sd15_basecn700k.ckpt',
lora_files=('ckpts/ctrlora-loras/novel-conditions/ctrlora_sd15_basecn700k_lineart_rank128_1kimgs_1ksteps.ckpt',
'ckpts/ctrlora-loras/novel-conditions/ctrlora_sd15_basecn700k_palette_rank128_100kimgs_100ksteps.ckpt'),
)
samples = ctrlora.sample(
cond_image_paths=('assets/test_images/lineart_bird.png',
'assets/test_images/palette_bird.png'),
prompt='Photo of a parrot, best quality',
n_prompt='worst quality',
num_samples=1,
lora_weights=(1.0, 1.0),
)
samples[0].show()
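The `.show()` calls above suggest the returned samples are PIL images, so they can also be written to disk. `save_samples` below is a hypothetical convenience wrapper, not part of the API:

```python
import os

def save_samples(samples, out_dir='outputs', prefix='sample'):
    # Persist a list of PIL images (as returned by ctrlora.sample) to disk,
    # returning the written file paths in order.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, img in enumerate(samples):
        path = os.path.join(out_dir, f'{prefix}_{i:03d}.png')
        img.save(path)
        paths.append(path)
    return paths
```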
🚢 ComfyUI Workflows
Many thanks to Kosinkadink for his hard work creating the CtrLoRA node! Many thanks to toyxyz for sharing his workflow that combines CtrLoRA with AnimateDiff!
| **[CtrLoRA-Canny](https://raw.githubusercontent.com/xyfJASON/ctrlora/refs/heads/main/assets/workflow-c