LoRWeB: Spanning the Visual Analogy Space with a Weight Basis of LoRAs
👥 Authors
<div align="center">Hila Manor<sup>1,2</sup>, Rinon Gal<sup>2</sup>, Haggai Maron<sup>1,2</sup>, Tomer Michaeli<sup>1</sup>, Gal Chechik<sup>2,3</sup>
<sup>1</sup>Technion - Israel Institute of Technology <sup>2</sup>NVIDIA <sup>3</sup>Bar-Ilan University
<br>
<img src="assets/teaser.jpg" alt="LoRWeB Teaser" width="800"/>

Given a prompt and an image triplet $\{a, a', b\}$ that visually describe a desired transformation, LoRWeB dynamically constructs a single LoRA from a learnable basis of LoRA modules, and produces an editing result $b'$ that applies the same analogy to the new image.
</div>

📄 Abstract
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{a, a', b\}$, the goal is to generate $b'$ such that $a : a' :: b : b'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation.
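To make the mechanism concrete, here is a minimal sketch of the core idea: mixing a learnable basis of LoRA modules into a single adapter on top of a frozen linear layer, using per-analogy mixing weights. This is not the actual implementation; the class, shapes, and default values below are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRABasisLinear(nn.Module):
    """Sketch only: a frozen linear layer adapted by a weighted sum of basis LoRAs."""

    def __init__(self, base: nn.Linear, num_loras: int = 32, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the backbone stays frozen; only the basis is learned
        # Learnable basis of LoRA factors: num_loras pairs of (down, up) matrices.
        self.down = nn.Parameter(0.01 * torch.randn(num_loras, rank, base.in_features))
        self.up = nn.Parameter(torch.zeros(num_loras, base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        # weights: (num_loras,) mixing coefficients for the basis, e.g. a softmax
        # produced by an encoder that looks at the analogy triplet.
        delta_w = torch.einsum("n,nor,nri->oi", weights, self.up, self.down)
        return self.base(x) + self.scale * F.linear(x, delta_w)


# Hypothetical usage: combine 32 basis LoRAs into one adapter for a single query.
layer = LoRABasisLinear(nn.Linear(768, 768), num_loras=32, rank=4, alpha=4.0)
mix = torch.softmax(torch.randn(32), dim=0)   # stand-in for the encoder's output
out = layer(torch.randn(1, 77, 768), mix)     # (1, 77, 768)
```

In LoRWeB, such mixing weights would come from the lightweight encoder conditioned on the $\{a, a', b\}$ triplet, so the combined LoRA specializes the frozen backbone to each analogy at inference time.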
📋 Table of Contents
- 🔨 Setup
- 🚀 Usage
  - 💻 Training
  - 📊 Training Data Format
  - 🎨 Inference
- ℹ️ Additional Information
- 📚 Citation
- 🙏 Acknowledgements
🔨 Setup
```bash
conda env create -f environment.yml
conda activate lorweb
```
🚀 Usage
💻 Training
Train a LoRWeB model on your visual analogy dataset:
```bash
python run.py config/your_config.yaml
```
You can override the main options with command-line arguments to the `run.py` script, e.g.:

```bash
python run.py LoRWeB_default_PROMPTS.yaml --name "lorweb_model" --linear 4 --linear_alpha 4 --loras_num 32 --lora_softmax true --query_mode "cat-aa'b"
```
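A config file collects the same options. The snippet below is only an illustrative guess at what such a file might contain; the field names simply mirror the command-line flags above and have not been checked against the shipped config schema:

```yaml
# Illustrative sketch only; field names are assumed from the CLI flags above.
name: "lorweb_model"
linear: 4               # LoRA rank
linear_alpha: 4         # LoRA scaling factor (alpha)
loras_num: 32           # number of basis LoRA modules
lora_softmax: true      # apply a softmax to the predicted mixing weights
query_mode: "cat-aa'b"  # how the encoder query is built from the triplet
```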
📊 Training Data Format
We trained on Relation252k.
The training script expects two folders: `control`, which contains the images of the $\{a, a', b\}$ triplets, and `target`, which contains the corresponding $b'$ result images.
Use `preprocess_data.py` to preprocess a pre-downloaded dataset.
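For example, a preprocessed dataset directory might look like this (the file names are illustrative; what matters is that each `control` triplet has a matching `target` image):

```
dataset/
├── control/
│   ├── 000001.jpg   # triplet image showing {a, a', b}
│   └── 000002.jpg
└── target/
    ├── 000001.jpg   # corresponding result image b'
    └── 000002.jpg
```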
🎨 Inference
You can test our model's checkpoint from HuggingFace (coming soon) using `inference.py`:

```bash
python inference.py -w "output/your_model/your_model.safetensors" -c "output/your_model/config.yaml" -a "data/path_to_a_img.jpg" -t "data/path_to_atag_img.jpg" -b "data/path_to_b_img.jpg" -o "outputs/generated_btag_img_path.jpg"
```
ℹ️ Additional Information
Our complementary custom evaluation set is available on HuggingFace (coming soon).
📚 Citation
If you use this code in your research, please cite:
```bibtex
@article{manor2026lorweb,
  title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
  author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
  journal={arXiv preprint arXiv:2602.15727},
  year={2026}
}
```
🙏 Acknowledgements
This project builds upon:
- FLUX.1-Kontext by Black Forest Labs
- Diffusers by Hugging Face
- PEFT by Hugging Face
- AI-Toolkit for training infrastructure
<div align="center">
⭐ Star this repo if you find it useful! ⭐
</div>