LoRWeB: Spanning the Visual Analogy Space with a Weight Basis of LoRAs
👥 Authors
<div align="center">Hila Manor<sup>1,2</sup>, Rinon Gal<sup>2</sup>, Haggai Maron<sup>1,2</sup>, Tomer Michaeli<sup>1</sup>, Gal Chechik<sup>2,3</sup>
<sup>1</sup>Technion - Israel Institute of Technology <sup>2</sup>NVIDIA <sup>3</sup>Bar-Ilan University
<br>
<img src="assets/teaser.jpg" alt="LoRWeB Teaser" width="800"/>

Given a prompt and an image triplet $\{a, a', b\}$ that visually describe a desired transformation, LoRWeB dynamically constructs a single LoRA from a learnable basis of LoRA modules, and produces an editing result $b'$ that applies the same analogy to the new image.
</div>

📄 Abstract
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{a, a', b\}$, the goal is to generate $b'$ such that $a : a' :: b : b'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation.
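To make the mechanism concrete, here is a minimal sketch of the core idea: mixing a learnable basis of LoRA modules into a single adapter on top of a frozen linear layer, using per-analogy mixing weights. This is not the actual implementation; the class, shapes, and default values below are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRABasisLinear(nn.Module):
    """Sketch only: a frozen linear layer adapted by a weighted sum of basis LoRAs."""

    def __init__(self, base: nn.Linear, num_loras: int = 32, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the backbone stays frozen; only the basis is learned
        # Learnable basis of LoRA factors: num_loras pairs of (down, up) matrices.
        self.down = nn.Parameter(0.01 * torch.randn(num_loras, rank, base.in_features))
        self.up = nn.Parameter(torch.zeros(num_loras, base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        # weights: (num_loras,) mixing coefficients for the basis, e.g. a softmax
        # produced by an encoder that looks at the analogy triplet.
        delta_w = torch.einsum("n,nor,nri->oi", weights, self.up, self.down)
        return self.base(x) + self.scale * F.linear(x, delta_w)


# Hypothetical usage: combine 32 basis LoRAs into one adapter for a single query.
layer = LoRABasisLinear(nn.Linear(768, 768), num_loras=32, rank=4, alpha=4.0)
mix = torch.softmax(torch.randn(32), dim=0)   # stand-in for the encoder's output
out = layer(torch.randn(1, 77, 768), mix)     # (1, 77, 768)
```

In LoRWeB, such mixing weights would come from the lightweight encoder conditioned on the $\{a, a', b\}$ triplet, so the combined LoRA specializes the frozen backbone to each analogy at inference time.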
📋 Table of Contents
- 🔨 Setup
- 🚀 Usage
  - 💻 Training
  - 📊 Training Data Format
  - 🎨 Inference
- ℹ️ Additional Information
- 📚 Citation
- 🙏 Acknowledgements
🔨 Setup
```bash
conda env create -f environment.yml
conda activate lorweb
```
🚀 Usage
💻 Training
Train a LoRWeB model on your visual analogy dataset:
```bash
python run.py config/your_config.yaml
```
You can override the main options with command-line arguments to the `run.py` script, e.g.:

```bash
python run.py LoRWeB_default_PROMPTS.yaml --name "lorweb_model" --linear 4 --linear_alpha 4 --loras_num 32 --lora_softmax true --query_mode "cat-aa'b"
```
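A config file collects the same options. The snippet below is only an illustrative guess at what such a file might contain; the field names simply mirror the command-line flags above and have not been checked against the shipped config schema:

```yaml
# Illustrative sketch only; field names are assumed from the CLI flags above.
name: "lorweb_model"
linear: 4               # LoRA rank
linear_alpha: 4         # LoRA scaling factor (alpha)
loras_num: 32           # number of basis LoRA modules
lora_softmax: true      # apply a softmax to the predicted mixing weights
query_mode: "cat-aa'b"  # how the encoder query is built from the triplet
```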
📊 Training Data Format
We trained on Relation252k.
The training script expects two folders: `control`, which contains the images of the $\{a, a', b\}$ triplets, and `target`, which contains the corresponding $b'$ result images.
Use `preprocess_data.py` to preprocess a pre-downloaded dataset.
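For example, a preprocessed dataset directory might look like this (the file names are illustrative; what matters is that each `control` triplet has a matching `target` image):

```
dataset/
├── control/
│   ├── 000001.jpg   # triplet image showing {a, a', b}
│   └── 000002.jpg
└── target/
    ├── 000001.jpg   # corresponding result image b'
    └── 000002.jpg
```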
🎨 Inference
You can test our model's checkpoint from HuggingFace (coming soon) using `inference.py`:

```bash
python inference.py -w "output/your_model/your_model.safetensors" -c "output/your_model/config.yaml" -a "data/path_to_a_img.jpg" -t "data/path_to_atag_img.jpg" -b "data/path_to_b_img.jpg" -o "outputs/generated_btag_img_path.jpg"
```
ℹ️ Additional Information
Our complementary custom evaluation set is available on HuggingFace (coming soon).
📚 Citation
If you use this code in your research, please cite:
```bibtex
@article{manor2026lorweb,
  title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
  author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
  journal={arXiv preprint arXiv:2602.15727},
  year={2026}
}
```
🙏 Acknowledgements
This project builds upon:
- FLUX.1-Kontext by Black Forest Labs
- Diffusers by Hugging Face
- PEFT by Hugging Face
- AI-Toolkit for training infrastructure
<div align="center">
⭐ Star this repo if you find it useful! ⭐
</div>