# Loomis Painter: Reconstructing the painting process
<p align="center">
  <a href='https://arxiv.org/abs/2511.17344'>
    <img src='https://img.shields.io/badge/Arxiv-Pdf-A42C25?style=flat&logo=arXiv&logoColor=white'></a>
  <a href='https://markus-pobitzer.github.io/lplp'>
    <img src='https://img.shields.io/badge/Project-Page-green?style=flat&logo=Google%20chrome&logoColor=white'></a>
  <a href='https://huggingface.co/Markus-Pobitzer/wlp-lora'>
    <img src='https://img.shields.io/badge/Model-checkpoint-blue?logo=huggingface&logoColor=white'></a>
</p>

This is a research project: WAN Learns Painting (WLP). This repo contains the code for fine-tuning WAN 2.1.
<table>
  <tr>
    <td align="center">
      <img src="assets/base.gif" width="380" alt="Generated Video" />
      <br />
      <sub>Generated Video</sub>
    </td>
    <td align="center">
      <img src="assets/reference_image.png" width="380" alt="Input" title="Haystacks by Claude Monet. Source: Wikiart." />
      <br />
      <sub>Input</sub>
    </td>
  </tr>
</table>

## Hugging Face Inference
The easiest way to get started is with Hugging Face and the following checkpoint.
<details>
<summary>Code for inference with Hugging Face</summary>

```python
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel
from huggingface_hub import hf_hub_download
from typing import Tuple, Union
from PIL import Image, ImageOps


def pil_resize(
    image: Image.Image,
    target_size: Tuple[int, int],
    pad_input: bool = False,
    padding_color: Union[str, int, Tuple[int, ...]] = "white",
) -> Image.Image:
    """Resize the image to the target size.

    Args:
        image: Input image to be processed.
        target_size: Target size (width, height).
        pad_input: If set, resizes the image while keeping the aspect ratio and pads the unfilled part.
        padding_color: The color for the padded pixels.

    Returns:
        The resized image.
    """
    if pad_input:
        # Resize image, keep aspect ratio
        image = ImageOps.contain(image, size=target_size)
        # Pad while keeping image in center
        image = ImageOps.pad(image, size=target_size, color=padding_color)
    else:
        image = image.resize(target_size)
    return image


def undo_pil_resize(
    image: Image.Image,
    target_size: Tuple[int, int],
) -> Image.Image:
    """Undo the resizing and padding of the input image, returning a new image of size target_size.

    Args:
        image: Input image to be processed.
        target_size: Target size (width, height).

    Returns:
        The resized image.
    """
    tmp_img = Image.new(mode="RGB", size=target_size)
    # Get the resized image size
    tmp_img = ImageOps.contain(tmp_img, size=image.size)
    # Undo padding by center cropping
    width, height = image.size
    tmp_width, tmp_height = tmp_img.size
    left = int(round((width - tmp_width) / 2.0))
    top = int(round((height - tmp_height) / 2.0))
    right = left + tmp_width
    bottom = top + tmp_height
    cropped = image.crop((left, top, right, bottom))
    # Undo resizing
    ret = cropped.resize(target_size)
    return ret


# Set to True if you have a GPU with less than 80 GB VRAM --> very slow inference!
enable_sequential_cpu_offload = True

# Download the LoRA file
lora_path = hf_hub_download(repo_id="Markus-Pobitzer/wlp-lora", filename="base.safetensors")
print(f"LoRA path: {lora_path}")

# Load the pipeline
model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder="image_encoder", torch_dtype=torch.float32
)
# Takes more than 100 GB of disk space
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
)

# Load LoRA
pipe.load_lora_weights(lora_path)
pipe.fuse_lora()

# Either offload or move the pipeline directly to the GPU
if enable_sequential_cpu_offload:
    pipe.enable_sequential_cpu_offload()
else:
    pipe.to("cuda")

### INFERENCE ###
image = load_image(
    "https://uploads3.wikiart.org/images/claude-monet/haystacks-at-giverny.jpg"
)
og_size = image.size
height = 480
width = 832
# Resize and pad
ref_image = pil_resize(image, target_size=(width, height), pad_input=True)

prompt = "Painting process step by step."

output = pipe(
    image=ref_image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=81,
    output_type="pil",
    guidance_scale=1.0,
).frames[0]

# Resize frames back to the original image size and reverse the frame
# order so the video ends at the finished painting
output = [undo_pil_resize(img, og_size) for img in output][::-1]

# Save video
export_to_video(output, "output.mp4", fps=3)
```

</details>
## Art Media Transfer
To transfer from one art medium to another, use the following LoRA:

```python
lora_path = hf_hub_download(repo_id="Markus-Pobitzer/wlp-lora", filename="art_media_transfer.safetensors")
```
Make sure that you also change the prompt accordingly. The supported art media are:
- acrylic
- colored pencils
- loomis
- pencil
- oil
The prompt has the following format:

```python
art_media = "..."
painting_desc = "..."
prompt = f"<{art_media}> Painting process step by step. {painting_desc}"
```
For acrylic, colored pencils, and oil, the prompt can contain color descriptions, e.g.:

```python
prompt = "<acrylic> Painting process step by step. The image depicts a serene landscape with a small brown and green island in the center of a body of water, surrounded by green trees and a few boats. The sky is blue with scattered clouds, and there are birds flying in the background."
```
For the loomis and pencil art media, we left color information out during fine-tuning, so the prompt should omit color descriptions, e.g.:

```python
prompt = "<pencil> Painting process step by step. The image depicts a serene landscape with a small island in the center of a body of water, surrounded by trees and a few boats. There are scattered clouds, and birds flying in the background."
```
Note that the loomis method only works on portrait photos/paintings; on other inputs it seems to fall back to another art medium.
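
Putting the pieces together, here is a minimal sketch of art-media-transfer inference. It assumes the pipeline, `ref_image`, `height`, and `width` from the Hugging Face inference example above; the painting description is illustrative.

```python
from huggingface_hub import hf_hub_download

# Swap in the art-media-transfer LoRA (instead of base.safetensors)
lora_path = hf_hub_download(
    repo_id="Markus-Pobitzer/wlp-lora", filename="art_media_transfer.safetensors"
)
pipe.load_lora_weights(lora_path)
pipe.fuse_lora()

art_media = "oil"  # one of: acrylic, colored pencils, loomis, pencil, oil
painting_desc = (
    "The image depicts a serene landscape with a small island in the "
    "center of a body of water, surrounded by trees and a few boats."
)
prompt = f"<{art_media}> Painting process step by step. {painting_desc}"

output = pipe(
    image=ref_image,  # resized and padded reference image, as above
    prompt=prompt,
    height=height,
    width=width,
    num_frames=81,
    output_type="pil",
    guidance_scale=1.0,
).frames[0]
```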
## Installation
Until the Loomis Painter paper is accepted by a conference, we cannot release the full code required to reproduce the dataset and results. However, all the code necessary to fine-tune the WAN 2.1 model is available in this repository.
This project uses uv to manage dependencies. Please follow this guide to install it: https://docs.astral.sh/uv/getting-started/installation/
### Create the Environment & Install Packages

Navigate to this project's root folder (where this README.md file is) in your terminal and run:

```bash
uv sync
```
### Activate the Virtual Environment

```bash
source .venv/bin/activate
```
## Dataset
The code for constructing the dataset is not included in this repository, and the dataset itself will not be publicly available. However, the dataset loader is provided and can be found at `src/wlp/dataset/video_pkl_dataset.py`.
If you are interested in the training data, feel free to reach out via e-mail.
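
For orientation, below is a hypothetical usage sketch of the loader; the class name `VideoPklDataset` and its constructor arguments are assumptions, so check `src/wlp/dataset/video_pkl_dataset.py` for the actual interface.

```python
# Hypothetical sketch: the class name and constructor arguments are
# assumptions; see src/wlp/dataset/video_pkl_dataset.py for the real API.
from torch.utils.data import DataLoader
from wlp.dataset.video_pkl_dataset import VideoPklDataset  # assumed name

dataset = VideoPklDataset("path/to/training_pkls")  # assumed signature
loader = DataLoader(dataset, batch_size=1, shuffle=True)
sample = next(iter(loader))  # a painting-process video plus its prompt (assumed)
```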
## Fine Tuning
We used the script `scripts/train_snellius.sh` to fine-tune the WAN 2.1 model on a SLURM cluster. For fine-tuning the base model we used 4 H100 GPUs for 24 hours, which corresponds to 14 epochs on our dataset of 690 training videos. The model also shows good results when fine-tuned for only 7 epochs.
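
On a SLURM cluster the script would typically be submitted with `sbatch`; the invocation below is a sketch, assuming the resource requests (4 H100 GPUs, 24 hours) are set via `#SBATCH` directives inside the script or overridden on the command line.

```bash
# Sketch: submit the fine-tuning job; the flags shown are illustrative
# and may duplicate #SBATCH directives already inside the script.
sbatch --gpus=4 --time=24:00:00 scripts/train_snellius.sh
```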
## Citation
If you use this work, please cite:
```bibtex
@misc{pobitzer2025loomispainter,
  title={Loomis Painter: Reconstructing the Painting Process},
  author={Markus Pobitzer and Chang Liu and Chenyi Zhuang and Teng Long and Bin Ren and Nicu Sebe},
  year={2025},
  eprint={2511.17344},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.17344},
}
```
## Acknowledgments
We would like to thank the following projects and teams for their contributions and inspiration:
- WAN 2.1 for their video generation model
- DiffSynth-Studio, without whose code training WAN would have been much harder
- Hugging Face
- PaintsUndo for the inspiration
- UniAnimate-DiT for code on how to fine-tune WAN 2.1
