
ProSpect

Official implementation of the paper "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models" (SIGGRAPH Asia 2023).


<div id="top"></div>

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

![teaser](./Images/teaser.png)

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes like material, style, layout, etc. remains a challenge, leading to a lack of disentanglement and editability. To address this, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information, providing a new perspective on representing, generating, and editing images. We develop Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer stronger disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image/text-guided material/style/layout transfer/editing, achieving previously unattainable results with a single image input without fine-tuning the diffusion models.

For details, see the paper.

<p align="right">(<a href="#top">back to top</a>)</p>

Getting Started

Prerequisites

For the required packages, see `environment.yaml`.

```shell
conda env create -f environment.yaml
conda activate ldm
```
<p align="right">(<a href="#top">back to top</a>)</p>

Installation

Clone the repo:

```shell
git clone https://github.com/zyxElsa/ProSpect.git
```
<p align="right">(<a href="#top">back to top</a>)</p>

Train

Train ProSpect:

```shell
python main.py --base configs/stable-diffusion/v1-finetune.yaml \
    -t \
    --actual_resume ./models/sd/sd-v1-4.ckpt \
    -n <run_name> \
    --gpus 0, \
    --data_root /path/to/directory/with/images
```

See `configs/stable-diffusion/v1-finetune.yaml` for more options.

Before training, download the pretrained Stable Diffusion model and save it as `./models/sd/sd-v1-4.ckpt`.
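A minimal sketch of the download step. The Hugging Face URL below is an assumption (the original CompVis v1-4 weights repository); verify it, or download the checkpoint manually and place it at the path the training command expects.

```shell
# Create the directory the --actual_resume flag points at
mkdir -p ./models/sd
# Assumed hosting location of the sd-v1-4 checkpoint (verify before use):
# wget -O ./models/sd/sd-v1-4.ckpt \
#   "https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt"
```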

<p align="right">(<a href="#top">back to top</a>)</p>

Test

To generate new images, run ProSpect.ipynb

Instructions

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a teddy * walking in times square',  # 10 (generation ends)
                     'a teddy * walking in times square',  # 9
                     'a teddy * walking in times square',  # 8
                     'a teddy * walking in times square',  # 7
                     'a teddy * walking in times square',  # 6
                     'a teddy * walking in times square',  # 5
                     'a teddy * walking in times square',  # 4
                     'a teddy * walking in times square',  # 3
                     'a teddy walking in times square',    # 2
                     'a teddy walking in times square',    # 1 (generation starts)
                     ],
     model=model,
     )
```

`prompt`: the text prompt injected into all stages. If `prospect_words` is not None, a '*' in the prompt is replaced by `prospect_words`; otherwise, '*' is replaced by the learned token embedding.

Edit `prospect_words` to change the prompts injected into the different stages. A '*' in `prospect_words` is replaced by the learned token embedding.
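The prompt list runs from the stage where generation ends (first entry) down to the stage where it starts (last entry). As a rough sketch of how such a list maps onto DDIM steps, assuming the steps are split into equal-sized groups of consecutive steps (the helper below is illustrative, not part of the ProSpect code):

```python
def prompt_for_step(step, ddim_steps, prospect_words):
    """Pick the stage prompt for a given DDIM step (step 0 = generation start).

    Assumes ddim_steps are split into len(prospect_words) equal groups of
    consecutive steps; prospect_words[0] belongs to the stage where generation
    ends, prospect_words[-1] to the stage where it starts.
    """
    n_stages = len(prospect_words)
    stage = step * n_stages // ddim_steps       # 0 .. n_stages - 1, start to end
    return prospect_words[n_stages - 1 - stage]


words = [f"stage {i}" for i in range(10, 0, -1)]   # 'stage 10' ... 'stage 1'
print(prompt_for_step(0, 50, words))    # first step  -> 'stage 1'
print(prompt_for_step(49, 50, words))   # last step   -> 'stage 10'
```

With 50 DDIM steps and 10 prompts, each prompt governs 5 consecutive steps, which is why editing only the last entries changes low-frequency content (layout) while editing the first entries changes high-frequency content (material, detail).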

For img2img, a `content_dir` pointing to the reference image and a diffusion `strength` are also needed.

A more detailed example:

Reference Image:

reference

Content-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a teddy * walking in times square',  # 10 (generation ends)
                     'a teddy * walking in times square',  # 9
                     'a teddy * walking in times square',  # 8
                     'a teddy * walking in times square',  # 7
                     'a teddy * walking in times square',  # 6
                     'a teddy * walking in times square',  # 5
                     'a teddy * walking in times square',  # 4
                     'a teddy * walking in times square',  # 3
                     'a teddy walking in times square',    # 2
                     'a teddy walking in times square',    # 1 (generation starts)
                     ],
     model=model,
     )
```

with ProSpect:

result_content

Layout-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=41,
     height=512,
     width=512,
     prospect_words=['a corgi sits on the table',    # 10 (generation ends)
                     'a corgi sits on the table',    # 9
                     'a corgi sits on the table',    # 8
                     'a corgi sits on the table',    # 7
                     'a corgi sits on the table',    # 6
                     'a corgi sits on the table',    # 5
                     'a corgi sits on the table',    # 4
                     'a corgi sits on the table',    # 3
                     'a corgi sits on the table',    # 2
                     'a corgi sits on the table *',  # 1 (generation starts)
                     ],
     model=model,
     )
```

with ProSpect:

result

without ProSpect:

result

Material-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a * dog on the table',  # 10 (generation ends)
                     'a * dog on the table',  # 9
                     'a
```
