
ProSpect

Official implementation of the paper "ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models" (SIGGRAPH Asia 2023).


<div id="top"></div>

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

![teaser](./Images/teaser.png)

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes like material, style, layout, etc. remains a challenge, leading to a lack of disentanglement and editability. To address this, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information, providing a new perspective on representing, generating, and editing images. We develop Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer stronger disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image/text-guided material/style/layout transfer/editing, achieving previously unattainable results with a single image input without fine-tuning the diffusion models.

For details, see the paper.

<p align="right">(<a href="#top">back to top</a>)</p>

Getting Started

Prerequisites

For the required packages, see `environment.yaml`.

```shell
conda env create -f environment.yaml
conda activate ldm
```
<p align="right">(<a href="#top">back to top</a>)</p>

Installation

Clone the repo:

```shell
git clone https://github.com/zyxElsa/ProSpect.git
```
<p align="right">(<a href="#top">back to top</a>)</p>

Train

Train ProSpect:

```shell
python main.py --base configs/stable-diffusion/v1-finetune.yaml \
    -t \
    --actual_resume ./models/sd/sd-v1-4.ckpt \
    -n <run_name> \
    --gpus 0, \
    --data_root /path/to/directory/with/images
```

See `configs/stable-diffusion/v1-finetune.yaml` for more options.

Before training, download the pretrained Stable Diffusion model and save it as `./models/sd/sd-v1-4.ckpt`.
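A minimal sketch of the download step. The Hugging Face URL below is an assumption (the original CompVis v1-4 weights repository); verify it, or download the checkpoint manually and place it at the path the training command expects.

```shell
# Create the directory the --actual_resume flag points at
mkdir -p ./models/sd
# Assumed hosting location of the sd-v1-4 checkpoint (verify before use):
# wget -O ./models/sd/sd-v1-4.ckpt \
#   "https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt"
```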

<p align="right">(<a href="#top">back to top</a>)</p>

Test

To generate new images, run ProSpect.ipynb

Instructions

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a teddy * walking in times square',  # 10 (generation ends)
                     'a teddy * walking in times square',  # 9
                     'a teddy * walking in times square',  # 8
                     'a teddy * walking in times square',  # 7
                     'a teddy * walking in times square',  # 6
                     'a teddy * walking in times square',  # 5
                     'a teddy * walking in times square',  # 4
                     'a teddy * walking in times square',  # 3
                     'a teddy walking in times square',    # 2
                     'a teddy walking in times square',    # 1 (generation starts)
                     ],
     model=model,
     )
```

`prompt`: the text prompt injected into all stages. If `prospect_words` is not None, a '*' in the prompt is replaced by `prospect_words`; otherwise, '*' is replaced by the learned token embedding.

Edit `prospect_words` to change the prompts injected into the different stages. A '*' in `prospect_words` is replaced by the learned token embedding.
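The prompt list runs from the stage where generation ends (first entry) down to the stage where it starts (last entry). As a rough sketch of how such a list maps onto DDIM steps, assuming the steps are split into equal-sized groups of consecutive steps (the helper below is illustrative, not part of the ProSpect code):

```python
def prompt_for_step(step, ddim_steps, prospect_words):
    """Pick the stage prompt for a given DDIM step (step 0 = generation start).

    Assumes ddim_steps are split into len(prospect_words) equal groups of
    consecutive steps; prospect_words[0] belongs to the stage where generation
    ends, prospect_words[-1] to the stage where it starts.
    """
    n_stages = len(prospect_words)
    stage = step * n_stages // ddim_steps       # 0 .. n_stages - 1, start to end
    return prospect_words[n_stages - 1 - stage]


words = [f"stage {i}" for i in range(10, 0, -1)]   # 'stage 10' ... 'stage 1'
print(prompt_for_step(0, 50, words))    # first step  -> 'stage 1'
print(prompt_for_step(49, 50, words))   # last step   -> 'stage 10'
```

With 50 DDIM steps and 10 prompts, each prompt governs 5 consecutive steps, which is why editing only the last entries changes low-frequency content (layout) while editing the first entries changes high-frequency content (material, detail).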

For img2img, a `content_dir` pointing to the reference image and a diffusion `strength` are also needed.

A more detailed example:

Reference Image:

reference

Content-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a teddy * walking in times square',  # 10 (generation ends)
                     'a teddy * walking in times square',  # 9
                     'a teddy * walking in times square',  # 8
                     'a teddy * walking in times square',  # 7
                     'a teddy * walking in times square',  # 6
                     'a teddy * walking in times square',  # 5
                     'a teddy * walking in times square',  # 4
                     'a teddy * walking in times square',  # 3
                     'a teddy walking in times square',    # 2
                     'a teddy walking in times square',    # 1 (generation starts)
                     ],
     model=model,
     )
```

with ProSpect:

result_content

Layout-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=41,
     height=512,
     width=512,
     prospect_words=['a corgi sits on the table',    # 10 (generation ends)
                     'a corgi sits on the table',    # 9
                     'a corgi sits on the table',    # 8
                     'a corgi sits on the table',    # 7
                     'a corgi sits on the table',    # 6
                     'a corgi sits on the table',    # 5
                     'a corgi sits on the table',    # 4
                     'a corgi sits on the table',    # 3
                     'a corgi sits on the table',    # 2
                     'a corgi sits on the table *',  # 1 (generation starts)
                     ],
     model=model,
     )
```

with ProSpect:

result

without ProSpect:

result

Material-aware T2I generation

```python
main(prompt='*',
     ddim_steps=50,
     strength=0.6,
     seed=42,
     height=512,
     width=768,
     prospect_words=['a * dog on the table',  # 10 (generation ends)
                     'a * dog on the table',  # 9
                     'a
```
