ViCo

ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation

⏳ To Do

[x] Release inference code
[x] Release pretrained models
[x] Release training code
[ ] Quantitative evaluation code
[ ] Hugging Face demo

⚙️ Set-up

Create and activate a conda environment named vico:

conda env create -f environment.yaml
conda activate vico
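Before running the commands below, you may want to confirm that the vico environment is actually active. The following is a minimal sketch that relies on the CONDA_DEFAULT_ENV variable, which conda activate sets to the environment's name. The helpers active_conda_env and require_env are hypothetical and are not part of this repository.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is active.

    `conda activate` sets CONDA_DEFAULT_ENV to the environment's name.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def require_env(expected="vico", environ=None):
    """Raise unless the expected conda environment is active (hypothetical helper)."""
    current = active_conda_env(environ)
    if current != expected:
        raise RuntimeError(
            f"expected conda env {expected!r}, found {current!r}; "
            f"run `conda activate {expected}` first"
        )
    return current
```

Passing a plain dict as environ makes the check easy to test without touching the real environment.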

⏬ Download

Download the pretrained Stable Diffusion v1-4 checkpoint (sd-v1-4.ckpt) and place it under models/ldm/stable-diffusion-v1/.
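A quick sanity check for this step, assuming the repository root is the working directory and the file is named sd-v1-4.ckpt, as in the --ckpt_path argument of the inference and training commands. check_sd_checkpoint is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path

# Expected location, taken from the --ckpt_path / --actual_resume arguments in this README.
SD_CKPT = Path("models/ldm/stable-diffusion-v1/sd-v1-4.ckpt")


def check_sd_checkpoint(repo_root="."):
    """Return the checkpoint path under repo_root, or raise if it is missing."""
    ckpt = Path(repo_root) / SD_CKPT
    if not ckpt.is_file():
        raise FileNotFoundError(f"Stable Diffusion v1-4 checkpoint not found at {ckpt}")
    return ckpt
```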

We provide pretrained checkpoints for 8 objects, saved at 300, 350, and 400 training steps. You can download the sample images together with their corresponding pretrained checkpoints. You can also download the data of any individual object:

The datasets were originally collected and provided by Textual Inversion, DreamBooth, and Custom Diffusion. You can find all the datasets used for the quantitative comparison in our paper.

🚀 Inference

Before running the inference command, please set:

REF_IMAGE_PATH: Path of the reference image. It can be any image from the samples, e.g., batman/1.jpg.
CHECKPOINT_PATH: Path of the checkpoint folder. Its subfolder should look like checkpoints/*-399.pt.
OUTPUT_PATH: Output folder for the generated images, e.g., outputs/batman.
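To check that CHECKPOINT_PATH points where the command expects, the sketch below lists the files matching checkpoints/*-<load_step>.pt inside it. This layout is inferred from the description above and is an assumption; the actual loading logic lives in scripts/vico_txt2img.py and may differ. find_checkpoint_files is a hypothetical helper.

```python
from pathlib import Path


def find_checkpoint_files(checkpoint_path, load_step=399):
    """List files matching checkpoints/*-<load_step>.pt under checkpoint_path.

    Hypothetical helper: the folder layout is an assumption based on this README.
    """
    pattern = f"checkpoints/*-{load_step}.pt"
    matches = sorted(Path(checkpoint_path).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"nothing matches {pattern} under {checkpoint_path}")
    return matches
```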

python scripts/vico_txt2img.py \
--ddim_eta 0.0  --n_samples 4  --n_iter 2  --scale 7.5  --ddim_steps 50  \
--ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
--image_path REF_IMAGE_PATH \
--ft_path CHECKPOINT_PATH \
--load_step 399 \
--prompt "a photo of * on the beach" \
--outdir OUTPUT_PATH

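You can set --load_step to load the checkpoint saved at 300, 350, or 400 steps (the example above uses 399, matching the file name checkpoints/*-399.pt), and personalize the prompt, where * stands for your object. Prefixing the prompt with "a photo of" usually gives better results.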
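If you want to script several prompts, the inference command can be rebuilt in Python as an argument list. This is a minimal sketch that reuses only the flags and default values shown in the command above; make_prompt, inference_argv, and run_inference are hypothetical helpers, and the paths are placeholders to replace with your own.

```python
import shlex
import subprocess

SD_CKPT = "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"


def make_prompt(scene, prefix="a photo of"):
    """Place the placeholder "*" (the learned object) after an optional prefix."""
    return f"{prefix} * {scene}".strip()


def inference_argv(ref_image, ckpt_dir, outdir, prompt, load_step=399,
                   n_samples=4, n_iter=2, scale=7.5, ddim_steps=50):
    """Mirror the inference command above as an argument list."""
    return [
        "python", "scripts/vico_txt2img.py",
        "--ddim_eta", "0.0",
        "--n_samples", str(n_samples),
        "--n_iter", str(n_iter),
        "--scale", str(scale),
        "--ddim_steps", str(ddim_steps),
        "--ckpt_path", SD_CKPT,
        "--image_path", ref_image,
        "--ft_path", ckpt_dir,
        "--load_step", str(load_step),
        "--prompt", prompt,
        "--outdir", outdir,
    ]


def run_inference(argv):
    # No shell is involved, so the "*" in the prompt is never glob-expanded.
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = inference_argv("batman/1.jpg", "CHECKPOINT_PATH", "outputs/batman",
                          make_prompt("on the beach"))
    print(shlex.join(argv))  # review the command before calling run_inference(argv)
```

Passing a list to subprocess.run avoids shell quoting entirely, which matters here because an unquoted * would otherwise be expanded by the shell.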

💻 Training

Before running the training command, please set:

RUN_NAME: Name of your run, also used as the name of the log folder.
GPUS_USED: GPUs to use, e.g., "0,1,2,3" (we used 4 RTX 3090 GPUs).
TRAIN_DATA_ROOT: Path to the folder of your training images.
INIT_WORD: Word used to initialize the representation of your unique object, e.g., "dog" or "toy".
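Before launching a long training run, it can help to sanity-check these variables. The sketch below parses GPUS_USED into device indices, lists the images in TRAIN_DATA_ROOT, and checks that INIT_WORD is a single word like the examples. The accepted image extensions and the single-word rule are assumptions, and all helper names are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def parse_gpus(gpus_used):
    """Turn a GPU string such as "0,1,2,3" into device indices [0, 1, 2, 3]."""
    return [int(g) for g in gpus_used.split(",") if g.strip()]


def list_training_images(train_data_root):
    """Return the image files directly inside train_data_root, sorted by path."""
    root = Path(train_data_root)
    images = sorted(p for p in root.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    if not images:
        raise FileNotFoundError(f"no training images found in {root}")
    return images


def check_init_word(init_word):
    """Assumed rule: INIT_WORD is one plain word, like "dog" or "toy"."""
    if len(init_word.split()) != 1:
        raise ValueError(f"INIT_WORD should be a single word, got {init_word!r}")
    return init_word
```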

python main.py \
--base configs/stable-diffusion/v1-finetune.yaml -t  \
--actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
-n RUN_NAME \
--gpus  GPUS_USED \
--data_root TRAIN_DATA_ROOT \
--init_word INIT_WORD
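The training command can likewise be assembled as an argument list, which is convenient when looping over several objects. This sketch mirrors exactly the flags shown above; training_argv and run_training are hypothetical helpers, and it assumes you launch from the repository root.

```python
import shlex
import subprocess


def training_argv(run_name, gpus_used, train_data_root, init_word,
                  base_config="configs/stable-diffusion/v1-finetune.yaml",
                  sd_ckpt="models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"):
    """Mirror the training command above as an argument list."""
    return [
        "python", "main.py",
        "--base", base_config, "-t",
        "--actual_resume", sd_ckpt,
        "-n", run_name,
        "--gpus", gpus_used,
        "--data_root", train_data_root,
        "--init_word", init_word,
    ]


def run_training(argv):
    """Launch training; raises CalledProcessError if main.py exits with an error."""
    subprocess.run(argv, check=True)
```

For example, with the illustrative values run name "batman", GPUs "0,1,2,3", data folder "data/batman", and init word "toy", shlex.join(training_argv(...)) prints the same command as above with the placeholders filled in.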

📖 Citation

If you use this code in your research, please consider citing our paper:

@inproceedings{Hao2023ViCo,
  title={ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation},
  author={Shaozhe Hao and Kai Han and Shihao Zhao and Kwan-Yee K. Wong},
  year={2023}
}

💐 Acknowledgements

This code repository is based on the great work of Textual Inversion. Thanks!
