
ViCo

Official PyTorch code for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"



⏳ To Do

  • [x] Release inference code
  • [x] Release pretrained models
  • [x] Release training code
  • [ ] Quantitative evaluation code
  • [ ] Hugging Face demo

⚙️ Set-up

Create a conda environment named vico:

conda env create -f environment.yaml
conda activate vico

⏬ Download

Download the pretrained Stable Diffusion v1-4 checkpoint and place it under models/ldm/stable-diffusion-v1. The commands below expect it at models/ldm/stable-diffusion-v1/sd-v1-4.ckpt.

We provide pretrained checkpoints at 300, 350, and 400 steps for 8 objects. You can download all sample images and their corresponding checkpoints at once, or download the data for a single object:

| Object | Sample images | Checkpoints |
| :---- | :----: | :----: |
| barn | image | ckpt |
| batman | image | ckpt |
| clock | image | ckpt |
| dog7 | image | ckpt |
| monster toy | image | ckpt |
| pink sunglasses | image | ckpt |
| teddybear | image | ckpt |
| wooden pot | image | ckpt |

The datasets were originally collected and provided by Textual Inversion, DreamBooth, and Custom Diffusion. All datasets used for the quantitative comparison are listed in our paper.
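
Before running the commands in the following sections, it can help to confirm that the downloaded files are where the scripts will look for them. The sketch below is a hypothetical helper and not part of this repository. It assumes the Stable Diffusion checkpoint sits at the path used in the commands, and that the fine-tuned weights sit in a `checkpoints/` subfolder with names ending in `-<load_step>.pt`.

```python
from pathlib import Path

# Path taken from the inference and training commands in this README.
SD_CKPT = Path("models/ldm/stable-diffusion-v1/sd-v1-4.ckpt")


def find_missing(repo_root, ref_image, ft_dir, load_step=399):
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(repo_root)
    problems = []
    if not (root / SD_CKPT).is_file():
        problems.append(f"missing Stable Diffusion checkpoint: {SD_CKPT}")
    if not Path(ref_image).is_file():
        problems.append(f"missing reference image: {ref_image}")
    # Assumption: weights live in <ft_dir>/checkpoints/ and end in -<load_step>.pt
    pattern = f"*-{load_step}.pt"
    if not list((Path(ft_dir) / "checkpoints").glob(pattern)):
        problems.append(f"no checkpoints/{pattern} under {ft_dir}")
    return problems
```

Calling `find_missing(".", "batman/1.jpg", "CHECKPOINT_PATH")` from the repository root returns one message per missing item, so an empty list suggests the layout matches what the commands expect.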

🚀 Inference

Before running the inference command, please set:

  • REF_IMAGE_PATH: Path to the reference image. It can be any image from the samples, e.g., batman/1.jpg.
  • CHECKPOINT_PATH: Path to the checkpoint weights. Its subfolder layout should look like checkpoints/*-399.pt.
  • OUTPUT_PATH: Directory for the generated images, e.g., outputs/batman.

python scripts/vico_txt2img.py \
--ddim_eta 0.0  --n_samples 4  --n_iter 2  --scale 7.5  --ddim_steps 50  \
--ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
--image_path REF_IMAGE_PATH \
--ft_path CHECKPOINT_PATH \
--load_step 399 \
--prompt "a photo of * on the beach" \
--outdir OUTPUT_PATH

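You can choose which checkpoint to load with load_step (checkpoints are provided at 300, 350, and 400 steps; the command above passes 399, matching the checkpoints/*-399.pt file naming) and write your own prompt, where * stands for the personalized object. Starting the prompt with "a photo of" usually gives better results.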
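
To try several prompts without editing the command by hand, the call above can be assembled programmatically. This is a minimal sketch, not part of the repository: `build_inference_cmd` and its `scene` parameter are hypothetical names, while every flag and default value is copied from the command above.

```python
import shlex


def build_inference_cmd(ref_image, ft_path, outdir,
                        scene="on the beach", load_step=399):
    """Assemble the vico_txt2img.py call shown above as an argv list."""
    # "*" stands for the personalized object; the "a photo of" prefix
    # follows the recommendation in this README.
    prompt = f"a photo of * {scene}"
    return [
        "python", "scripts/vico_txt2img.py",
        "--ddim_eta", "0.0", "--n_samples", "4", "--n_iter", "2",
        "--scale", "7.5", "--ddim_steps", "50",
        "--ckpt_path", "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt",
        "--image_path", str(ref_image),
        "--ft_path", str(ft_path),
        "--load_step", str(load_step),
        "--prompt", prompt,
        "--outdir", str(outdir),
    ]


def as_shell(cmd):
    """Render an argv list as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the run. Building an argv list rather than a single string avoids quoting problems with the spaces and the `*` in the prompt.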

💻 Training

Before running the training command, please set:

  • RUN_NAME: Your run name. It is used as the name of the log folder.
  • GPUS_USED: The GPUs to use, e.g., "0,1,2,3" (4 RTX 3090 GPUs in my case).
  • TRAIN_DATA_ROOT: Path to your training images.
  • INIT_WORD: The word used to initialize the representation of your unique object, e.g., "dog" or "toy".

python main.py \
--base configs/stable-diffusion/v1-finetune.yaml -t  \
--actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt  \
-n RUN_NAME \
--gpus  GPUS_USED \
--data_root TRAIN_DATA_ROOT \
--init_word INIT_WORD
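
When training several objects in a row, the four placeholders can be filled in from Python. The sketch below is hypothetical (the function `build_training_cmd` is not part of this repository). It copies the flags from the command above and only adds the convenience of accepting the GPU ids as a list.

```python
def build_training_cmd(run_name, gpus, data_root, init_word):
    """Assemble the main.py training call shown above as an argv list."""
    # Accept either a ready-made string such as "0,1,2,3" or a list of ids.
    if isinstance(gpus, (list, tuple)):
        gpus = ",".join(str(g) for g in gpus)
    return [
        "python", "main.py",
        "--base", "configs/stable-diffusion/v1-finetune.yaml", "-t",
        "--actual_resume", "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt",
        "-n", run_name,
        "--gpus", gpus,
        "--data_root", str(data_root),
        "--init_word", init_word,
    ]
```

For example, `build_training_cmd("batman", [0, 1, 2, 3], "TRAIN_DATA_ROOT", "toy")` yields a list that can be passed to `subprocess.run(cmd, check=True)` from the repository root. The run name and init word here are illustrative choices.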

📖 Citation

If you use this code in your research, please consider citing our paper:

@inproceedings{Hao2023ViCo,
  title={ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation},
  author={Shaozhe Hao and Kai Han and Shihao Zhao and Kwan-Yee K. Wong},
  year={2023}
}

💐 Acknowledgements

This code repository is based on the great work of Textual Inversion. Thanks!
