<div align="center"> <h1>OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models (ECCV 2024)</h1>

Zhe Kong · Yong Zhang* · Tianyu Yang · Tao Wang · Kaihao Zhang

Bizhu Wu · Guanying Chen · Wei Liu · Wenhan Luo*

<sup>*</sup>Corresponding Authors

OMG + LORA : <a href='https://huggingface.co/spaces/Fucius/OMG'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

OMG + InstantID: <a href='https://huggingface.co/spaces/Fucius/OMG-InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

<a href='https://kongzhecn.github.io/omg-project/'><img src='https://img.shields.io/badge/Project-Page-green'></a> <a href='https://arxiv.org/abs/2403.10983'><img src='https://img.shields.io/badge/Technique-Report-red'></a> GitHub

</div>

TL;DR: OMG is a framework for multi-concept image generation that supports character and style LoRAs from Civitai.com. It can also be combined with InstantID to generate multiple IDs, using a single image per ID.

<p align="center"> <img src="assets/teaser.png"> </p>

Introduction of OMG: a tool for high-quality multi-character image generation.

Trailer Demo: a short trailer, "Home Defense", created using OMG + SVD.

:label: Change Log

  • [2024/3/22] 🔥 We release the Hugging Face Space for OMG + InstantID, supporting ID personalization with a single image.
  • [2024/3/19] 🔥 We release the technical report and Hugging Face Space for OMG + LoRAs.
  • [2024/3/18] 🔥 We release the source code and Gradio demo of OMG.

🔆 Introduction

1. OMG + LoRA (ID with multiple images)

<p align="center"> <img src="assets/lora.png" height=390> </p>

2. OMG + InstantID (ID with single image)

<p align="center"> <img src="assets/instantid.png" height=390> </p>

3. OMG + ControlNet (Layout Control)

<p align="center"> <img src="assets/controlnet.png" height=1024> </p>

4. OMG + style LoRAs (Style Control)

<p align="center"> <img src="assets/style.png" height=390> </p>

:wrench: Dependencies and Installation

  1. The code requires python==3.10.6, pytorch==2.0.1, and torchvision==0.15.2. Please follow the instructions here to install both the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

```shell
conda create -n OMG python=3.10.6
conda activate OMG
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/segment-anything.git
```
  2. For visual comprehension, you can choose either YoloWorld + EfficientViT SAM or GroundingDINO + SAM.

     1. (Recommended) YoloWorld + EfficientViT SAM:

```shell
pip install inference[yolo-world]==0.9.13
pip install onnxsim==0.4.35
```

     2. (Optional) If you cannot install inference[yolo-world], you can install GroundingDINO for visual comprehension instead.

GroundingDINO requires manual installation.

Run the following so the environment variable is set in the current shell:

```shell
export CUDA_HOME=/path/to/cuda-11.3
```

In this example, /path/to/cuda-11.3 should be replaced with the path where your CUDA toolkit is installed.
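Before building, it can save a failed compile to confirm that CUDA_HOME actually points at a toolkit directory. The helper below is a hypothetical convenience, not part of the OMG or GroundingDINO code; it simply checks for bin/nvcc under the configured path:

```python
import os
from pathlib import Path

# Hypothetical helper (not part of the OMG repo): check that CUDA_HOME
# points at a CUDA toolkit directory containing bin/nvcc, which the
# GroundingDINO build step needs.
def cuda_home_looks_valid(env=None):
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return False
    return (Path(cuda_home) / "bin" / "nvcc").exists()
```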


```shell
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
```

More installation details can be found in GroundingDINO
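After installation, a quick sanity check helps confirm that the pinned versions landed correctly. This snippet is a hypothetical convenience, not part of the repo; it reports the installed torch/torchvision versions and CUDA availability (the expected versions come from the instructions above):

```python
# Hypothetical sanity check (not part of the OMG repo): report the
# installed PyTorch/TorchVision versions and CUDA availability so you
# can confirm they match the pinned versions before running inference.
def environment_report():
    try:
        import torch
        import torchvision
    except ImportError:
        return "PyTorch/TorchVision not installed yet"
    return "\n".join([
        f"torch {torch.__version__} (expected 2.0.1)",
        f"torchvision {torchvision.__version__} (expected 0.15.2)",
        f"CUDA available: {torch.cuda.is_available()}",
    ])

if __name__ == "__main__":
    print(environment_report())
```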

⏬ Pretrained Model Preparation

1) Download Models

1. Required download:

Download stable-diffusion-xl-base-1.0 and controlnet-openpose-sdxl-1.0.

For InstantID + OMG, download: wangqixun/YamerMIX_v8, InstantID, antelopev2.

2. For Visual comprehension, you can choose "YoloWorld + EfficientViT SAM" or "GroundingDINO + SAM".

For YoloWorld + EfficientViT SAM: EfficientViT-SAM-XL1, yolo-world.

For GroundingDINO + SAM: GroundingDINO, SAM.

3. For Character LoRAs, download at least one character LoRA for a man and one for a woman.

Character LoRAs for men: Chris Evans, Gleb Savchenko, Harry Potter, Jordan Torres.

Character LoRAs for women: Taylor Swift, Jennifer Lawrence, Hermione Granger, Keira Knightley.

4. (Optional) If using ControlNet, download:

ControlNet, controlnet-canny-sdxl-1.0, controlnet-depth-sdxl-1.0, dpt-hybrid-midas.

5. (Optional) If using Style LoRAs, download:

Anime Sketch Style, Oil Painting Style, Cinematic Photography Style.

2) Preparation

Put the models under checkpoint as follows:

```
OMG
├── assets
├── checkpoint
│   ├── antelopev2
│   │   └── models
│   │       └── antelopev2
│   │           ├── 1k3d68.onnx
│   │           ├── 2d106det.onnx
│   │           ├── genderage.onnx
│   │           ├── glintr100.onnx
│   │           └── scrfd_10g_bnkps.onnx
│   ├── ControlNet
│   ├── controlnet-canny-sdxl-1.0
│   ├── controlnet-depth-sdxl-1.0
│   ├── controlnet-openpose-sdxl-1.0
│   ├── dpt-hybrid-midas
│   ├── GroundingDINO
│   ├── InstantID
│   ├── lora
│   │   ├── chris-evans.safetensors
│   │   ├── Gleb-Savchenko_Liam-Hemsworth.safetensors
│   │   ├── Harry_Potter.safetensors
│   │   ├── Hermione_Granger.safetensors
│   │   ├── jordan_torres_v2_xl.safetensors
│   │   ├── keira_lora_sdxl_v1-000008.safetensors
│   │   ├── lawrence_dh128_v1-step00012000.safetensors
│   │   └── TaylorSwiftSDXL.safetensors
│   ├── sam
│   │   ├── sam_vit_h_4b8939.pth
│   │   └── xl1.pt
│   ├── stable-diffusion-xl-base-1.0
│   ├── style
│   │   ├── Anime_Sketch_SDXL.safetensors
│   │   ├── Cinematic Hollywood Film.safetensors
│   │   └── EldritchPaletteKnife.safetensors
│   └── YamerMIX_v8
├── example
├── gradio_demo
├── inference_instantid.py
├── inference_lora.py
├── README.md
├── requirements.txt
└── src
```

Put ViT-B-32.pt (download from openai) to ~/.cache/clip/ViT-B-32.pt. If using YoloWorld, put yolo-world.pt to /tmp/cache/yolo_world/l/yolo-world.pt.
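A quick way to catch a misplaced download before a long inference run is to check the layout above programmatically. The helper below is hypothetical (not part of the repo); the path list is a small sample of the required entries from the tree:

```python
from pathlib import Path

# Hypothetical helper (not part of the OMG repo): list which of the
# expected checkpoint paths from the tree above are still missing.
REQUIRED_CHECKPOINTS = [
    "checkpoint/stable-diffusion-xl-base-1.0",
    "checkpoint/controlnet-openpose-sdxl-1.0",
    "checkpoint/lora",
    "checkpoint/sam/sam_vit_h_4b8939.pth",
]

def missing_checkpoints(root="."):
    root = Path(root)
    return [p for p in REQUIRED_CHECKPOINTS if not (root / p).exists()]
```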

Or you can manually set the checkpoint path as follows:

```shell
python inference_lora.py \
--pretrained_sdxl_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_checkpoint <path to controlnet-openpose-sdxl-1.0> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--lora_path <LoRA path to character1|LoRA path to character2> \
--style_lora <path to style LoRA>
```
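Note that --lora_path packs both character LoRA paths into a single string joined by '|'. A minimal sketch of how such a flag can be parsed (the argument names mirror the command above; the parsing logic is an assumption, not the repo's actual code):

```python
import argparse

# Illustrative sketch: --lora_path carries two character LoRA paths
# separated by '|'; splitting yields one path per character.
parser = argparse.ArgumentParser()
parser.add_argument("--lora_path", type=str,
                    help="two character LoRA paths separated by '|'")
parser.add_argument("--style_lora", type=str, default=None)

args = parser.parse_args([
    "--lora_path",
    "checkpoint/lora/Harry_Potter.safetensors|"
    "checkpoint/lora/Hermione_Granger.safetensors",
])
character_loras = args.lora_path.split("|")  # one path per character
```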

For OMG + InstantID:

```shell
python inference_instantid.py \
--pretrained_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_path <path to InstantID controlnet> \
--face_adapter_path <path to InstantID face adapter> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--antelopev2_path <path to antelopev2> \
--style_lora <path to style LoRA>
```

:computer: Usage

1: OMG + LoRA

The <TOK> for Harry_Potter.safetensors is Harry Potter and for Hermione_Granger.safetensors is Hermione Granger.

For visual comprehension, you can set --segment_type 'yoloworld' for YoloWorld + EfficientViT SAM, or --segment_type 'GroundingDINO' for GroundingDINO + SAM.

```shell
python inference_lora.py \
    --prompt <prompt for the two person> \
    --negativ
```
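The choice between the two visual-comprehension backends amounts to a dispatch on the --segment_type value. The sketch below is illustrative (the return strings describe the backends from this README; the function itself is hypothetical, not the repo's API):

```python
# Illustrative dispatch on --segment_type; only the two flag values
# come from the README, the function itself is hypothetical.
def select_segmenter(segment_type):
    if segment_type == "yoloworld":
        return "YoloWorld + EfficientViT SAM"
    if segment_type == "GroundingDINO":
        return "GroundingDINO + SAM"
    raise ValueError(f"unknown --segment_type: {segment_type!r}")
```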
