<div align="center"> <h1>OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models (ECCV 2024)</h1>

Zhe Kong · Yong Zhang* · Tianyu Yang · Tao Wang · Kaihao Zhang

Bizhu Wu · Guanying Chen · Wei Liu · Wenhan Luo*

<sup>*</sup>Corresponding Authors

OMG + LORA : <a href='https://huggingface.co/spaces/Fucius/OMG'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

OMG + InstantID: <a href='https://huggingface.co/spaces/Fucius/OMG-InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

<a href='https://kongzhecn.github.io/omg-project/'><img src='https://img.shields.io/badge/Project-Page-green'></a> <a href='https://arxiv.org/abs/2403.10983'><img src='https://img.shields.io/badge/Technique-Report-red'></a> GitHub

</div>

TL;DR: OMG is a framework for multi-concept image generation that supports character and style LoRAs from Civitai.com. It can also be combined with InstantID to generate multiple IDs, using a single image per ID.

<p align="center"> <img src="assets/teaser.png"> </p>

Introduction of OMG: a tool for high-quality multi-character image generation.

Trailer Demo: a short trailer, "Home Defense", created using OMG + SVD.

:label: Change Log

  • [2024/3/22] 🔥 We release the Hugging Face Space for OMG + InstantID, supporting ID personalization with a single image.
  • [2024/3/19] 🔥 We release the technical report and Hugging Face Space for OMG + LoRAs.
  • [2024/3/18] 🔥 We release the source code and Gradio demo of OMG.

🔆 Introduction

1. OMG + LoRA (ID with multiple images)

<p align="center"> <img src="assets/lora.png" height=390> </p>

2. OMG + InstantID (ID with single image)

<p align="center"> <img src="assets/instantid.png" height=390> </p>

3. OMG + ControlNet (Layout Control)

<p align="center"> <img src="assets/controlnet.png" height=1024> </p>

4. OMG + style LoRAs (Style Control)

<p align="center"> <img src="assets/style.png" height=390> </p>

:wrench: Dependencies and Installation

  1. The code requires python==3.10.6, pytorch==2.0.1, and torchvision==0.15.2. Please follow the instructions here to install both the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

```shell
conda create -n OMG python=3.10.6
conda activate OMG
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/segment-anything.git
```
  2. For visual comprehension, you can choose either YoloWorld + EfficientViT SAM or GroundingDINO + SAM.

     1. (Recommended) YoloWorld + EfficientViT SAM:

```shell
pip install inference[yolo-world]==0.9.13
pip install onnxsim==0.4.35
```

     2. (Optional) If you cannot install inference[yolo-world], you can install GroundingDINO for visual comprehension instead.

GroundingDINO requires manual installation.

Run the following so the environment variable is set in the current shell:

```shell
export CUDA_HOME=/path/to/cuda-11.3
```

In this example, /path/to/cuda-11.3 should be replaced with the path where your CUDA toolkit is installed.
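Before building, it can save a failed compile to confirm that CUDA_HOME actually points at a toolkit directory. The helper below is a hypothetical convenience, not part of the OMG or GroundingDINO code; it simply checks for bin/nvcc under the configured path:

```python
import os
from pathlib import Path

# Hypothetical helper (not part of the OMG repo): check that CUDA_HOME
# points at a CUDA toolkit directory containing bin/nvcc, which the
# GroundingDINO build step needs.
def cuda_home_looks_valid(env=None):
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return False
    return (Path(cuda_home) / "bin" / "nvcc").exists()
```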


```shell
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
```

More installation details can be found in GroundingDINO
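After installation, a quick sanity check helps confirm that the pinned versions landed correctly. This snippet is a hypothetical convenience, not part of the repo; it reports the installed torch/torchvision versions and CUDA availability (the expected versions come from the instructions above):

```python
# Hypothetical sanity check (not part of the OMG repo): report the
# installed PyTorch/TorchVision versions and CUDA availability so you
# can confirm they match the pinned versions before running inference.
def environment_report():
    try:
        import torch
        import torchvision
    except ImportError:
        return "PyTorch/TorchVision not installed yet"
    return "\n".join([
        f"torch {torch.__version__} (expected 2.0.1)",
        f"torchvision {torchvision.__version__} (expected 0.15.2)",
        f"CUDA available: {torch.cuda.is_available()}",
    ])

if __name__ == "__main__":
    print(environment_report())
```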

⏬ Pretrained Model Preparation

1) Download Models

1. Required download:

Download stable-diffusion-xl-base-1.0 and controlnet-openpose-sdxl-1.0.

For InstantID + OMG, download: wangqixun/YamerMIX_v8, InstantID, antelopev2.

2. For Visual comprehension, you can choose "YoloWorld + EfficientViT SAM" or "GroundingDINO + SAM".

For YoloWorld + EfficientViT SAM: EfficientViT-SAM-XL1, yolo-world.

For GroundingDINO + SAM: GroundingDINO, SAM.

3. For Character LoRAs, download at least one character LoRA for a man and one for a woman.

Character LoRAs for men: Chris Evans, Gleb Savchenko, Harry Potter, Jordan Torres.

Character LoRAs for women: Taylor Swift, Jennifer Lawrence, Hermione Granger, Keira Knightley.

4. (Optional) If using ControlNet, download:

ControlNet, controlnet-canny-sdxl-1.0, controlnet-depth-sdxl-1.0, dpt-hybrid-midas.

5. (Optional) If using Style LoRAs, download:

Anime Sketch Style, Oil Painting Style, Cinematic Photography Style.

2) Preparation

Put the models under checkpoint as follows:

```
OMG
├── assets
├── checkpoint
│   ├── antelopev2
│   │   └── models
│   │       └── antelopev2
│   │           ├── 1k3d68.onnx
│   │           ├── 2d106det.onnx
│   │           ├── genderage.onnx
│   │           ├── glintr100.onnx
│   │           └── scrfd_10g_bnkps.onnx
│   ├── ControlNet
│   ├── controlnet-canny-sdxl-1.0
│   ├── controlnet-depth-sdxl-1.0
│   ├── controlnet-openpose-sdxl-1.0
│   ├── dpt-hybrid-midas
│   ├── GroundingDINO
│   ├── InstantID
│   ├── lora
│   │   ├── chris-evans.safetensors
│   │   ├── Gleb-Savchenko_Liam-Hemsworth.safetensors
│   │   ├── Harry_Potter.safetensors
│   │   ├── Hermione_Granger.safetensors
│   │   ├── jordan_torres_v2_xl.safetensors
│   │   ├── keira_lora_sdxl_v1-000008.safetensors
│   │   ├── lawrence_dh128_v1-step00012000.safetensors
│   │   └── TaylorSwiftSDXL.safetensors
│   ├── sam
│   │   ├── sam_vit_h_4b8939.pth
│   │   └── xl1.pt
│   ├── stable-diffusion-xl-base-1.0
│   ├── style
│   │   ├── Anime_Sketch_SDXL.safetensors
│   │   ├── Cinematic Hollywood Film.safetensors
│   │   └── EldritchPaletteKnife.safetensors
│   └── YamerMIX_v8
├── example
├── gradio_demo
├── inference_instantid.py
├── inference_lora.py
├── README.md
├── requirements.txt
└── src
```

Put ViT-B-32.pt (download from openai) to ~/.cache/clip/ViT-B-32.pt. If using YoloWorld, put yolo-world.pt to /tmp/cache/yolo_world/l/yolo-world.pt.
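A quick way to catch a misplaced download before a long inference run is to check the layout above programmatically. The helper below is hypothetical (not part of the repo); the path list is a small sample of the required entries from the tree:

```python
from pathlib import Path

# Hypothetical helper (not part of the OMG repo): list which of the
# expected checkpoint paths from the tree above are still missing.
REQUIRED_CHECKPOINTS = [
    "checkpoint/stable-diffusion-xl-base-1.0",
    "checkpoint/controlnet-openpose-sdxl-1.0",
    "checkpoint/lora",
    "checkpoint/sam/sam_vit_h_4b8939.pth",
]

def missing_checkpoints(root="."):
    root = Path(root)
    return [p for p in REQUIRED_CHECKPOINTS if not (root / p).exists()]
```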

Or you can manually set the checkpoint path as follows:

```shell
python inference_lora.py \
--pretrained_sdxl_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_checkpoint <path to controlnet-openpose-sdxl-1.0> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--lora_path <LoRA path to character1|LoRA path to character2> \
--style_lora <path to style LoRA>
```
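Note that --lora_path packs both character LoRA paths into a single string joined by '|'. A minimal sketch of how such a flag can be parsed (the argument names mirror the command above; the parsing logic is an assumption, not the repo's actual code):

```python
import argparse

# Illustrative sketch: --lora_path carries two character LoRA paths
# separated by '|'; splitting yields one path per character.
parser = argparse.ArgumentParser()
parser.add_argument("--lora_path", type=str,
                    help="two character LoRA paths separated by '|'")
parser.add_argument("--style_lora", type=str, default=None)

args = parser.parse_args([
    "--lora_path",
    "checkpoint/lora/Harry_Potter.safetensors|"
    "checkpoint/lora/Hermione_Granger.safetensors",
])
character_loras = args.lora_path.split("|")  # one path per character
```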

For OMG + InstantID:

```shell
python inference_instantid.py \
--pretrained_model <path to stable-diffusion-xl-base-1.0> \
--controlnet_path <path to InstantID controlnet> \
--face_adapter_path <path to InstantID face adapter> \
--efficientViT_checkpoint <path to efficientViT-SAM-XL1> \
--dino_checkpoint <path to GroundingDINO> \
--sam_checkpoint <path to sam> \
--antelopev2_path <path to antelopev2> \
--style_lora <path to style LoRA>
```

:computer: Usage

1: OMG + LoRA

The <TOK> for Harry_Potter.safetensors is Harry Potter and for Hermione_Granger.safetensors is Hermione Granger.

For visual comprehension, you can set --segment_type 'yoloworld' for YoloWorld + EfficientViT SAM, or --segment_type 'GroundingDINO' for GroundingDINO + SAM.

```shell
python inference_lora.py \
    --prompt <prompt for the two person> \
    --negativ
```
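The choice between the two visual-comprehension backends amounts to a dispatch on the --segment_type value. The sketch below is illustrative (the return strings describe the backends from this README; the function itself is hypothetical, not the repo's API):

```python
# Illustrative dispatch on --segment_type; only the two flag values
# come from the README, the function itself is hypothetical.
def select_segmenter(segment_type):
    if segment_type == "yoloworld":
        return "YoloWorld + EfficientViT SAM"
    if segment_type == "GroundingDINO":
        return "GroundingDINO + SAM"
    raise ValueError(f"unknown --segment_type: {segment_type!r}")
```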
