
<div align="center"> <h1>InstantID: Zero-shot Identity-Preserving Generation in Seconds</h1>

Qixun Wang<sup>12</sup> · Xu Bai<sup>12</sup> · Haofan Wang<sup>12*</sup> · Zekui Qin<sup>12</sup> · Anthony Chen<sup>123</sup>

Huaxia Li<sup>2</sup> · Xu Tang<sup>2</sup> · Yao Hu<sup>2</sup>

<sup>1</sup>InstantX Team · <sup>2</sup>Xiaohongshu Inc · <sup>3</sup>Peking University

<sup>*</sup>corresponding authors

<a href='https://instantid.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a> <a href='https://arxiv.org/abs/2401.07519'><img src='https://img.shields.io/badge/Technique-Report-red'></a> <a href='https://huggingface.co/papers/2401.07519'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a> GitHub

<a href='https://huggingface.co/spaces/InstantX/InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> ModelScope Open in OpenXLab

</div>

InstantID is a new state-of-the-art tuning-free method that achieves ID-preserving generation from only a single image, supporting various downstream tasks.

<img src='assets/applications.png'>

Release

  • [2024/07/18] 🔥 We are training InstantID for Kolors. Training requires significant computational resources, and the checkpoint is still being iterated; it will be open-sourced once training is complete. The latest checkpoint results are shown in the Kolors Version section.
  • [2024/04/03] 🔥 We release our recent work InstantStyle for style transfer, compatible with InstantID!
  • [2024/02/01] 🔥 We now support LCM acceleration and Multi-ControlNets on our Huggingface Spaces demo! Our depth estimator is powered by Depth-Anything.
  • [2024/01/31] 🔥 OneDiff now supports accelerated inference for InstantID; check this for details!
  • [2024/01/23] 🔥 Our pipeline has been merged into diffusers!
  • [2024/01/22] 🔥 We release the pre-trained checkpoints, inference code and gradio demo!
  • [2024/01/15] 🔥 We release the technical report.
  • [2023/12/11] 🔥 We launch the project page.

Demos

Stylized Synthesis

<p align="center"> <img src="assets/StylizedSynthesis.png"> </p>

Comparison with Previous Works

<p align="center"> <img src="assets/compare-a.png"> </p>

Comparison with existing tuning-free state-of-the-art techniques. InstantID achieves better fidelity and retains good text editability (faces and styles blend better).

<p align="center"> <img src="assets/compare-c.png"> </p>

Comparison with pre-trained character LoRAs. We don't need multiple images and can still achieve results competitive with LoRAs, without any training.

<p align="center"> <img src="assets/compare-b.png"> </p>

Comparison with InsightFace Swapper (also known as ROOP or Refactor). In non-realistic styles, our work integrates the face with the background more flexibly.

Kolors Version

We have adapted InstantID for Kolors. Leveraging Kolors' robust text generation capabilities 👍👍👍, InstantID can be integrated with Kolors to simultaneously generate ID and text.

| demo | demo | demo |
|:-----:|:-----:|:-----:|
|<img src="./assets/kolor/demo_1.jpg" >|<img src="./assets/kolor/demo_2.jpg" >|<img src="./assets/kolor/demo_3.jpg" >|

Download

You can directly download the model from Huggingface. You can also download the model in a Python script:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ip-adapter.bin", local_dir="./checkpoints")

Or run the following command to download all models:

pip install -r gradio_demo/requirements.txt
python gradio_demo/download_models.py

If you cannot access Huggingface, you can use hf-mirror to download models.

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download InstantX/InstantID --local-dir checkpoints --local-dir-use-symlinks False

For the face encoder, you need to manually download it via this URL to models/antelopev2, as the default link is invalid. Once you have prepared all models, the folder tree should look like:

  .
  ├── models
  ├── checkpoints
  ├── ip_adapter
  ├── pipeline_stable_diffusion_xl_instantid.py
  └── README.md
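Before running inference, it can save a failed startup to verify this layout is in place. The following is a small sketch (not part of InstantID itself) that checks for the files downloaded above; the paths are taken from the README, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Expected layout from the README; antelopev2 must be downloaded manually.
REQUIRED = [
    "checkpoints/ip-adapter.bin",
    "checkpoints/ControlNetModel/config.json",
    "checkpoints/ControlNetModel/diffusion_pytorch_model.safetensors",
    "models/antelopev2",
]

def missing_files(root: str = ".") -> list:
    """Return the required model paths that are not present under root."""
    root_path = Path(root)
    return [p for p in REQUIRED if not (root_path / p).exists()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required model files are in place.")
```

Running this from the repository root prints any checkpoint paths still to be downloaded.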

Usage

If you want to reproduce the results in the paper, please refer to the code in infer_full.py. If you want to compare results with other methods, it is recommended to use that code even if you do not use depth-controlnet.

If you are pursuing better results, it is recommended to follow InstantID-Rome.

The following code 👇 comes from infer.py; refer to it if you want to try InstantID quickly.

# !pip install opencv-python transformers accelerate insightface
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel

import cv2
import torch
import numpy as np
from PIL import Image

from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

# prepare 'antelopev2' under ./models
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# prepare models under ./checkpoints
face_adapter = './checkpoints/ip-adapter.bin'
controlnet_path = './checkpoints/ControlNetModel'

# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)

base_model = 'wangqixun/YamerMIX_v8'  # from https://civitai.com/models/84040?modelVersionId=196039
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    base_model,
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.cuda()

# load adapter
pipe.load_ip_adapter_instantid(face_adapter)

Then, you can customize it with your own face image:

# load an image
face_image = load_image("./examples/yann-lecun_resize.jpg")

# prepare face emb
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # use only the largest face
face_emb = face_info['embedding']
face_kps = draw_kps(face_image, face_info['kps'])

# prompt
prompt = "film noir style, ink sketch|vector, male man, highly detailed, sharp focus, ultra sharpness, monochrome, high contrast, dramatic shadows, 1940s style, mysterious, cinematic"
negative_prompt = "ugly, deformed, noisy, blurry, low contrast, realism, photorealistic, vibrant, colorful"

# generate image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image_embeds=face_emb,
    image=face_kps,
    controlnet_conditioning_scale=0.8,
    ip_adapter_scale=0.8,
).images[0]
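The face-embedding step above keeps only the largest detected face when several are present. In isolation, that bounding-box area sort looks like this; the dicts are hypothetical stand-ins for InsightFace detections, and `largest_face` is an illustrative helper, not an InstantID API.

```python
def largest_face(face_infos):
    """Pick the detection with the largest bounding-box area.

    Each dict carries a bounding box [x1, y1, x2, y2], mirroring the
    face_info['bbox'] field returned by InsightFace.
    """
    return sorted(
        face_infos,
        key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]),
    )[-1]

# Hypothetical detections: a small face and a large one.
faces = [
    {'bbox': [0, 0, 50, 50]},      # area 2500
    {'bbox': [10, 10, 200, 160]},  # area 28500
]
print(largest_face(faces)['bbox'])  # → [10, 10, 200, 160]
```

Sorting ascending by area and taking the last element is equivalent to `max` with the same key; the README's code uses the sort form.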

To save VRAM, you can enable CPU offloading:

pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

Speed Up with LCM-LoRA

Our work is compatible with LCM-LoRA. First, download the model.

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="latent-consistency/lcm-lora-sdxl", filename="pytorch_lora_weights.safetensors", local_dir="./checkpoints")

To use it, just load it and run inference with a small num_inference_steps. Note that it is recommended to set guidance_scale in the range [0, 1].

from diffusers import LCMScheduler

lcm_lora_path = "./checkpoints/pytorch_lora_weights.safetensors"

pipe.load_lora_weights(lcm_lora_path)
pipe.fuse_lora()
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

num_inference_steps = 10
guidance_scale = 0

Start a local gradio demo <a href='https://github.com/gradio-app/gradio'><img src='https://img.shields.io/github/stars/gradio-app/gradio'></a>

Run the following command:

python gradio_demo/app.py

or MultiControlNet version:

python gradio_demo/app-multicontrolnet.py

Usage Tips

  • For higher similarity, increase the weights of controlnet_conditioning_scale (IdentityNet) and ip_adapter_scale (Adapter).
  • For over-saturation, decrease ip_adapter_scale. If that does not work, decrease controlnet_conditioning_scale.
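As a rough illustration only, the two tips could be encoded as a small helper that nudges the scales; `adjust_scales`, the symptom names, and the step size are all hypothetical and not part of InstantID.

```python
def adjust_scales(controlnet_scale, adapter_scale, symptom, step=0.1):
    """Hypothetical helper encoding the tuning tips above.

    'low_similarity'  -> raise both scales for stronger identity.
    'over_saturation' -> lower adapter_scale first; if it is already
                         at the floor, lower controlnet_scale instead.
    """
    if symptom == 'low_similarity':
        return controlnet_scale + step, adapter_scale + step
    if symptom == 'over_saturation':
        if adapter_scale > step:
            return controlnet_scale, adapter_scale - step
        return controlnet_scale - step, adapter_scale
    return controlnet_scale, adapter_scale

# e.g. starting from the 0.8 / 0.8 values used in the example above:
print(adjust_scales(0.8, 0.8, 'over_saturation'))
```

In practice you would re-run generation after each adjustment and judge the result visually; there is no single correct setting.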
