WithAnyone

✨ [ICLR'26] WithAnyone generates high-quality, controllable, and ID-consistent images.


<div align="center"> <h2>WithAnyone: Towards Controllable and ID-Consistent Image Generation</h2> <!-- <p> Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, </p> <p> Xianfang Zeng, Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang </p> --> <!-- <p><em>(† Project lead, ‡ Corresponding authors)</em></p> --> <!-- <p>Fudan University, StepFun</p> --> <p> <a href="https://arxiv.org/abs/2510.14975"><img src="https://img.shields.io/badge/arXiv-2510.14975-b31b1b.svg" alt="arXiv"/></a> <a href="https://doby-xu.github.io/WithAnyone/"><img src="https://img.shields.io/badge/Project-Page-blue.svg" alt="Project Page"/></a> <a href="https://huggingface.co/WithAnyone/WithAnyone"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow.svg" alt="HuggingFace"/></a> <a href="https://huggingface.co/datasets/WithAnyone/MultiID-Bench"><img src="https://img.shields.io/badge/MultiID-Bench-Green.svg" alt="MultiID-Bench"/></a> <a href="https://huggingface.co/datasets/WithAnyone/MultiID-2M"><img src="https://img.shields.io/badge/MultiID_2M-Dataset-Green.svg" alt="MultiID-2M"/></a> <a href="https://huggingface.co/spaces/WithAnyone/WithAnyone_demo"><img src="https://img.shields.io/badge/Huggingface-Demo-blue.svg" alt="MultiID-2M"/></a> </p> </div> <!-- <p align="center"> <a href="assets/teaser.pdf"> <img src="assets/teaser.png" alt="Teaser" width="800"/> </a> </p> --> <p align="center"> <a href="assets/withanyone.gif"> <img src="assets/withanyone.gif" alt="Teaser" width="800"/> </a> </p> <!-- <p align="center"><em>(† Project lead, ‡ Corresponding authors)</em></p> -->

Star us if you find this project useful! ⭐

🎉 Updates

❤ Community Contributions

<!-- ComfyUI support by okdalto -->

A huge thanks to @okdalto for contributing the ComfyUI integration!
The ComfyUI version of WithAnyone is now available — check it out here and enjoy a seamless node-based workflow within the ComfyUI environment.

🕒 Action Items

  • [x] Inference scripts
  • [x] WithAnyone - FLUX.1
  • [x] WithAnyone.K.preview - FLUX.1 Kontext
  • [x] WithAnyone.Ke.preview - FLUX.1 Kontext
  • [ ] WithAnyone - FLUX.1 Kontext
  • [x] MultiID-Bench
  • [x] MultiID-2M Part 1
  • [ ] MultiID-2M Part 2
  • [x] Training codebase
  • [ ] WithAnyone.Z - Z-image

📑Introduction

Highlight of WithAnyone

  • Controllable: WithAnyone aims to mitigate "copy-paste" artifacts in face generation. Previous methods tend to copy the reference face directly onto the generated image, leading to poor controllability of expressions, hairstyles, accessories, and even poses. They fall into a clear trade-off between similarity and copy-paste: the more similar the generated face is to the reference, the more copy-paste artifacts it shows. WithAnyone is an attempt to break this trade-off.
  • Multi-ID Generation: WithAnyone can generate multiple given identities in a single image. With the help of controllable face generation, all generated faces can fit harmoniously in one group photo.
<div style="text-align:center; margin-top:12px;"> <img src="assets/fidelity_vs_copypaste_v200_single.png" alt="Copy-Paste" style="width:70%; max-width:900px; height:auto; display:inline-block;"> </div> <!-- <div style="display:flex; gap:10px; align-items:center;"> <img src="assets/001.webp" alt="001" style="width:35%; height:auto;"> <img src="assets/005.webp" alt="005" style="width:24%; height:auto;"> <img src="assets/009.webp" alt="009" style="width:32%; height:auto;"> </div> -->

⚡️ Quick Start

🏰 Model Zoo

| Model | Description | Download |
|-|-|-|
| WithAnyone 1.0 - FLUX.1 | Main model with FLUX.1 | HuggingFace |
| WithAnyone.K.preview - FLUX.1 Kontext | For t2i generation with FLUX.1 Kontext | HuggingFace |
| WithAnyone.Ke.preview - FLUX.1 Kontext | For face editing with FLUX.1 Kontext | HuggingFace |

If you just want to try it out, please use the base model WithAnyone - FLUX.1. The other models are for the following use cases:

<details> <summary>WithAnyone.K</summary> This is a preliminary version of WithAnyone with FLUX.1 Kontext. It can be used for text-to-image generation with multiple given identities. However, stability and quality are not as good as the base model. Please use it with caution. We are working on improving it. </details> <details> <summary>WithAnyone.Ke</summary> This is a face editing version of WithAnyone with FLUX.1 Kontext, leveraging the editing capabilities of FLUX.1 Kontext. Please use it with `gradio_edit.py` instead of `gradio_app.py`. It is still a preliminary version, and we are working on improving it. </details>

🔧 Requirements

Use `pip install -r requirements.txt` to install the necessary packages.

🔧 Model Checkpoints

You can download the necessary model checkpoints in one of two ways:

  1. Directly run the inference scripts. The checkpoints will be downloaded automatically by the hf_hub_download function in the code to your $HF_HOME (default: ~/.cache/huggingface).
  2. Use huggingface-cli download <repo name> to download:
    • black-forest-labs/FLUX.1-dev
    • xlabs-ai/xflux_text_encoders
    • openai/clip-vit-large-patch14
    • google/siglip-base-patch16-256-i18n
    • withanyone/withanyone
      Then run the inference scripts. You can download only the checkpoints you need to speed up setup and save disk space.
      Example for black-forest-labs/FLUX.1-dev:
    • huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors
    • huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors
      Ignore the text encoder in the black-forest-labs/FLUX.1-dev model repo (it is there for diffusers calls). All checkpoints together require about 51 GB of disk space (~40 GB in hub and ~10 GB in xet).
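The per-repo downloads above can be scripted. The sketch below (illustrative, not part of the WithAnyone codebase) composes the `huggingface-cli download` commands for the repos listed in this README, fetching only the two needed files from the FLUX.1-dev repo:

```python
# Sketch: compose huggingface-cli download commands for the checkpoints
# listed above. Repo names are taken from this README; for FLUX.1-dev,
# only the transformer and VAE weights are fetched, since its bundled
# text encoder is for diffusers-style loading and is not needed here.

FLUX_FILES = ["flux1-dev.safetensors", "ae.safetensors"]

REPOS = [
    "black-forest-labs/FLUX.1-dev",
    "xlabs-ai/xflux_text_encoders",
    "openai/clip-vit-large-patch14",
    "google/siglip-base-patch16-256-i18n",
    "withanyone/withanyone",
]

def download_commands():
    """Return the shell commands to fetch only the needed checkpoints."""
    cmds = []
    for repo in REPOS:
        if repo == "black-forest-labs/FLUX.1-dev":
            for fname in FLUX_FILES:
                cmds.append(f"huggingface-cli download {repo} {fname}")
        else:
            cmds.append(f"huggingface-cli download {repo}")
    return cmds

if __name__ == "__main__":
    print("\n".join(download_commands()))
```

Running the printed commands (or piping them to a shell) populates your `$HF_HOME` cache, after which the paths can be passed to the inference scripts as shown below.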

After downloading, set the following arguments in the inference script to the local paths of the downloaded checkpoints:

--flux_path <path to flux1-dev.safetensors>
--clip_path <path to clip-vit-large-patch14>
--t5_path <path to xflux_text_encoders>
--siglip_path <path to siglip-base-patch16-256-i18n>
--ipa_path <path to withanyone>
<div style="color:#999; font-size:0.95em; margin-top:8px;"> We need to use the ArcFace model for face embedding. It will automatically be downloaded to `./models/`. However, there is an original bug. If you see an error like `assert 'detection' in self.models`, please manually move the model directory: </div> <pre style="color:#888; background:transparent; border:0; padding:0; margin-top:8px;"> mv models/antelopev2/ models/antelopev2_ mv models/antelopev2_/antelopev2/ models/antelopev2/ rm -rf models/antelopev2_, antelopev2.zip </pre>

⚡️ Gradio Demo

The Gradio GUI demo is a good starting point to experiment with WithAnyone. Run it with:

python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --clip_path <path to clip-vit-large-patch14> \
    --t5_path <path to xflux_text_encoders> \
    --siglip_path <path to siglip-base-patch16-256-i18n> \
    --model_type "flux-dev" # or "flux-kontext" for WithAnyone.K

❗ WithAnyone requires face bounding boxes (bboxes) to indicate where faces should be generated. You can provide them in one of the following ways:

  1. Upload an example image with desired face locations in Mask Configuration (Option 1: Automatic). The face bboxes will be extracted automatically, and faces will be generated in the same locations. Do not worry if the given image has a different resolution or aspect ratio; the face bboxes will be resized accordingly.
  2. Input face bboxes directly in Mask Configuration (Option 2: Manual). The format is x1,y1,x2,y2 for each face, one per line.
  3. <span style="color: #999;">(NOT recommended) leave both options empty, and the face bboxes will be randomly chosen from a pre-defined set. </span>
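The manual format in Option 2 is one `x1,y1,x2,y2` line per face, and Option 1 rescales boxes taken from an example image of a different resolution to the output size. A minimal sketch of both steps (function names are illustrative, not from the WithAnyone codebase):

```python
def parse_bboxes(text):
    """Parse manual bbox input: one 'x1,y1,x2,y2' line per face."""
    boxes = []
    for line in text.strip().splitlines():
        x1, y1, x2, y2 = (float(v) for v in line.split(","))
        boxes.append((x1, y1, x2, y2))
    return boxes

def rescale_bboxes(boxes, src_size, dst_size):
    """Scale boxes from the example image's (w, h) to the output's (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]
```

For example, a box `10,20,110,220` extracted from a 1000x1000 example image maps to `(5, 40, 55, 440)` on a 500x2000 output, which is why a mismatched resolution or aspect ratio in Option 1 is not a problem.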

⭕ WithAnyone works well with LoRA. If you have any stylized LoRA checkpoints, use --additional_lora_ckpt <path to lora checkpoint> when launching the demo. The LoRA will be merged into the diffusion model.

python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --additional_lora_ckpt <path to lora checkpoint> \
    --lora_scale 0.8 # adjust the weight as needed 

⭕ In Advanced Options, there is a slider controlling whether outputs are more "similar in spirit" or "similar in form" to the reference faces.

  • Move the slider to the right to preserve more details in the reference image (expression, makeup, accessories, hairstyle, etc.). Identity will also be better preserved.
  • Move it to the left for more freedom and creativity. Stylization can be stronger, and hairstyle and makeup can be changed.
<details> <summary>How the slider works and some tips</summary> The slider act