WithAnyone
✨ [ICLR'26] WithAnyone is capable of generating high-quality, controllable, and ID-consistent images
Star us if you find this project useful! ⭐
🎉 Updates
- [01/2026] 🔥 WithAnyone is accepted by ICLR 2026; see you in Rio de Janeiro, Brazil 🎉🎉
- [12/2025] 🔥 Training Codebase is now released!
- [11/2025] 🔥 ComfyUI (community contribution) is now supported!
- [10/2025] 🔥 Hugging Face Space Demo is online — give it a try!
- [10/2025] 🔥 Model Checkpoints, MultiID-Bench, and MultiID-2M are released!
- [10/2025] 🔥 Codebase and Project Page are released!
❤ Community Contributions
A huge thanks to @okdalto for contributing the ComfyUI integration!
The ComfyUI version of WithAnyone is now available — check it out here and enjoy a seamless node-based workflow within the ComfyUI environment.
🕒 Action Items
- [x] Inference scripts
- [x] WithAnyone - FLUX.1
- [x] WithAnyone.K.preview - FLUX.1 Kontext
- [x] WithAnyone.Ke.preview - FLUX.1 Kontext
- [ ] WithAnyone - FLUX.1 Kontext
- [x] MultiID-Bench
- [x] MultiID-2M Part 1
- [ ] MultiID-2M Part 2
- [x] Training codebase
- [ ] WithAnyone.Z - Z-image
📑Introduction
Highlight of WithAnyone
- Controllable: WithAnyone aims to mitigate "copy-paste" artifacts in face generation. Previous methods tend to copy the reference face directly onto the generated image, leading to poor controllability of expressions, hairstyles, accessories, and even poses. They fall into a clear trade-off between similarity and copy-paste: the more similar the generated face is to the reference, the more copy-paste artifacts it exhibits. WithAnyone is an attempt to break this trade-off.
- Multi-ID Generation: WithAnyone can generate multiple given identities in a single image. With the help of controllable face generation, all generated faces can fit harmoniously in one group photo.
⚡️ Quick Start
🏰 Model Zoo
| Model | Description | Download |
|-|-|-|
| WithAnyone 1.0 - FLUX.1 | Main model with FLUX.1 | HuggingFace |
| WithAnyone.K.preview - FLUX.1 Kontext | For t2i generation with FLUX.1 Kontext | HuggingFace |
| WithAnyone.Ke.preview - FLUX.1 Kontext | For face-editing with FLUX.1 Kontext | HuggingFace |
If you just want to try it out, please use the base model WithAnyone - FLUX.1. The other models are for the following use cases:
<details>
<summary>WithAnyone.K</summary>
This is a preliminary version of WithAnyone with FLUX.1 Kontext. It can be used for text-to-image generation with multiple given identities. However, stability and quality are not as good as the base model's. Please use it with caution. We are working on improving it.
</details>
<details>
<summary>WithAnyone.Ke</summary>
This is a face-editing version of WithAnyone with FLUX.1 Kontext, leveraging the editing capabilities of FLUX.1 Kontext. Please use it with `gradio_edit.py` instead of `gradio_app.py`. It is still a preliminary version, and we are working on improving it.
</details>

🔧 Requirements
Use `pip install -r requirements.txt` to install the necessary packages.
🔧 Model Checkpoints
You can download the necessary model checkpoints in one of two ways:

- Directly run the inference scripts. The checkpoints will be downloaded automatically by the `hf_hub_download` function in the code to your `$HF_HOME` (default: `~/.cache/huggingface`).
- Use `huggingface-cli download <repo name>` to download:
  - `black-forest-labs/FLUX.1-dev`
  - `xlabs-ai/xflux_text_encoders`
  - `openai/clip-vit-large-patch14`
  - `google/siglip-base-patch16-256-i18n`
  - `withanyone/withanyone`

  Then run the inference scripts. You can download only the checkpoints you need to speed up setup and save disk space.

  Example for `black-forest-labs/FLUX.1-dev`:

  ```shell
  huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors
  huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors
  ```

Ignore the text encoder in the `black-forest-labs/FLUX.1-dev` model repo (it is there for `diffusers` calls). All checkpoints together require about 51 GB of disk space (~40 GB in hub and ~10 GB in xet).
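As a sketch, the selective download can also be scripted with the `huggingface_hub` Python API (`hf_hub_download` and `snapshot_download` are real library functions; the `CHECKPOINT_FILES` mapping and helper names below are illustrative assumptions, not part of the repo):

```python
CHECKPOINT_FILES = {
    # repo_id -> specific files to fetch (None means the whole snapshot)
    "black-forest-labs/FLUX.1-dev": ["flux1-dev.safetensors", "ae.safetensors"],
    "xlabs-ai/xflux_text_encoders": None,
    "openai/clip-vit-large-patch14": None,
    "google/siglip-base-patch16-256-i18n": None,
    "withanyone/withanyone": None,
}

def plan_downloads(checkpoints=CHECKPOINT_FILES):
    """Return (repo_id, filename_or_None) pairs describing what will be fetched."""
    plan = []
    for repo, files in checkpoints.items():
        if files is None:
            plan.append((repo, None))
        else:
            plan.extend((repo, f) for f in files)
    return plan

def fetch(plan):
    """Actually download; requires huggingface_hub and ~51 GB of disk."""
    from huggingface_hub import hf_hub_download, snapshot_download
    paths = []
    for repo, filename in plan:
        if filename is None:
            paths.append(snapshot_download(repo_id=repo))
        else:
            paths.append(hf_hub_download(repo_id=repo, filename=filename))
    return paths
```

Splitting planning from fetching lets you inspect (or trim) the list before committing to the download.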
After downloading, set the following arguments in the inference script to the local paths of the downloaded checkpoints:

```shell
--flux_path <path to flux1-dev.safetensors>
--clip_path <path to clip-vit-large-patch14>
--t5_path <path to xflux_text_encoders>
--siglip_path <path to siglip-base-patch16-256-i18n>
--ipa_path <path to withanyone>
```
<div style="color:#999; font-size:0.95em; margin-top:8px;">
We use the ArcFace model for face embedding; it is downloaded automatically to `./models/`. However, there is a known bug in the download step: if you see an error like `assert 'detection' in self.models`, manually move the model directory:
</div>
<pre style="color:#888; background:transparent; border:0; padding:0; margin-top:8px;">
mv models/antelopev2/ models/antelopev2_
mv models/antelopev2_/antelopev2/ models/antelopev2/
rm -rf models/antelopev2_ antelopev2.zip
</pre>
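Equivalently, assuming the bug simply nests the extracted folder one level too deep, the fix can be scripted (this helper is a sketch, not part of the codebase):

```python
import os
import shutil

def fix_antelopev2_layout(models_dir="models"):
    """Flatten models/antelopev2/antelopev2/ into models/antelopev2/.

    Mirrors the manual `mv` commands above; a no-op if the layout is already correct.
    """
    outer = os.path.join(models_dir, "antelopev2")
    nested = os.path.join(outer, "antelopev2")
    if not os.path.isdir(nested):
        return  # nothing to fix
    tmp = outer + "_"
    os.rename(outer, tmp)                                # mv antelopev2/ antelopev2_
    shutil.move(os.path.join(tmp, "antelopev2"), outer)  # pull the nested dir back up
    shutil.rmtree(tmp)                                   # rm -rf antelopev2_
```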
⚡️ Gradio Demo
The Gradio GUI demo is a good starting point to experiment with WithAnyone. Run it with:
```shell
python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --clip_path <path to clip-vit-large-patch14> \
    --t5_path <path to xflux_text_encoders> \
    --siglip_path <path to siglip-base-patch16-256-i18n> \
    --model_type "flux-dev" # or "flux-kontext" for WithAnyone.K
```
❗ WithAnyone requires face bounding boxes (bboxes) to indicate where faces should be generated. You can provide them in two ways:

- Upload an example image with the desired face locations in Mask Configuration (Option 1: Automatic). The face bboxes will be extracted automatically, and faces will be generated in the same locations. Do not worry if the example image has a different resolution or aspect ratio; the face bboxes will be resized accordingly.
- Input face bboxes directly in Mask Configuration (Option 2: Manual). The format is `x1,y1,x2,y2` for each face, one per line.
- <span style="color: #999;">(NOT recommended) Leave both options empty, and the face bboxes will be randomly chosen from a pre-defined set.</span>
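As a sketch of what the manual option implies (the helper names here are assumptions for illustration, not functions from the repo), parsing the `x1,y1,x2,y2` lines and rescaling them to a different target resolution looks like:

```python
def parse_bboxes(text):
    """Parse manual input: one 'x1,y1,x2,y2' line per face."""
    boxes = []
    for line in text.strip().splitlines():
        x1, y1, x2, y2 = (float(v) for v in line.split(","))
        boxes.append((x1, y1, x2, y2))
    return boxes

def rescale_bboxes(boxes, src_wh, dst_wh):
    """Scale bboxes from a source resolution to the target resolution,
    as the demo does when the example image's size differs from the output's."""
    sx, sy = dst_wh[0] / src_wh[0], dst_wh[1] / src_wh[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for (x1, y1, x2, y2) in boxes]
```

For example, two faces specified against a 1000x1000 example image map cleanly onto a 500x500 output by halving every coordinate.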
⭕ WithAnyone works well with LoRA. If you have any stylized LoRA checkpoints, use --additional_lora_ckpt <path to lora checkpoint> when launching the demo. The LoRA will be merged into the diffusion model.
```shell
python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --additional_lora_ckpt <path to lora checkpoint> \
    --lora_scale 0.8 # adjust the weight as needed
```
⭕ In Advanced Options, there is a slider controlling whether outputs are more "similar in spirit" or "similar in form" to the reference faces.
- Move the slider to the right to preserve more details in the reference image (expression, makeup, accessories, hairstyle, etc.). Identity will also be better preserved.
- Move it to the left for more freedom and creativity: stylization can be stronger, and hairstyle and makeup can be changed.
