WithAnyone

✨ [ICLR'26] WithAnyone generates high-quality, controllable, and ID-consistent images.


<div align="center"> <h2>WithAnyone: Towards Controllable and ID-Consistent Image Generation</h2> <!-- <p> Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, </p> <p> Xianfang Zeng, Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang </p> --> <!-- <p><em>(† Project lead, ‡ Corresponding authors)</em></p> --> <!-- <p>Fudan University, StepFun</p> --> <p> <a href="https://arxiv.org/abs/2510.14975"><img src="https://img.shields.io/badge/arXiv-2510.14975-b31b1b.svg" alt="arXiv"/></a> <a href="https://doby-xu.github.io/WithAnyone/"><img src="https://img.shields.io/badge/Project-Page-blue.svg" alt="Project Page"/></a> <a href="https://huggingface.co/WithAnyone/WithAnyone"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow.svg" alt="HuggingFace"/></a> <a href="https://huggingface.co/datasets/WithAnyone/MultiID-Bench"><img src="https://img.shields.io/badge/MultiID-Bench-Green.svg" alt="MultiID-Bench"/></a> <a href="https://huggingface.co/datasets/WithAnyone/MultiID-2M"><img src="https://img.shields.io/badge/MultiID_2M-Dataset-Green.svg" alt="MultiID-2M"/></a> <a href="https://huggingface.co/spaces/WithAnyone/WithAnyone_demo"><img src="https://img.shields.io/badge/Huggingface-Demo-blue.svg" alt="MultiID-2M"/></a> </p> </div> <!-- <p align="center"> <a href="assets/teaser.pdf"> <img src="assets/teaser.png" alt="Teaser" width="800"/> </a> </p> --> <p align="center"> <a href="assets/withanyone.gif"> <img src="assets/withanyone.gif" alt="Teaser" width="800"/> </a> </p> <!-- <p align="center"><em>(† Project lead, ‡ Corresponding authors)</em></p> -->

Star us if you find this project useful! ⭐

🎉 Updates

❤ Community Contributions

<!-- ComfyUI support by okdalto -->

A huge thanks to @okdalto for contributing the ComfyUI integration!
The ComfyUI version of WithAnyone is now available — check it out here and enjoy a seamless node-based workflow within the ComfyUI environment.

🕒 Action Items

  • [x] Inference scripts
  • [x] WithAnyone - FLUX.1
  • [x] WithAnyone.K.preview - FLUX.1 Kontext
  • [x] WithAnyone.Ke.preview - FLUX.1 Kontext
  • [ ] WithAnyone - FLUX.1 Kontext
  • [x] MultiID-Bench
  • [x] MultiID-2M Part 1
  • [ ] MultiID-2M Part 2
  • [x] Training codebase
  • [ ] WithAnyone.Z - Z-image

📑Introduction

Highlight of WithAnyone

  • Controllable: WithAnyone aims to mitigate "copy-paste" artifacts in face generation. Previous methods tend to copy the reference face directly onto the generated image, leading to poor controllability of expressions, hairstyles, accessories, and even poses. They fall into a clear trade-off between similarity and copy-paste: the more similar the generated face is to the reference, the more copy-paste artifacts it shows. WithAnyone is an attempt to break this trade-off.
  • Multi-ID Generation: WithAnyone can generate multiple given identities in a single image. With the help of controllable face generation, all generated faces can fit harmoniously in one group photo.
<div style="text-align:center; margin-top:12px;"> <img src="assets/fidelity_vs_copypaste_v200_single.png" alt="Copy-Paste" style="width:70%; max-width:900px; height:auto; display:inline-block;"> </div> <!-- <div style="display:flex; gap:10px; align-items:center;"> <img src="assets/001.webp" alt="001" style="width:35%; height:auto;"> <img src="assets/005.webp" alt="005" style="width:24%; height:auto;"> <img src="assets/009.webp" alt="009" style="width:32%; height:auto;"> </div> -->

⚡️ Quick Start

🏰 Model Zoo

| Model | Description | Download |
|-|-|-|
| WithAnyone 1.0 - FLUX.1 | Main model with FLUX.1 | HuggingFace |
| WithAnyone.K.preview - FLUX.1 Kontext | For t2i generation with FLUX.1 Kontext | HuggingFace |
| WithAnyone.Ke.preview - FLUX.1 Kontext | For face editing with FLUX.1 Kontext | HuggingFace |

If you just want to try it out, please use the base model WithAnyone - FLUX.1. The other models are for the following use cases:

<details> <summary>WithAnyone.K</summary> This is a preliminary version of WithAnyone with FLUX.1 Kontext. It can be used for text-to-image generation with multiple given identities. However, stability and quality are not as good as the base model. Please use it with caution. We are working on improving it. </details> <details> <summary>WithAnyone.Ke</summary> This is a face editing version of WithAnyone with FLUX.1 Kontext, leveraging the editing capabilities of FLUX.1 Kontext. Please use it with `gradio_edit.py` instead of `gradio_app.py`. It is still a preliminary version, and we are working on improving it. </details>

🔧 Requirements

Use `pip install -r requirements.txt` to install the necessary packages.

🔧 Model Checkpoints

You can download the necessary model checkpoints in one of two ways:

  1. Directly run the inference scripts. The checkpoints will be downloaded automatically by the hf_hub_download function in the code to your $HF_HOME (default: ~/.cache/huggingface).
  2. Use huggingface-cli download <repo name> to download:
    • black-forest-labs/FLUX.1-dev
    • xlabs-ai/xflux_text_encoders
    • openai/clip-vit-large-patch14
    • google/siglip-base-patch16-256-i18n
    • withanyone/withanyone
      Then run the inference scripts. You can download only the checkpoints you need to speed up setup and save disk space.
      Example for black-forest-labs/FLUX.1-dev:
    • huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors
    • huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors
      Ignore the text encoder in the black-forest-labs/FLUX.1-dev model repo (it is there for diffusers calls). All checkpoints together require about 51 GB of disk space (~40 GB in hub and ~10 GB in xet).
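The per-repo downloads above can be scripted. The sketch below (illustrative, not part of the WithAnyone codebase) composes the `huggingface-cli download` commands for the repos listed in this README, fetching only the two needed files from the FLUX.1-dev repo:

```python
# Sketch: compose huggingface-cli download commands for the checkpoints
# listed above. Repo names are taken from this README; for FLUX.1-dev,
# only the transformer and VAE weights are fetched, since its bundled
# text encoder is for diffusers-style loading and is not needed here.

FLUX_FILES = ["flux1-dev.safetensors", "ae.safetensors"]

REPOS = [
    "black-forest-labs/FLUX.1-dev",
    "xlabs-ai/xflux_text_encoders",
    "openai/clip-vit-large-patch14",
    "google/siglip-base-patch16-256-i18n",
    "withanyone/withanyone",
]

def download_commands():
    """Return the shell commands to fetch only the needed checkpoints."""
    cmds = []
    for repo in REPOS:
        if repo == "black-forest-labs/FLUX.1-dev":
            for fname in FLUX_FILES:
                cmds.append(f"huggingface-cli download {repo} {fname}")
        else:
            cmds.append(f"huggingface-cli download {repo}")
    return cmds

if __name__ == "__main__":
    print("\n".join(download_commands()))
```

Running the printed commands (or piping them to a shell) populates your `$HF_HOME` cache, after which the paths can be passed to the inference scripts as shown below.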

After downloading, set the following arguments in the inference script to the local paths of the downloaded checkpoints:

--flux_path <path to flux1-dev.safetensors>
--clip_path <path to clip-vit-large-patch14>
--t5_path <path to xflux_text_encoders>
--siglip_path <path to siglip-base-patch16-256-i18n>
--ipa_path <path to withanyone>
<div style="color:#999; font-size:0.95em; margin-top:8px;"> We need to use the ArcFace model for face embedding. It will automatically be downloaded to `./models/`. However, there is an original bug. If you see an error like `assert 'detection' in self.models`, please manually move the model directory: </div> <pre style="color:#888; background:transparent; border:0; padding:0; margin-top:8px;"> mv models/antelopev2/ models/antelopev2_ mv models/antelopev2_/antelopev2/ models/antelopev2/ rm -rf models/antelopev2_, antelopev2.zip </pre>

⚡️ Gradio Demo

The Gradio GUI demo is a good starting point to experiment with WithAnyone. Run it with:

python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --clip_path <path to clip-vit-large-patch14> \
    --t5_path <path to xflux_text_encoders> \
    --siglip_path <path to siglip-base-patch16-256-i18n> \
    --model_type "flux-dev" # or "flux-kontext" for WithAnyone.K

❗ WithAnyone requires face bounding boxes (bboxes) to indicate where faces should be generated. You can provide them in one of the following ways:

  1. Upload an example image with desired face locations in Mask Configuration (Option 1: Automatic). The face bboxes will be extracted automatically, and faces will be generated in the same locations. Do not worry if the given image has a different resolution or aspect ratio; the face bboxes will be resized accordingly.
  2. Input face bboxes directly in Mask Configuration (Option 2: Manual). The format is x1,y1,x2,y2 for each face, one per line.
  3. <span style="color: #999;">(NOT recommended) leave both options empty, and the face bboxes will be randomly chosen from a pre-defined set. </span>
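The manual format in Option 2 is one `x1,y1,x2,y2` line per face, and Option 1 rescales boxes taken from an example image of a different resolution to the output size. A minimal sketch of both steps (function names are illustrative, not from the WithAnyone codebase):

```python
def parse_bboxes(text):
    """Parse manual bbox input: one 'x1,y1,x2,y2' line per face."""
    boxes = []
    for line in text.strip().splitlines():
        x1, y1, x2, y2 = (float(v) for v in line.split(","))
        boxes.append((x1, y1, x2, y2))
    return boxes

def rescale_bboxes(boxes, src_size, dst_size):
    """Scale boxes from the example image's (w, h) to the output's (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]
```

For example, a box `10,20,110,220` extracted from a 1000x1000 example image maps to `(5, 40, 55, 440)` on a 500x2000 output, which is why a mismatched resolution or aspect ratio in Option 1 is not a problem.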

⭕ WithAnyone works well with LoRA. If you have any stylized LoRA checkpoints, use --additional_lora_ckpt <path to lora checkpoint> when launching the demo. The LoRA will be merged into the diffusion model.

python gradio_app.py --flux_path <path to flux1-dev directory> --ipa_path <path to withanyone directory> \
    --additional_lora_ckpt <path to lora checkpoint> \
    --lora_scale 0.8 # adjust the weight as needed 

⭕ In Advanced Options, there is a slider controlling whether outputs are more "similar in spirit" or "similar in form" to the reference faces.

  • Move the slider to the right to preserve more details in the reference image (expression, makeup, accessories, hairstyle, etc.). Identity will also be better preserved.
  • Move it to the left for more freedom and creativity. Stylization can be stronger, and hairstyle and makeup can be changed.
<details> <summary>How the slider works and some tips</summary> The slider act