InfiniteYou
🔥 [ICCV 2025 Highlight] InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liming Jiang Qing Yan Yumin Jia Zichuan Liu Hao Kang Xin Lu<br /> ByteDance Intelligent Creation<br /> ICCV 2025 (<span style="color:#F44336">Highlight</span>)
<a href="https://bytedance.github.io/InfiniteYou"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=blue&logo=github-pages"></a> <a href="https://arxiv.org/abs/2503.16418"><img src="https://img.shields.io/static/v1?label=ArXiv&message=Paper&color=darkred&logo=arxiv"></a> <a href="https://huggingface.co/ByteDance/InfiniteYou"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%96%20Released&message=Models&color=green"></a> <a href="https://github.com/bytedance/ComfyUI_InfiniteYou"><img src="https://img.shields.io/static/v1?label=%E2%9A%99%EF%B8%8F%20ComfyUI&message=Node&color=purple"></a> <a href="https://huggingface.co/spaces/ByteDance/InfiniteYou-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Demo&color=orange"></a>
</div>
Abstract: Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
🔥 News
- [07/2025] 🔥 The paper of InfiniteYou is selected as ICCV 2025 (<span style="color:#F44336">Highlight</span>).
- [06/2025] 🔥 The paper of InfiniteYou is accepted to ICCV 2025.
- [04/2025] 🔥 The official ComfyUI node is released. Unofficial ComfyUI contributions are appreciated.
- [04/2025] 🔥 Quantization and offloading options are provided to reduce the memory requirements for InfiniteYou-FLUX v1.0.
- [03/2025] 🔥 The code, model, and demo of InfiniteYou-FLUX v1.0 are released.
- [03/2025] 🔥 The project page of InfiniteYou is created.
- [03/2025] 🔥 The paper of InfiniteYou is released on arXiv.
💡 Important Usage Tips
- We released two model variants of InfiniteYou-FLUX v1.0: `aes_stage2` and `sim_stage1`. `aes_stage2` is our model after SFT and is used by default for better text-image alignment and aesthetics. For higher ID similarity, please try `sim_stage1` (use `--model_version` to switch). More details can be found in our paper.
- To better fit specific personal needs, we find that two arguments are highly useful to adjust: <br /> `--infusenet_conditioning_scale` (default: `1.0`) and `--infusenet_guidance_start` (default: `0.0`). Usually, you may NOT need to adjust them. If necessary, start by trying a slightly larger `--infusenet_guidance_start` (e.g., `0.1`) only (especially helpful for `sim_stage1`). If still not satisfactory, then try a slightly smaller `--infusenet_conditioning_scale` (e.g., `0.9`).
- We also provided two LoRAs (Realism and Anti-blur) to enable additional usage flexibility. If needed, try `Realism` only first. They are entirely optional: they are examples to try and are NOT used in our paper.
- If the generated gender does not align with your preferences, try adding specific words in the text prompt, such as 'a man', 'a woman', etc. We encourage users to use inclusive and respectful language.
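
The tuning order in the tips above can be sketched as a small helper that assembles `test.py` command lines, one per attempt. This is only an illustration: `build_command` and `tuning_ladder` are hypothetical names, not part of the repository, and keeping the larger guidance start while lowering the conditioning scale in the last step is our reading of the tips rather than something the README states.

```python
import shlex


def build_command(id_image, prompt, model_version="aes_stage2",
                  guidance_start=0.0, conditioning_scale=1.0,
                  realism_lora=False):
    """Assemble a `python test.py` call using only flags documented in this README."""
    cmd = [
        "python", "test.py",
        "--id_image", id_image,
        "--prompt", prompt,
        "--model_version", model_version,
        "--infusenet_guidance_start", str(guidance_start),
        "--infusenet_conditioning_scale", str(conditioning_scale),
    ]
    if realism_lora:
        # Optional LoRA; the tips suggest trying Realism alone first.
        cmd.append("--enable_realism_lora")
    return cmd


def tuning_ladder(id_image, prompt, model_version="sim_stage1"):
    """Yield commands in the order the tips suggest trying them."""
    # Attempt 1: documented defaults (start 0.0, scale 1.0).
    yield build_command(id_image, prompt, model_version)
    # Attempt 2: only a slightly larger guidance start.
    yield build_command(id_image, prompt, model_version, guidance_start=0.1)
    # Attempt 3: additionally a slightly smaller conditioning scale
    # (assumption: the larger start from attempt 2 is kept).
    yield build_command(id_image, prompt, model_version,
                        guidance_start=0.1, conditioning_scale=0.9)


if __name__ == "__main__":
    for cmd in tuning_ladder("./assets/examples/man.jpg",
                             "A man, portrait, cinematic"):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list form can be passed to `subprocess.run` directly.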
:european_castle: Model Zoo
| InfiniteYou Version | Model Version | Base Model Trained with | Description |
| :---: | :---: | :---: | :---: |
| InfiniteYou-FLUX v1.0 | aes_stage2 | FLUX.1-dev | Stage-2 model after SFT. Better text-image alignment and aesthetics. |
| InfiniteYou-FLUX v1.0 | sim_stage1 | FLUX.1-dev | Stage-1 model before SFT. Higher identity similarity. |
🔧 Requirements and Installation
Dependencies
Simply run this one-line command to install (feel free to create a python3 virtual environment before you run):
pip install -r requirements.txt
Memory Requirements
- Full-performance: The original `bf16` model inference requires a peak VRAM of around 43GB.
- Fast CPU offloading: By specifying only `--cpu_offload` in `test.py`, the peak VRAM is reduced to around 30GB with NO performance degradation.
- 8-bit quantization: By specifying only `--quantize_8bit` in `test.py`, the peak VRAM is reduced to around 24GB with performance remaining very similar.
- Combining fast CPU offloading and 8-bit quantization: By specifying both `--cpu_offload` and <br />`--quantize_8bit`, the peak VRAM is further reduced to around 16GB with performance remaining very similar.
If you want to use our models but only have a GPU with even less VRAM, please further refer to Diffusers memory reduction tips, where some more aggressive strategies may be helpful. Community contributions are also welcome.
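
As a rough illustration of the options above, the following sketch maps a VRAM budget to the corresponding `test.py` flags. `pick_memory_flags` is a hypothetical helper, not part of the repository. The thresholds are the approximate peak figures quoted above, so a GPU sitting exactly at a threshold may still run out of memory in practice.

```python
def pick_memory_flags(vram_gb):
    """Return the test.py memory flags for a given VRAM budget in GB.

    Returns None when the budget is below the smallest documented setting.
    """
    # (approximate peak VRAM in GB, flags), taken from the list above.
    # Ordered so the least intrusive option that fits is chosen first.
    options = [
        (43, []),                                    # full bf16 inference
        (30, ["--cpu_offload"]),                     # no performance degradation
        (24, ["--quantize_8bit"]),                   # very similar performance
        (16, ["--cpu_offload", "--quantize_8bit"]),  # both combined
    ]
    for peak_gb, flags in options:
        if vram_gb >= peak_gb:
            return flags
    return None  # below ~16GB: see the Diffusers memory reduction tips


if __name__ == "__main__":
    for budget in (48, 32, 24, 16, 12):
        print(budget, pick_memory_flags(budget))
```

The returned list can be appended to a `python test.py ...` invocation unchanged.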
⚡️ Quick Inference
Local Inference Script
python test.py --id_image ./assets/examples/man.jpg --prompt "A man, portrait, cinematic" --out_results_dir ./results
<details>
<summary style='font-size:20px'><b><i>Explanation of all the arguments (click to expand!)</i></b></summary>
- Input and output:
  - `--id_image (str)`: The path to the input identity (ID) image. Default: `./assets/examples/man.jpg`.
  - `--prompt (str)`: The text prompt for image generation. Default: `A man, portrait, cinematic`.
  - `--out_results_dir (str)`: The path to the output directory to save the generated results. Default: `./results`.
  - `--control_image (str or None)`: The path to the control image [optional] to extract five facial keypoints to control the generation. Default: `None`.
  - `--base_model_path (str)`: The huggingface or local path to the base model. Default: `black-forest-labs/FLUX.1-dev`.
  - `--model_dir (str)`: The path to the InfiniteYou model directory. Default: `ByteDance/InfiniteYou`.
- Version control:
  - `--infu_flux_version (str)`: InfiniteYou-FLUX version: currently only `v1.0` is supported. Default: `v1.0`.
  - `--model_version (str)`: The model variant to use: `aes_stage2` | `sim_stage1`. Default: `aes_stage2`.
- General inference arguments:
  - `--cuda_device (int)`: The cuda device ID to use. Default: `0`.
  - `--seed (int)`: The seed for reproducibility (0 for random). Default: `0`.
  - `--guideance_scale (float)`: The guidance scale for the diffusion process. Default: `3.5`.
  - `--num_steps (int)`: The number of inference steps. Default: `30`.
- InfiniteYou-specific arguments:
  - `--infusenet_conditioning_scale (float)`: The scale for the InfuseNet conditioning. Default: `1.0`.
  - `--infusenet_guidance_start (float)`: The start point for the InfuseNet guidance injection. Default: `0.0`.
  - `--infusenet_guidance_end (float)`: The end point for the InfuseNet guidance injection. Default: `1.0`.
- Optional LoRAs:
  - `--enable_realism_lora (store_true)`: Whether to enable the Realism LoRA. Default: `False`.
  - `--enable_anti_blur_lora (store_true)`: Whether to enable the Anti-blur LoRA. Default: `False`.
- Memory reduction options:
  - `--quantize_8bit (store_true)`: Whether to quantize the model to the 8-bit format. Default: `False`.
  - `--cpu_offload (store_true)`: Whether to use fast CPU offloading. Default: `False`.

</details>
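
The README does not spell out how `--infusenet_guidance_start` and `--infusenet_guidance_end` map onto denoising steps. The sketch below assumes they are fractions of the schedule, in the same spirit as the similarly named `control_guidance_start` / `control_guidance_end` options of the Diffusers ControlNet pipelines; the actual InfiniteYou pipeline may differ. `active_steps` is a hypothetical helper for reasoning about the window, not a function from the repository.

```python
def active_steps(num_steps=30, guidance_start=0.0, guidance_end=1.0):
    """List the step indices where identity conditioning would be applied.

    Assumption: a step i covers the schedule interval
    [i / num_steps, (i + 1) / num_steps] and is active only if that
    interval lies fully inside [guidance_start, guidance_end].
    """
    active = []
    for i in range(num_steps):
        begins_before_start = i / num_steps < guidance_start
        ends_after_end = (i + 1) / num_steps > guidance_end
        if not (begins_before_start or ends_after_end):
            active.append(i)
    return active


if __name__ == "__main__":
    # Documented defaults: conditioning on every one of the 30 steps.
    print(len(active_steps()))
    # A slightly larger start skips the earliest steps under this assumption.
    print(active_steps(30, guidance_start=0.1)[:3])
```

Under this reading, raising the start value leaves the first few steps unconditioned, which is one plausible reason the tips suggest it before lowering the conditioning scale.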
Local Gradio Demo
python app.py
