HeadRouter
ACM TOG 2026🎉 Official repository for "HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads".
HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads

<a href='https://yuci-gpt.github.io/headrouter/'><img src='https://img.shields.io/badge/Project-Page-green'></a> <a href='https://dl.acm.org/doi/10.1145/3797956'><img src='https://img.shields.io/badge/ACM%20TOG-2025-blue'></a> <a href='https://arxiv.org/abs/2411.15034'><img src='https://img.shields.io/badge/ArXiv-2411.15034-red'></a>
</div>

TL;DR
HeadRouter is a training-free, text-guided framework for editing real images, built on MM-DiTs (e.g., SD3 and Flux).
Abstract
Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures that can use self/cross-attention maps for semantic editing, MM-DiTs inherently lack support for explicit, consistently incorporated text guidance, resulting in semantic misalignment between the edited results and the texts. In this study, we disclose the sensitivity of different attention heads to different image semantics within MM-DiTs and introduce HeadRouter, a training-free image editing framework that edits the source image by adaptively routing the text guidance to different attention heads in MM-DiTs. Furthermore, we present a dual-token refinement module to refine text/image token representations for precise semantic guidance and accurate region expression. Experimental results on multiple benchmarks demonstrate HeadRouter's performance in terms of editing fidelity and image quality.
Installation & Usage
- Clone the repository and install the environment:

      git clone https://github.com/your-repo/HeadRouter.git
      cd HeadRouter/diffusers
      pip install -e .

- Run inference:

  You can run the inference script using:

      python infer.py
Important note on hyper-parameters: Training-free image editing relies heavily on hyper-parameter tuning, so you will need to adjust the hyper-parameters for each input image and for the type of edit you want to perform.
Tip: The larger the --eta value, the closer the edited result will be to the original image.
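The effect of `--eta` can be illustrated with a small sketch. It assumes the RF-Inversion-style controller guidance that this repository acknowledges building on, in which the sampling velocity interpolates between the model's prediction and a velocity that points straight at the source image. All function and variable names below are hypothetical and are not taken from `infer.py`.

```python
# Illustrative sketch only: names are hypothetical, not the repository's API.
# Assumes RF-Inversion-style controller guidance on a rectified flow where
# t runs from 0 (noise) to 1 (image).

def source_velocity(x_t, source, t):
    """Velocity that would carry x_t exactly onto `source` by time t = 1."""
    return [(s - x) / (1.0 - t) for x, s in zip(x_t, source)]


def guided_velocity(v_model, v_source, eta):
    """Blend the model velocity with the source-seeking velocity.

    eta = 0 follows the model alone; eta = 1 follows the source alone.
    """
    return [vm + eta * (vs - vm) for vm, vs in zip(v_model, v_source)]


def euler_step(x_t, v, dt):
    """One explicit Euler step of the sampling ODE."""
    return [x + dt * vi for x, vi in zip(x_t, v)]


x_t = [0.0, 0.0]          # toy two-dimensional "latent"
source = [1.0, -1.0]      # toy source image
v_model = [0.5, 0.5]      # stand-in for the edit-prompt prediction
v_src = source_velocity(x_t, source, t=0.0)

for eta in (0.0, 0.5, 1.0):
    x_next = euler_step(x_t, guided_velocity(v_model, v_src, eta), dt=1.0)
    print(eta, x_next)
```

In this toy setting, `eta = 1.0` lands exactly on the source and `eta = 0.0` ignores it, which matches the tip above: a larger value keeps the result closer to the original image.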
Below is our recommended hyper-parameter configuration for various inversion and editing tasks:
Hyper-parameter configuration of our method for inversion and editing tasks
| Task | Starting Time | Stopping Time | Strength |
| :--- | :---: | :---: | :---: |
| Object insert | 0 | 6 | 1.0 |
| Gender editing | 0 | 8 | 1.0 |
| Age editing | 0 | 5 | 1.0 |
| Adding glasses | 6 | 25 | 0.7 |
| Stylization | 0 | 6 | 0.9 |
(Note: Stopping Time and Strength are parameters for Controller Guidance)
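As a convenience, the recommended values can be kept in a small lookup structure. The sketch below is one possible way to do that, not part of the repository: the `Preset` type, the `PRESETS` dictionary, and the `eta_schedule` helper are hypothetical names. It also assumes that Starting Time and Stopping Time are sampling-step indices, that the window is inclusive at both ends, and that Strength is applied only inside that window; check these assumptions against `infer.py` before relying on them.

```python
# Illustrative sketch only: Preset, PRESETS and eta_schedule are hypothetical.
from typing import List, NamedTuple


class Preset(NamedTuple):
    start: int       # "Starting Time" column
    stop: int        # "Stopping Time" column
    strength: float  # "Strength" column


# Values copied from the recommended configuration table.
PRESETS = {
    "object_insert": Preset(0, 6, 1.0),
    "gender_editing": Preset(0, 8, 1.0),
    "age_editing": Preset(0, 5, 1.0),
    "adding_glasses": Preset(6, 25, 0.7),
    "stylization": Preset(0, 6, 0.9),
}


def eta_schedule(preset: Preset, num_steps: int) -> List[float]:
    """Per-step guidance strength.

    Assumption: guidance is `strength` for step indices in [start, stop]
    (inclusive) and 0.0 everywhere else.
    """
    return [
        preset.strength if preset.start <= i <= preset.stop else 0.0
        for i in range(num_steps)
    ]


# 28 steps is an arbitrary choice for the example, not a recommended value.
schedule = eta_schedule(PRESETS["adding_glasses"], num_steps=28)
print(schedule)
```

Under these assumptions, "Adding glasses" applies no guidance for the first six steps and a strength of 0.7 from step 6 through step 25, whereas the other tasks apply guidance from the first step and stop early.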
Citation
If you find this work useful, please consider citing:
@article{xu2024headrouter,
title={Headrouter: A training-free image editing framework for mm-dits by adaptively routing attention heads},
author={Xu, Yu and Tang, Fan and Cao, Juan and Kong, Xiaoyu and Zhang, Yuxin and Li, Jintao and Deussen, Oliver and Lee, Tong-Yee},
journal={ACM Transactions on Graphics},
publisher={ACM New York, NY}
}
Acknowledgements
This work is built upon several excellent open-source projects and research efforts. We sincerely thank the authors and contributors for making their work publicly available and for advancing the community:
- Diffusers: https://github.com/huggingface/diffusers
- RF-Inversion: https://github.com/LituRout/RF-Inversion
Pipeline

Comparison with baselines

More of our results

