# X2Edit (AAAI 2026)
## X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning

Jian Ma<sup>1</sup>, Xujie Zhu<sup>2</sup>, Zihao Pan<sup>2</sup>, Qirong Peng<sup>1</sup>, Xu Guo<sup>3</sup>, Chen Chen<sup>1</sup>, Haonan Lu<sup>1</sup> <br>
<sup>1</sup>OPPO AI Center, <sup>2</sup>Sun Yat-sen University, <sup>3</sup>Tsinghua University <br>
<div align="center"> <img src="assets/X2Edit images.jpg" alt="X2Edit image generation results"> </div>

## News
- 2025/09/16: We release a dataset built using Qwen-Image and Qwen-Image-Edit. This sub-dataset focuses specifically on subject-driven generation with facial consistency, a key requirement for maintaining stable subject identity across generated content. Asian-portrait and NonAsian-portrait
- 2025/08/25: Support Qwen-Image for training and inference. Checkpoint
<div align="center"> <img src="assets/qwen-image1.png" alt="X2Edit image generation results with Qwen-Image"> </div> <div align="center"> <img src="assets/qwen-image0.png"> </div>

## Environment
Prepare the environment and install the required libraries:

```shell
$ cd X2Edit
$ conda create --name X2Edit python==3.11
$ conda activate X2Edit
$ pip install -r requirements.txt
```
Clone LaMa into `data_pipeline` and rename it to `lama`. Clone SAM and GroundingDINO into `SAM`, then rename them to `segment_anything` and `GroundingDINO`, respectively.
## Data Construction

![dataset detail](./assets/dataset_detail.jpg)
X2Edit provides executable scripts for each data construction workflow shown in the figure. We organize the dataset using the WebDataset format; please replace the dataset paths in the scripts with your own. The Qwen models referenced below can be selected from Qwen2.5-VL-72B, Qwen3-8B, and Qwen2.5-VL-7B. In addition, we also use aesthetic scoring models for screening: please download SigLIP and aesthetic-predictor-v2-5, then change the corresponding paths in `siglip_v2_5.py`.
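The WebDataset convention packs each sample as same-prefix files inside a tar shard. Below is a minimal sketch of that layout using only the standard library; the actual scripts may rely on the `webdataset` package, and the file keys and field names here are illustrative, not the repo's real schema:

```python
import io
import json
import tarfile

def write_shard(path, samples):
    """Pack (key, image_bytes, metadata) samples into a WebDataset-style tar shard."""
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, meta in samples:
            for suffix, payload in ((".jpg", image_bytes),
                                    (".json", json.dumps(meta).encode())):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

def read_shard(path):
    """Yield (key, image_bytes, metadata) samples back out of a shard."""
    samples = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            key, _, suffix = member.name.rpartition(".")
            samples.setdefault(key, {})[suffix] = tar.extractfile(member).read()
    for key, parts in samples.items():
        yield key, parts["jpg"], json.loads(parts["json"])
```

Files sharing a prefix (`000001.jpg`, `000001.json`) are grouped into one sample, which is what lets shards be streamed sequentially during training.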
- Subject Addition & Deletion → use `expert_subject_deletion.py` and `expert_subject_deletion_filter.py`: The former constructs deletion-type data, while the latter uses the fine-tuned Qwen2.5-VL-7B to further screen the constructed deletion-type data. Before executing, download RAM, GroundingDINO, SAM, Randeng-Deltalm, InfoXLM, RMBG and LaMa.
- Normal Editing Tasks → use `step1x_data.py`: Please download the checkpoint Step1X-Edit. The language model we use is Qwen2.5-VL-72B.
- Subject-Driven Generation → use `kontext_subject_data.py`: Please download the checkpoints FLUX.1-Kontext, DINOv2, CLIP, OPUS-MT-zh-en and shuttle-3-diffusion. The language model we use is Qwen3-8B.
- Style Transfer → use `kontext_style_transfer.py`: Please download the checkpoints FLUX.1-Kontext, DINOv2, CLIP, OPUS-MT-zh-en and shuttle-3-diffusion. The language model we use is Qwen3-8B.
- Style Change → use `expert_style_change.py`: Please download the checkpoints FLUX.1-dev and OmniConsistency. We use Qwen2.5-VL-7B to score.
- Text Change → use `expert_text_change_ch.py` for Chinese and `expert_text_change_en.py` for English: Please download the checkpoint textflux. We use Qwen2.5-VL-7B to score.
- Complex Editing Tasks → use `bagel_data.py`: Please download the checkpoint Bagel. We use Qwen2.5-VL-7B to score.
- High Fidelity Editing Tasks → use `gpt4o_data.py`: Please download the checkpoint OPUS-MT-zh-en and use your own GPT-4o API. We use Qwen2.5-VL-7B to score.
- High Resolution Data Construction → use `kontext_data.py`: Please download the checkpoints FLUX.1-dev and OPUS-MT-zh-en. We use Qwen2.5-VL-7B to score.
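All of the expert pipelines above follow the same screening pattern: generate candidate edits, score them with a judge model (Qwen2.5-VL-7B or the aesthetic predictor), and keep only candidates above a threshold. A minimal, model-free sketch of that filter; the threshold value and field names are illustrative, not the repo's actual settings:

```python
def screen_samples(samples, score_fn, threshold=5.5):
    """Keep only candidate edits whose judge score clears the threshold.

    samples:   iterable of dicts describing candidate edits
    score_fn:  stand-in for the Qwen2.5-VL-7B / aesthetic-predictor scorer
    threshold: minimum acceptable score (illustrative value, not the repo's)
    """
    kept = []
    for sample in samples:
        score = score_fn(sample)
        if score >= threshold:
            # Record the score alongside the sample for later auditing.
            kept.append({**sample, "score": score})
    return kept
```

In the real pipelines `score_fn` would wrap an MLLM call, so filtering is typically batched per shard rather than per sample.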
## Inference
We provide inference scripts for editing images at resolutions of 1024 and 512. In addition, you can choose the base model of X2Edit from FLUX.1-Krea, FLUX.1-dev, FLUX.1-schnell, PixelWave and shuttle-3-diffusion, and choose a LoRA for integration with MoE-LoRA from Turbo-Alpha, AntiBlur, Midjourney-Mix2, Super-Realism and Chatgpt-Ghibli. Choose the model you like and download it. For the MoE-LoRA, we will open-source a unified checkpoint that can be used for both 512 and 1024 resolutions.
Before executing the script, download Qwen3-8B (used to select the task type for the input instruction), the base model (FLUX.1-Krea, FLUX.1-dev, FLUX.1-schnell or shuttle-3-diffusion), the MLLM and Alignet. All scripts follow analogous command patterns: simply replace the script filename while keeping the parameter configuration consistent.
```shell
$ python infer.py --device cuda --pixel 1024 --num_experts 12 --base_path BASE_PATH --qwen_path QWEN_PATH --lora_path LORA_PATH --extra_lora_path EXTRA_LORA_PATH
$ python infer_qwen.py --device cuda --pixel 1024 --num_experts 12 --base_path BASE_PATH --qwen_path QWEN_PATH --lora_path LORA_PATH --extra_lora_path EXTRA_LORA_PATH  # for Qwen-Image backbone
```
device: The device used for inference. default: cuda<br>
pixel: The resolution of the edited image, chosen from 1024 and 512. default: 1024<br>

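The flags shared by `infer.py` and `infer_qwen.py` can be reproduced with a small argument-parser front end. This is a sketch reconstructed from the commands above; any default or `required` setting beyond `--device` and `--pixel` is an assumption, not the repo's actual behavior:

```python
import argparse

def build_parser():
    """CLI mirroring the inference flags shown in the commands above."""
    p = argparse.ArgumentParser(description="X2Edit inference (sketch)")
    p.add_argument("--device", default="cuda",
                   help="device used for inference")
    p.add_argument("--pixel", type=int, default=1024, choices=[512, 1024],
                   help="resolution of the edited image")
    p.add_argument("--num_experts", type=int, default=12,
                   help="number of experts in MoE-LoRA (default assumed)")
    p.add_argument("--base_path", required=True,
                   help="path to the base model checkpoint")
    p.add_argument("--qwen_path", required=True,
                   help="path to Qwen3-8B for task-type selection")
    p.add_argument("--lora_path", required=True,
                   help="path to the MoE-LoRA checkpoint")
    p.add_argument("--extra_lora_path", default=None,
                   help="optional extra LoRA to integrate (assumed optional)")
    return p
```

For example, `build_parser().parse_args(["--base_path", "b", "--qwen_path", "q", "--lora_path", "l"])` falls back to `cuda` and 1024 for the unspecified flags.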