Macro
The official repo of "MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data"
<p align="center"> <strong>Zhekai Chen<sup>1</sup>, Yuqing Wang<sup>1</sup>, Manyuan Zhang<sup>†2</sup>, Xihui Liu<sup>†1</sup></strong> <br> <sup>1</sup>HKU MMLab <sup>2</sup>Meituan <br> <sup>†</sup>Corresponding authors </p> <p align="center"> <a href="https://macro400k.github.io/"><img src="https://img.shields.io/badge/🌐%20Project%20Page-blue"></a> <a href="https://huggingface.co/datasets/Azily/Macro-Dataset"><img src="https://img.shields.io/badge/🤗%20Macro--Dataset-yellow"></a> <a href="https://arxiv.org/abs/2603.25319"><img src="https://img.shields.io/badge/arXiv-2603.25319-b31b1b.svg"></a> <a href="https://huggingface.co/papers/2603.25319"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-orange"></a> </p>

Macro is a multi-reference image generation dataset and benchmark. It covers four task categories — Customization, Illustration, Spatial, and Temporal — across four image-count brackets (1–3, 4–5, 6–7, ≥8 reference images). Alongside the dataset we provide fine-tuned checkpoints for three open-source models: Bagel, OmniGen2, and Qwen-Image-Edit.
<p align="center"> <img src="assets/teaser.jpg" width="100%"> </p>
1. Inference, Batch Evaluation & Scoring
1.1 Quick Single-Image Inference
Three ready-to-use test scripts are under scripts/. Each script falls back to a built-in sample (assets/test_example/4-5_sample1.json) when no prompt or images are given.
Download Fine-tuned Inference Checkpoints
| Checkpoint | HuggingFace |
|---|---|
| Macro-Bagel | [Azily/Macro-Bagel](https://huggingface.co/Azily/Macro-Bagel) |
| Macro-OmniGen2 | [Azily/Macro-OmniGen2](https://huggingface.co/Azily/Macro-OmniGen2) |
| Macro-Qwen-Image-Edit | [Azily/Macro-Qwen-Image-Edit](https://huggingface.co/Azily/Macro-Qwen-Image-Edit) |
huggingface-cli download Azily/Macro-Bagel --local-dir ckpts/Macro-Bagel
huggingface-cli download Azily/Macro-OmniGen2 --local-dir ckpts/Macro-OmniGen2
huggingface-cli download Azily/Macro-Qwen-Image-Edit --local-dir ckpts/Macro-Qwen-Image-Edit
Common Arguments
All three test scripts share these arguments:
| Argument | Default | Description |
|---|---|---|
| --prompt | (from sample) | Text instruction |
| --input_images | (from sample) | One or more reference image paths |
| --output | outputs/test_<model>.jpg | Output image path |
| --height / --width | 768 | Output resolution |
| --seed | 42 | Random seed |
Bagel
# Default sample
python scripts/test_bagel.py
# Custom inputs
python scripts/test_bagel.py \
--model_path ckpts/Macro-Bagel \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_path | ckpts/Macro-Bagel | Fine-tuned Bagel checkpoint (also used as base model) |
| --base_model_path | (same as --model_path) | Override base model path (or env BAGEL_BASE_MODEL_PATH) |
OmniGen2
# Default sample
python scripts/test_omnigen.py
# Custom inputs
python scripts/test_omnigen.py \
--model_path ckpts/Macro-OmniGen2 \
--transformer_path ckpts/Macro-OmniGen2/transformer \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_path | ckpts/Macro-OmniGen2 | OmniGen2 base model dir (or env OMNIGEN_MODEL_PATH) |
| --transformer_path | ckpts/Macro-OmniGen2/transformer | Fine-tuned transformer; omit to use the base model |
| --enable_model_cpu_offload | False | CPU offload to reduce GPU memory |
Qwen-Image-Edit
# Default sample
python scripts/test_qwen.py
# Custom inputs
python scripts/test_qwen.py \
--model_base_path ckpts \
--model_id Macro-Qwen-Image-Edit \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_base_path | ckpts | Root dir containing model sub-folders (or env DIFFSYNTH_MODEL_BASE_PATH) |
| --model_id | Macro-Qwen-Image-Edit | Sub-directory name under model_base_path |
1.2 Batch Inference on the Benchmark
Each model has a dedicated inference/config.yaml. Edit it to set checkpoint paths and tasks, then run:
# Run all checkpoints in the config
python bagel/inference/run.py --config bagel/inference/config.yaml
python omnigen/inference/run.py --config omnigen/inference/config.yaml
python qwen/inference/run.py --config qwen/inference/config.yaml
# Run a single checkpoint / task / category
python bagel/inference/run.py --config bagel/inference/config.yaml \
--ckpt bagel_macro --task customization --category 4-5
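The single-checkpoint invocation can also be scripted into a full sweep over every task/category pair. A minimal sketch, assuming run.py accepts exactly the flags shown above (the `sweep_commands` helper is ours, not part of the repo):

```python
# Build one run.py command per (task, category) pair for a single checkpoint.
# Task and category names mirror the benchmark's four tasks and brackets.
import itertools

TASKS = ["customization", "illustration", "spatial", "temporal"]
CATEGORIES = ["1-3", "4-5", "6-7", ">=8"]

def sweep_commands(ckpt="bagel_macro", config="bagel/inference/config.yaml"):
    """Return the 16 run.py command lines for one checkpoint."""
    return [
        ["python", "bagel/inference/run.py", "--config", config,
         "--ckpt", ckpt, "--task", t, "--category", c]
        for t, c in itertools.product(TASKS, CATEGORIES)
    ]

for cmd in sweep_commands():
    print(" ".join(cmd))  # swap print for subprocess.run(cmd, check=True)
```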
The config files share a common structure:
global_config:
output_root: ./outputs/<model> # where generated images are saved
data_root: ./data/filter # evaluation data root
# YAML anchors for reusable task/category selections
common_tasks:
all_tasks: &all_tasks
customization: ["1-3", "4-5", "6-7", ">=8"]
illustration: ["1-3", "4-5", "6-7", ">=8"]
spatial: ["1-3", "4-5", "6-7", ">=8"]
temporal: ["1-3", "4-5", "6-7", ">=8"]
checkpoints:
my_checkpoint:
path: ./ckpts/... # checkpoint path
# transformer_path: ... # optional: fine-tuned transformer (OmniGen2 / Qwen)
# base_model_path: ... # optional: override base model path (Bagel only)
tasks: *all_tasks
Results are written to outputs/<model>/<checkpoint_name>/<task>/<category>/.
Dynamic resolution: input images are automatically resized based on the reference image count, with pixel budgets of [1M, 1M, 590K, 590K, 590K, 262K, 262K, 262K, 262K, 262K] for 1, 2, 3, … 10+ images.
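The resizing rule can be sketched as follows. The per-count pixel budgets are the ones listed above; the aspect-preserving scaling and multiple-of-16 rounding are our assumptions, not necessarily what the inference scripts implement:

```python
# Illustrative sketch of the dynamic-resolution rule.
import math

# Pixel budgets for 1, 2, 3, ... 10-or-more reference images.
BUDGETS = [1_000_000, 1_000_000, 590_000, 590_000, 590_000,
           262_000, 262_000, 262_000, 262_000, 262_000]

def target_size(width: int, height: int, n_refs: int, multiple: int = 16):
    """Scale (width, height) so the output area fits the budget for n_refs."""
    budget = BUDGETS[min(n_refs, len(BUDGETS)) - 1]
    scale = math.sqrt(budget / (width * height))
    snap = lambda v: max(multiple, round(v * scale / multiple) * multiple)
    return snap(width), snap(height)

print(target_size(1920, 1080, 4))  # → (1024, 576), ~590K pixels
```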
1.3 LLM-based Scoring
Configure API credentials
Option A — environment variables:
# GPT / OpenAI-compatible endpoint
export OPENAI_URL=https://api.openai.com/v1/chat/completions
export OPENAI_KEY=your_openai_api_key
# Gemini (default model: gemini-3.0-flash-preview)
export GEMINI_API_KEY=your_gemini_api_key
export GEMINI_MODEL_NAME=gemini-3.0-flash-preview # optional
Option B — eval/config.yaml (overrides environment variables):
global_config:
api_config:
openai:
url: "https://api.openai.com/v1/chat/completions"
key: "your_openai_api_key"
gemini:
api_key: "your_gemini_api_key"
model_name: "gemini-3.0-flash-preview"
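The precedence is config-over-environment: a key present in eval/config.yaml wins, and the environment variable is only a fallback. A tiny sketch of that lookup, assuming the YAML has already been loaded into a dict (`resolve_key` is a hypothetical helper, not part of the repo; the key names are the README's):

```python
# Resolve an API credential: YAML value first, environment variable second.
import os

def resolve_key(yaml_cfg: dict, env_var: str, *path):
    """Walk path through the loaded YAML dict; fall back to the env var."""
    node = yaml_cfg
    for p in path:
        node = node.get(p, {}) if isinstance(node, dict) else {}
    return node if isinstance(node, str) and node else os.environ.get(env_var)

cfg = {"global_config": {"api_config": {"gemini": {"api_key": "from-yaml"}}}}
os.environ["GEMINI_API_KEY"] = "from-env"
print(resolve_key(cfg, "GEMINI_API_KEY",
                  "global_config", "api_config", "gemini", "api_key"))  # → from-yaml
```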
Configure evaluations
Edit eval/config.yaml:
global_config:
output_root: ./outputs # must match inference output_root
use_gpt: false # set true to enable GPT scoring
use_gemini: true # set true to enable Gemini scoring
parallel_workers: 16
evaluations:
bagel:
bagel_macro:
tasks: *all_tasks
omnigen:
omnigen2_macro:
tasks: *all_tasks
qwen:
qwen_macro:
tasks: *all_tasks
Run scoring
# Run all configured evaluations
python eval/run_eval.py
# Run a specific model / experiment
python eval/run_eval.py --baseline bagel --exp bagel_macro
python eval/run_eval.py --baseline omnigen --exp omnigen2_macro
python eval/run_eval.py --baseline qwen --exp qwen_macro
Scores are saved as JSON files alongside the generated images in outputs/<model>/<exp>/<task>/<category>/.
2. Dataset
2.1 Download
The Macro dataset is available on Hugging Face as a collection of .tar.gz archives:
Step 1 — Download the archives
# Download all archives into data_tar/
huggingface-cli download Azily/Macro-Dataset --repo-type dataset --local-dir data_tar/
Selective download: if you only need the evaluation benchmark (JSON index files, no images), download just filter.tar.gz (~510 MB):
huggingface-cli download Azily/Macro-Dataset \
  --repo-type dataset \
  --include "filter.tar.gz" \
  --local-dir data_tar/
To download a specific task/split/category (e.g., customization train, 1–3 images):
huggingface-cli download Azily/Macro-Dataset \
  --repo-type dataset \
  --include "final_customization_train_1-3.tar.gz" \
  --local-dir data_tar/
Step 2 — Extract
An extraction script extract_data.sh is included in the downloaded data_tar/ folder. Run it from the project root to restore the original data/ tree:
bash data_tar/extract_data.sh ./data_tar .
# Restores: ./data/filter/, ./data/final/, ./data/raw/
Or extract all archives manually:
for f in data_tar/*.tar.gz; do tar -xzf "$f" -C .; done
