Macro
The official repo of "MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data"
<p align="center"> <strong>Zhekai Chen<sup>1</sup>, Yuqing Wang<sup>1</sup>, Manyuan Zhang<sup>†2</sup>, Xihui Liu<sup>†1</sup></strong> <br> <sup>1</sup>HKU MMLab <sup>2</sup>Meituan <br> <sup>†</sup>Corresponding authors </p> <p align="center"> <a href="https://macro400k.github.io/"><img src="https://img.shields.io/badge/🌐%20Project%20Page-blue"></a> <a href="https://huggingface.co/datasets/Azily/Macro-Dataset"><img src="https://img.shields.io/badge/🤗%20Macro--Dataset-yellow"></a> <a href="https://arxiv.org/abs/2603.25319"><img src="https://img.shields.io/badge/arXiv-2603.25319-b31b1b.svg"></a> <a href="https://huggingface.co/papers/2603.25319"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-orange"></a> </p>

Macro is a multi-reference image generation dataset and benchmark. It covers four task categories — Customization, Illustration, Spatial, and Temporal — across four image-count brackets (1–3, 4–5, 6–7, ≥8 reference images). Alongside the dataset we provide fine-tuned checkpoints for three open-source models: Bagel, OmniGen2, and Qwen-Image-Edit.
<p align="center"> <img src="assets/teaser.jpg" width="100%"> </p>
1. Inference, Batch Evaluation & Scoring
1.1 Quick Single-Image Inference
Three ready-to-use test scripts are under scripts/. Each script falls back to a built-in sample (assets/test_example/4-5_sample1.json) when no prompt or images are given.
Download Fine-tuned Inference Checkpoints
| Checkpoint | HuggingFace |
|---|---|
| Macro-Bagel | [Azily/Macro-Bagel](https://huggingface.co/Azily/Macro-Bagel) |
| Macro-OmniGen2 | [Azily/Macro-OmniGen2](https://huggingface.co/Azily/Macro-OmniGen2) |
| Macro-Qwen-Image-Edit | [Azily/Macro-Qwen-Image-Edit](https://huggingface.co/Azily/Macro-Qwen-Image-Edit) |
huggingface-cli download Azily/Macro-Bagel --local-dir ckpts/Macro-Bagel
huggingface-cli download Azily/Macro-OmniGen2 --local-dir ckpts/Macro-OmniGen2
huggingface-cli download Azily/Macro-Qwen-Image-Edit --local-dir ckpts/Macro-Qwen-Image-Edit
Common Arguments
All three test scripts share these arguments:
| Argument | Default | Description |
|---|---|---|
| --prompt | (from sample) | Text instruction |
| --input_images | (from sample) | One or more reference image paths |
| --output | outputs/test_<model>.jpg | Output image path |
| --height / --width | 768 | Output resolution |
| --seed | 42 | Random seed |
Bagel
# Default sample
python scripts/test_bagel.py
# Custom inputs
python scripts/test_bagel.py \
--model_path ckpts/Macro-Bagel \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_path | ckpts/Macro-Bagel | Fine-tuned Bagel checkpoint (also used as base model) |
| --base_model_path | (same as --model_path) | Override base model path (or env BAGEL_BASE_MODEL_PATH) |
OmniGen2
# Default sample
python scripts/test_omnigen.py
# Custom inputs
python scripts/test_omnigen.py \
--model_path ckpts/Macro-OmniGen2 \
--transformer_path ckpts/Macro-OmniGen2/transformer \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_path | ckpts/Macro-OmniGen2 | OmniGen2 base model dir (or env OMNIGEN_MODEL_PATH) |
| --transformer_path | ckpts/Macro-OmniGen2/transformer | Fine-tuned transformer; omit to use the base model |
| --enable_model_cpu_offload | False | CPU offload to reduce GPU memory |
Qwen-Image-Edit
# Default sample
python scripts/test_qwen.py
# Custom inputs
python scripts/test_qwen.py \
--model_base_path ckpts \
--model_id Macro-Qwen-Image-Edit \
--prompt "Generate an image of …" \
--input_images img1.jpg img2.jpg
| Model Argument | Default | Description |
|---|---|---|
| --model_base_path | ckpts | Root dir containing model sub-folders (or env DIFFSYNTH_MODEL_BASE_PATH) |
| --model_id | Macro-Qwen-Image-Edit | Sub-directory name under model_base_path |
1.2 Batch Inference on the Benchmark
Each model has a dedicated inference/config.yaml. Edit it to set checkpoint paths and tasks, then run:
# Run all checkpoints in the config
python bagel/inference/run.py --config bagel/inference/config.yaml
python omnigen/inference/run.py --config omnigen/inference/config.yaml
python qwen/inference/run.py --config qwen/inference/config.yaml
# Run a single checkpoint / task / category
python bagel/inference/run.py --config bagel/inference/config.yaml \
--ckpt bagel_macro --task customization --category 4-5
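The single-checkpoint invocation can also be scripted into a full sweep over every task/category pair. A minimal sketch, assuming run.py accepts exactly the flags shown above (the `sweep_commands` helper is ours, not part of the repo):

```python
# Build one run.py command per (task, category) pair for a single checkpoint.
# Task and category names mirror the benchmark's four tasks and brackets.
import itertools

TASKS = ["customization", "illustration", "spatial", "temporal"]
CATEGORIES = ["1-3", "4-5", "6-7", ">=8"]

def sweep_commands(ckpt="bagel_macro", config="bagel/inference/config.yaml"):
    """Return the 16 run.py command lines for one checkpoint."""
    return [
        ["python", "bagel/inference/run.py", "--config", config,
         "--ckpt", ckpt, "--task", t, "--category", c]
        for t, c in itertools.product(TASKS, CATEGORIES)
    ]

for cmd in sweep_commands():
    print(" ".join(cmd))  # swap print for subprocess.run(cmd, check=True)
```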
The config files share a common structure:
global_config:
output_root: ./outputs/<model> # where generated images are saved
data_root: ./data/filter # evaluation data root
# YAML anchors for reusable task/category selections
common_tasks:
all_tasks: &all_tasks
customization: ["1-3", "4-5", "6-7", ">=8"]
illustration: ["1-3", "4-5", "6-7", ">=8"]
spatial: ["1-3", "4-5", "6-7", ">=8"]
temporal: ["1-3", "4-5", "6-7", ">=8"]
checkpoints:
my_checkpoint:
path: ./ckpts/... # checkpoint path
# transformer_path: ... # optional: fine-tuned transformer (OmniGen2 / Qwen)
# base_model_path: ... # optional: override base model path (Bagel only)
tasks: *all_tasks
Results are written to outputs/<model>/<checkpoint_name>/<task>/<category>/.
Dynamic resolution: input images are automatically resized based on the reference image count, with pixel budgets of [1M, 1M, 590K, 590K, 590K, 262K, 262K, 262K, 262K, 262K] for 1, 2, 3, … 10+ images.
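The resizing rule can be sketched as follows. The per-count pixel budgets are the ones listed above; the aspect-preserving scaling and multiple-of-16 rounding are our assumptions, not necessarily what the inference scripts implement:

```python
# Illustrative sketch of the dynamic-resolution rule.
import math

# Pixel budgets for 1, 2, 3, ... 10-or-more reference images.
BUDGETS = [1_000_000, 1_000_000, 590_000, 590_000, 590_000,
           262_000, 262_000, 262_000, 262_000, 262_000]

def target_size(width: int, height: int, n_refs: int, multiple: int = 16):
    """Scale (width, height) so the output area fits the budget for n_refs."""
    budget = BUDGETS[min(n_refs, len(BUDGETS)) - 1]
    scale = math.sqrt(budget / (width * height))
    snap = lambda v: max(multiple, round(v * scale / multiple) * multiple)
    return snap(width), snap(height)

print(target_size(1920, 1080, 4))  # → (1024, 576), ~590K pixels
```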
1.3 LLM-based Scoring
Configure API credentials
Option A — environment variables:
# GPT / OpenAI-compatible endpoint
export OPENAI_URL=https://api.openai.com/v1/chat/completions
export OPENAI_KEY=your_openai_api_key
# Gemini (default model: gemini-3.0-flash-preview)
export GEMINI_API_KEY=your_gemini_api_key
export GEMINI_MODEL_NAME=gemini-3.0-flash-preview # optional
Option B — eval/config.yaml (overrides environment variables):
global_config:
api_config:
openai:
url: "https://api.openai.com/v1/chat/completions"
key: "your_openai_api_key"
gemini:
api_key: "your_gemini_api_key"
model_name: "gemini-3.0-flash-preview"
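The precedence is config-over-environment: a key present in eval/config.yaml wins, and the environment variable is only a fallback. A tiny sketch of that lookup, assuming the YAML has already been loaded into a dict (`resolve_key` is a hypothetical helper, not part of the repo; the key names are the README's):

```python
# Resolve an API credential: YAML value first, environment variable second.
import os

def resolve_key(yaml_cfg: dict, env_var: str, *path):
    """Walk path through the loaded YAML dict; fall back to the env var."""
    node = yaml_cfg
    for p in path:
        node = node.get(p, {}) if isinstance(node, dict) else {}
    return node if isinstance(node, str) and node else os.environ.get(env_var)

cfg = {"global_config": {"api_config": {"gemini": {"api_key": "from-yaml"}}}}
os.environ["GEMINI_API_KEY"] = "from-env"
print(resolve_key(cfg, "GEMINI_API_KEY",
                  "global_config", "api_config", "gemini", "api_key"))  # → from-yaml
```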
Configure evaluations
Edit eval/config.yaml:
global_config:
output_root: ./outputs # must match inference output_root
use_gpt: false # set true to enable GPT scoring
use_gemini: true # set true to enable Gemini scoring
parallel_workers: 16
evaluations:
bagel:
bagel_macro:
tasks: *all_tasks
omnigen:
omnigen2_macro:
tasks: *all_tasks
qwen:
qwen_macro:
tasks: *all_tasks
Run scoring
# Run all configured evaluations
python eval/run_eval.py
# Run a specific model / experiment
python eval/run_eval.py --baseline bagel --exp bagel_macro
python eval/run_eval.py --baseline omnigen --exp omnigen2_macro
python eval/run_eval.py --baseline qwen --exp qwen_macro
Scores are saved as JSON files alongside the generated images in outputs/<model>/<exp>/<task>/<category>/.
2. Dataset
2.1 Download
The Macro dataset is available on Hugging Face as a collection of .tar.gz archives:
Step 1 — Download the archives
# Download all archives into data_tar/
huggingface-cli download Azily/Macro-Dataset --repo-type dataset --local-dir data_tar/
Selective download: if you only need the evaluation benchmark (JSON index files, no images), download just filter.tar.gz (~510 MB):
huggingface-cli download Azily/Macro-Dataset \
  --repo-type dataset \
  --include "filter.tar.gz" \
  --local-dir data_tar/
To download a specific task/split/category (e.g., customization train, 1–3 images):
huggingface-cli download Azily/Macro-Dataset \
  --repo-type dataset \
  --include "final_customization_train_1-3.tar.gz" \
  --local-dir data_tar/
Step 2 — Extract
An extraction script extract_data.sh is included in the downloaded data_tar/ folder. Run it from the project root to restore the original data/ tree:
bash data_tar/extract_data.sh ./data_tar .
# Restores: ./data/filter/, ./data/final/, ./data/raw/
Or extract all archives manually:
for f in data_tar/*.tar.gz; do tar -xzf "$f" -C .; done
