EditWorld

[ACM Multimedia 2025 Datasets Track] EditWorld: Simulating World Dynamics for Instruction-Following Image Editing

News

August 1, 2025

EditWorld has been accepted to the ACM Multimedia 2025 Datasets Track.

June 23, 2024

After consulting with the sponsors, we have released a training dataset. Note that it has not been manually rechecked. The dataset is available at EditWorld_data. Best of luck with your research!

Overview

This repository contains the official implementation of EditWorld. In this work, we introduce a new task, world-instructed image editing, which defines and categorizes instructions grounded in various world scenarios. We curate a new image editing dataset with world instructions using a set of large pretrained models (e.g., GPT-3.5, Video-LLaVA, and SDXL). We also propose a new post-edit method for world-instructed image editing.

World Instruction vs. Traditional Instruction

[Figure: world instructions compared with traditional instructions]

Results Generated by EditWorld:

[Figure: sample editing results generated by EditWorld]

Planning

[√] Providing the full text-to-image generation pipeline for the EditWorld dataset.
[√] Releasing the evaluation dataset.
[√] Releasing the basic training dataset.
[ ] Releasing checkpoints.
[ ] Releasing the training and post-edit code.

Codebase

Text-to-image generation branch

First, we use GPT-3.5 to generate textual quadruples:

python gpt_script/text_img_gen_aigcbest_full.py --define_json gpt_script/define_sample_history/define_sample.json --output_path gpt_script/gen_sample_history/ --output_json text_gen.json

Then, we convert the text returned by GPT into a dict:

python tools/deal_text2json.py --input_json gpt_script/gen_sample_history/text_gen.json --output_json text_gen_full.json

Finally, we obtain the input-instruction-output triples from the generated textual quadruples:

python t2i_branch_base.py --text_json text_gen_full.json --save_path datasets/editworld/generated_img/

Note that t2i_branch_base.py is a fast, basic version of the text-to-image generation branch; we will improve this part in the future.
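The three steps above can be chained from a single script. The following is a minimal sketch, not part of the repository: the helper names `build_t2i_commands` and `run_t2i_pipeline` are hypothetical, and the sketch only assumes the script paths and flags exactly as they appear in the commands above. It defaults to a dry run that prints the commands without executing them.

```python
import posixpath
import shlex
import subprocess


def build_t2i_commands(
    define_json="gpt_script/define_sample_history/define_sample.json",
    gen_dir="gpt_script/gen_sample_history/",
    raw_json="text_gen.json",
    full_json="text_gen_full.json",
    save_path="datasets/editworld/generated_img/",
):
    """Return the three README commands, in order, as argument lists.

    Hypothetical helper: the defaults mirror the values shown in the README.
    """
    return [
        # Step 1: GPT-3.5 writes the textual quadruples.
        ["python", "gpt_script/text_img_gen_aigcbest_full.py",
         "--define_json", define_json,
         "--output_path", gen_dir,
         "--output_json", raw_json],
        # Step 2: convert the raw GPT text into a dict stored as JSON.
        ["python", "tools/deal_text2json.py",
         "--input_json", posixpath.join(gen_dir, raw_json),
         "--output_json", full_json],
        # Step 3: generate the images from the quadruples.
        ["python", "t2i_branch_base.py",
         "--text_json", full_json,
         "--save_path", save_path],
    ]


def run_t2i_pipeline(dry_run=True, **kwargs):
    """Print each command; also execute it when dry_run is False."""
    for cmd in build_t2i_commands(**kwargs):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_t2i_pipeline(dry_run=True)
```

Calling `run_t2i_pipeline(dry_run=False)` from the repository root would execute the steps in sequence. Building the commands as argument lists rather than shell strings avoids quoting problems if a path contains spaces.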

Video branch

The video_script directory contains the code for downloading videos from InternVid.

Dataset

Dataset structure

To obtain the training dataset file train.json, run the script at tools/obtain_datasetjson.py. The dataset is organized as follows:

datasets/
├── editworld/
│   ├── generated_img/
│   │   ├── group_0/
│   │   │   ├── sample0_ori.png
│   │   │   ├── sample0_tar.png
│   │   │   ...
│   │   │   └── img_txt.json
│   │   └── group_1/
│   │   ...
│   ├── video_img/
│   │   ├── group_0/
│   │   │   ├── sample0_ori.png
│   │   │   ├── sample0_tar.png
│   │   │   ...
│   │   │   └── img_txt.json
│   │   └── group_1/
│   │   ...
│   └── human_select_img/
│       ├── group_0/
│       │   ├── sample0_ori.png
│       │   ├── sample0_tar.png
│       │   ...
│       │   └── img_txt.json
│       └── group_1/
│       ...
└── train.json
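As a quick sanity check on a downloaded copy, the layout above can be walked to list every complete `_ori.png` / `_tar.png` pair. This is a minimal sketch and does not reproduce tools/obtain_datasetjson.py: the function `find_pairs` and the returned field names are hypothetical, and it deliberately does not read img_txt.json because its schema is not described here.

```python
from pathlib import Path

ORI_SUFFIX = "_ori.png"
TAR_SUFFIX = "_tar.png"


def find_pairs(root):
    """List complete (original, target) image pairs under root.

    root is expected to be the datasets/editworld directory, whose children
    are branch folders (generated_img, video_img, human_select_img), each
    holding group_* subfolders.
    """
    root = Path(root)
    pairs = []
    for ori in sorted(root.glob("*/group_*/*" + ORI_SUFFIX)):
        # Derive the matching target name: sample0_ori.png -> sample0_tar.png
        tar = ori.with_name(ori.name[: -len(ORI_SUFFIX)] + TAR_SUFFIX)
        if not tar.is_file():
            continue  # skip samples whose target image is missing
        pairs.append({
            "branch": ori.parent.parent.name,
            "group": ori.parent.name,
            "ori": ori.relative_to(root).as_posix(),
            "tar": tar.relative_to(root).as_posix(),
        })
    return pairs
```

For example, `len(find_pairs("datasets/editworld"))` gives the number of usable pairs, which can be compared against the number of entries in train.json.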

Evaluation dataset link

Our evaluation dataset is available at editworld_test.

Quantitative Comparison of CLIP Score and MLLM Score

IP2P: InstructPix2Pix; MB: MagicBrush. Bold results are the best.

CLIP Score of Text-to-image Branch

| Category        | IP2P   | MB     | EditWorld  | w/o post-edit |
|-----------------|--------|--------|------------|---------------|
| Long-Term       | 0.2140 | 0.1870 | 0.2244     | **0.2294**    |
| Physical-Trans  | 0.2186 | 0.2101 | 0.2385     | **0.2467**    |
| Implicit-Logic  | 0.2390 | 0.2432 | **0.2542** | 0.2440        |
| Story-Type      | 0.2063 | 0.2070 | **0.2534** | 0.2354        |
| Real-to-Virtual | 0.2285 | 0.2344 | **0.2524** | 0.2435        |

CLIP Score of Video Branch

| Category       | IP2P   | MB     | EditWorld  | w/o post-edit |
|----------------|--------|--------|------------|---------------|
| Spatial-Trans  | 0.2175 | 0.1997 | **0.2420** | 0.2286        |
| Physical-Trans | 0.2315 | 0.2278 | 0.2467     | **0.2483**    |
| Story-Type     | 0.2318 | 0.2262 | 0.2365     | **0.2399**    |
| Exaggeration   | 0.2416 | 0.2328 | **0.2443** | 0.2433        |

MLLM Score of Both Branches

| Category      | IP2P   | MB     | EditWorld  | w/o post-edit |
|---------------|--------|--------|------------|---------------|
| Text-to-image | 0.8763 | 0.8455 | 0.8958     | **0.9060**    |
| Video         | 0.9493 | 0.9715 | **0.9920** | 0.9891        |
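This README does not spell out the scoring protocol behind the CLIP Score tables. A common definition of a CLIP score is the cosine similarity between the CLIP embedding of an edited image and the CLIP embedding of a text description of the expected result, averaged over the test samples. The sketch below assumes that definition and assumes the embeddings have already been computed elsewhere; the function names are hypothetical, and the actual evaluation may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_clip_score(image_embeddings, text_embeddings):
    """Average per-sample similarity over paired image and text embeddings.

    Assumes image_embeddings[i] and text_embeddings[i] belong to the same
    test sample and were produced by the same CLIP model.
    """
    if len(image_embeddings) != len(text_embeddings):
        raise ValueError("need one text embedding per image embedding")
    if not image_embeddings:
        raise ValueError("need at least one sample")
    scores = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(scores) / len(scores)
```

Under this assumption, each table cell would correspond to `mean_clip_score` evaluated over the samples of one category for one method.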

Citation

@article{yang2024editworld,
  title={EditWorld: Simulating World Dynamics for Instruction-Following Image Editing},
  author={Yang, Ling and Zeng, Bohan and Liu, Jiaming and Li, Hong and Xu, Minghao and Zhang, Wentao and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2405.14785},
  year={2024}
}
