WildWorld
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
This repo contains the dataset and benchmark code used in the WildWorld paper (arXiv:2603.23497).
Zhen Li, Zian Meng, Shuwei Shi, Wenshuo Peng, Yuwei Wu, Bo Zheng, Chuanhao Li, Kaipeng Zhang
Alaya Studio, Shanda AI Research Tokyo; Beijing Institute of Technology; Shanghai Innovation Institute
🔥Update
- [2026.03.25] We have released our paper — discussions and feedback are warmly welcome!
🧠Introduction

TL;DR: We present WildWorld, a large-scale action-conditioned world modeling dataset with explicit state annotations, automatically collected from a photorealistic AAA action role-playing game. It features:
- 🎬 108M+ frames with per-frame annotations: character skeletons, actions & states (HP, animation, etc.), camera poses, and depth maps
- ⚔️ 450+ semantically meaningful actions including movement, attacks, and skill casting
- 🐉 Diverse content: 29 monster species, 4 player characters, 4 weapon types, 5 distinct stages
- 🕒 Long-horizon sequences: clips spanning up to 30+ minutes of continuous gameplay
- 📝 Hierarchical captions: both action-level and sample-level natural language descriptions
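
Since the dataset has not been released yet, its on-disk format is unknown. The following is only a minimal sketch of how one per-frame annotation record could be represented in Python, and how consecutive frames might be grouped into action-level segments (the unit that action-level captions would plausibly describe). Every field name, type, and the `action_segments` helper are assumptions for illustration, not the actual WildWorld schema or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical schema: the released WildWorld format may differ entirely.
@dataclass
class FrameAnnotation:
    frame_index: int
    action: str                                 # assumed: one action label per frame
    hp: float                                   # example state variable (HP)
    animation: str                              # example state variable (animation name)
    skeleton: List[Tuple[float, float, float]]  # assumed: 3D joint positions
    camera_pose: List[List[float]]              # assumed: 4x4 camera matrix
    depth_path: str                             # assumed: path to a depth map file


def action_segments(frames: List[FrameAnnotation]) -> List[Tuple[str, int, int]]:
    """Group consecutive frames that share an action into (action, start, end) runs."""
    segments: List[Tuple[str, int, int]] = []
    for f in frames:
        last = segments[-1] if segments else None
        # Extend the current run only if the action matches and the frame is contiguous.
        if last and last[0] == f.action and last[2] + 1 == f.frame_index:
            segments[-1] = (last[0], last[1], f.frame_index)
        else:
            segments.append((f.action, f.frame_index, f.frame_index))
    return segments
```

Under these assumptions, a clip whose frames are labeled `run, run, attack, run` would yield three segments, each of which could be paired with its own action-level caption, while the whole clip would carry the sample-level caption.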
📦TODO
- [ ] Release WildWorld dataset and WildBench benchmark.
📄License
See LICENSE.
📖Citation
If you find this project helpful, please consider citing:
@misc{li2026wildworldlargescaledatasetdynamic,
  title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG},
  author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
  year={2026},
  eprint={2603.23497},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.23497},
}
