Knowledge2Data
[TASLP 2025] Spatial Knowledge Graph-Guided Synthesis for Multimodal LLMs
Table of Contents
- <a href="#news">What's New</a> •
- <a href="#overview">Overview</a> •
- <a href="#quickstart">Quickstart</a> •
- <a href="#citation">Citation</a>
🔔News
- 2025-11-01, our paper was accepted for publication as a regular paper in IEEE TASLP (Transactions on Audio, Speech and Language Processing).
- 2025-02-28, we released the paper.
🌟Overview
<div align="center"> <img src="figs/figure2.png" width="90%"> </div>

⏩Quickstart
Data
Get training data and test data from HuggingFace: https://huggingface.co/datasets/zjunlp/Knowledge2Data
Installation
git clone https://github.com/zjunlp/Knowledge2Data
cd Knowledge2Data
conda create -n skg python=3.9
conda activate skg
pip install -r requirements.txt
Download the models
Download the following models from HuggingFace:

| 🎯 Model Name | 🤗 HuggingFace |
|-------------------------------|----------------------------------------------|
| Diffusers-generation-text-box | gligen/diffusers-generation-text-box |
| Sam-vit-base | facebook/sam-vit-base |
| Stable-diffusion-xl-refiner | stabilityai/stable-diffusion-xl-refiner-1.0 |
Export the environment variables.
cd src
export OPENAI_API_KEY="YOUR_API_KEY"
export SKG_HF_MODELS="LOCAL_HUGGINGFACE_MODELS_DIR"
Generate Spatial KG and multimodal synthetic data.
Execute the script to generate the Spatial KG.
sh run_skg.sh
You can also define your own objects and their spatial relationships to form a Spatial KG. Save them as a JSON file in the same format as "src/data/skg_demo.json".
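As a rough illustration of building such a file programmatically, the sketch below writes a small spatial KG to JSON. The field names (`objects`, `relations`, `subject`, `relation`, `object`) are assumptions for illustration only; check "src/data/skg_demo.json" for the actual schema before feeding a file to the pipeline.

```python
import json

# Hypothetical spatial KG: a few objects plus pairwise spatial relations.
# NOTE: this schema is assumed for illustration -- mirror the structure of
# src/data/skg_demo.json in the repository for real runs.
skg = {
    "objects": ["cat", "table", "lamp"],
    "relations": [
        {"subject": "cat", "relation": "on top of", "object": "table"},
        {"subject": "lamp", "relation": "next to", "object": "table"},
    ],
}

# Save in the same JSON layout, ready to pass as a custom input file.
with open("my_skg.json", "w") as f:
    json.dump(skg, f, indent=2)
```

The resulting "my_skg.json" would then stand in for the demo file once its keys are aligned with the repository's expected format.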
Execute the script to generate multimodal synthetic data.
sh run_data.sh
For custom data, only the input-file parameter "--input_file" needs to be modified.
By default, generated data is saved in "src/data" and images in "src/img_generations". To generate more data, adjust the parameters "--num_scenes" (generate_scenes.py) and "--repeats" (generate_images.py).
🌻Acknowledgement
This project builds on open-source projects including LLM-groundedDiffusion. Thanks for their great contributions!
🚩Citation
Please cite the following paper if you use this project in your work.
@misc{xue2025spatialknowledgegraphguidedmultimodal,
title={Spatial Knowledge Graph-Guided Multimodal Synthesis},
author={Yida Xue and Zhen Bi and Jinnan Yang and Jungang Lou and Huajun Chen and Ningyu Zhang},
year={2025},
eprint={2505.22633},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.22633},
}