SpatialGen: Layout-guided 3D Indoor Scene Generation

<!-- markdownlint-disable first-line-h1 --> <!-- markdownlint-disable html --> <!-- markdownlint-disable no-duplicate-header --> <div align="center"> <img src="assets/logo_light.png#gh-light-mode-only" width="60%" alt="SpatialLM" /> <img src="assets/logo_dark.png#gh-dark-mode-only" width="60%" alt="SpatialLM" /> </div> <hr style="margin-top: 0; margin-bottom: 8px;"> <div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;"> <a href="https://manycore-research.github.io/SpatialGen" target="_blank" style="margin: 2px;"><img alt="Project" src="https://img.shields.io/badge/🌐%20Project-SpatialGen-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a> <a href="https://arxiv.org/abs/2509.14981" target="_blank" style="margin: 2px;"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-SpatialGen-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a> <a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a> <a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a> </div> <div align="center">

Chuan Fang, Heng Li, Yixun Liang, Jia Zheng, Yongsen Mao, Yuan Liu, Rui Tang, Zihan Zhou, Ping Tan

HKUST Spatial Artificial Intelligence Lab; Manycore Tech Inc

</div> <div align="center">

| Image-to-Scene Results | Text-to-Scene Results |
| :--------------------: | :-------------------: |
|       Img2Scene        |      Text2Scene       |

<p>TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.</p> </div>

✨ News

  • [Jan, 2026] The official SpatialGen dataset will be released soon; it is undergoing Manycore Tech Inc's data-release approval process.
  • [Jan, 2026] We release the training code of SpatialGen, including multi-view diffusion training and Gaussian optimization.
  • [Nov, 2025] SpatialGen is accepted to 3DV 2026!
  • [Sep, 2025] We release the paper of SpatialGen!
  • [Aug, 2025] Initial release of SpatialGen-1.0!

📋 Release Plan

  • [x] Provide inference code of SpatialGen.
  • [x] Provide training instruction for SpatialGen.
  • [ ] Release SpatialGen dataset.

SpatialGen Models

<div align="center">

| Model                     | Download       |
| :-----------------------: | :------------: |
| SpatialGen-1.0            | 🤗 HuggingFace |
| FLUX.1-Wireframe-dev-lora | 🤗 HuggingFace |

</div>

Usage

🔧 Installation

Tested with the following environment:

  • Python 3.10
  • PyTorch 2.3.1
  • CUDA Version 12.1
```bash
# clone the repository
git clone --recursive https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
pip install src/recons/Sparse-RaDeGS/submodules/diff-gaussian-rasterization

# Optional: fix the flux inference bug (https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```

📊 Dataset

We provide the SpatialGen-Testset: 48 rooms, each labeled with a 3D layout, plus 4.8K rendered images (48 × 100 views, each view including RGB, normal, depth, and semantic maps) for MVD inference.
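For orientation, the sketch below indexes one rendered view of the testset by modality. The directory and file-naming scheme here is an assumption for illustration only, not the dataset's documented layout:

```python
# Hypothetical index over the SpatialGen-Testset: 48 rooms x 100 views,
# each view rendered as RGB, normal, depth, and semantic maps.
# NOTE: the folder/file naming below is an assumed layout, not the
# dataset's documented structure.
from pathlib import Path

MODALITIES = ("rgb", "normal", "depth", "semantic")

def view_paths(root: str, scene_id: str, view: int) -> dict:
    """Build per-modality file paths for one rendered view."""
    scene = Path(root) / scene_id
    return {m: scene / m / f"{view:04d}.png" for m in MODALITIES}

# 48 rooms x 100 views = 4,800 rendered views per modality
print(48 * 100)
```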

Inference

```bash
# download the pretrained weights
huggingface-cli download --resume-download manycore-research/SpatialGen-1.0 --local-dir spatialgen_ckpts

# Single image-to-3D scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D scene
# Step 1: prepare ControlNet conditional images, saved to
# /path/to/your/spatialgen-testset/scene_xxxx/condition
python3 preprocess/prepare_flux_ctrlnet_conditions.py --dataset_dir /path/to/your/spatialgen-testset

# Step 2: run SpatialGen text2scene.
# captions/spatialgen_testset_captions.jsonl provides text prompts of
# different styles for each room; choose a scene_id/prompt pair to run
# the text2scene experiment.
bash scripts/infer_spatialgen_t2s.sh
```
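To pick a scene_id/prompt pair for step 2, the captions file can be read line by line as JSONL. A minimal sketch, assuming each line is a JSON object with `scene_id` and `prompt` fields (the exact schema may differ; inspect one line to confirm):

```python
# List the available (scene_id, prompt) pairs from the captions file.
# Field names "scene_id" and "prompt" are assumptions about the schema.
import json

def load_captions(path: str) -> list[dict]:
    """Parse a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

try:
    rows = load_captions("captions/spatialgen_testset_captions.jsonl")
    print(f"{len(rows)} prompts; first entry: {rows[0]}")
except FileNotFoundError:
    print("run this from the SpatialGen repository root")
```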

Training

```bash
# TODO: preprocess the dataset

# run SCM_VAE training
bash scripts/train_scm_vae.sh

# run multi-view diffusion training
bash scripts/train_spatialgen_mvd.sh
```

License

SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License. FLUX.1-Wireframe-dev-lora is licensed under the FLUX.1-dev Non-Commercial License.

Citation

```bibtex
@inproceedings{SpatialGen,
  title     = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
  author    = {Fang, Chuan and Li, Heng and Liang, Yixun and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
  booktitle = {International Conference on 3D Vision},
  year      = {2026}
}
```

Acknowledgements

We would like to thank the following projects that made this work possible:

DiffSplat | SD 2.1 | TAESD | FLUX | SpatialLM
