LayoutVLM

Official code for "LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models" (CVPR 2025)

Website: https://ai.stanford.edu/~sunfanyun/layoutvlm

Installation

  1. Clone this repository
  2. Install dependencies (Python 3.10):
     pip install -r requirements.txt
  3. Install Rotated IoU Loss (https://github.com/lilanxiao/Rotated_IoU):
     cd third_party/Rotated_IoU/cuda_op
     python setup.py install
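
After installing, a quick sanity check can catch a wrong interpreter or a missing package before a long run. The sketch below is not part of the repository: `check_environment` is a hypothetical helper, and the module names passed to it are for you to choose (for example, packages listed in requirements.txt). It only tests whether each top-level module can be found, not whether it works.

```python
import importlib.util
import sys


def check_environment(required_modules, expected_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    required_modules: top-level module names to look for (caller's choice).
    expected_python: (major, minor) tuple; 3.10 is the version named in the install steps.
    """
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(expected_python):
        problems.append(
            f"Python {expected_python[0]}.{expected_python[1]} expected, "
            f"found {found[0]}.{found[1]}"
        )
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    # "json" is a stdlib placeholder; replace it with the packages you installed.
    for problem in check_environment(["json"]):
        print(problem)
```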

Data preprocessing

  1. Download the dataset from https://drive.google.com/file/d/1WGbj8gWn-f-BRwqPKfoY06budBzgM0pu/view?usp=sharing
  2. Unzip it.

Refer to https://github.com/allenai/Holodeck and https://github.com/allenai/objathor for how we preprocess Objaverse assets.

Usage

  1. Prepare a scene configuration JSON file of Objaverse assets with the following structure:
{
    "task_description": ...,
    "layout_criteria": ...,
    "boundary": {
        "floor_vertices": [[x1, y1, z1], [x2, y2, z2], ...],
        "wall_height": height
    },
    "assets": {
        "asset_id": {
            "path": "path/to/asset.glb",
            "assetMetadata": {
                "boundingBox": {
                    "x": width,
                    "y": depth,
                    "z": height
                }
            }
        }
    }
}
  2. Run LayoutVLM:
     python main.py --scene_json_file path/to/scene.json --openai_api_key your_api_key
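
The scene configuration can also be generated programmatically. The following is a minimal sketch that assembles a dict with the fields listed in step 1. The helper names `build_scene_config` and `write_scene_config`, the example room and asset values, and the validation rules (at least three floor vertices, three coordinates each) are assumptions for illustration, not part of the LayoutVLM code.

```python
import json
from pathlib import Path


def build_scene_config(task_description, layout_criteria, floor_vertices, wall_height, assets):
    """Assemble a scene dict with the fields from the structure in step 1.

    assets maps asset_id -> (path_to_glb, (x, y, z)), where (x, y, z) are the
    bounding-box extents stored under assetMetadata.boundingBox.
    """
    # Assumed sanity checks; the repository may validate differently.
    if len(floor_vertices) < 3:
        raise ValueError("floor_vertices needs at least three [x, y, z] points")
    if any(len(vertex) != 3 for vertex in floor_vertices):
        raise ValueError("each floor vertex must have three coordinates")

    return {
        "task_description": task_description,
        "layout_criteria": layout_criteria,
        "boundary": {
            "floor_vertices": [list(vertex) for vertex in floor_vertices],
            "wall_height": wall_height,
        },
        "assets": {
            asset_id: {
                "path": str(path),
                "assetMetadata": {"boundingBox": {"x": x, "y": y, "z": z}},
            }
            for asset_id, (path, (x, y, z)) in assets.items()
        },
    }


def write_scene_config(config, out_path):
    """Write the config as JSON so it can be passed to --scene_json_file."""
    Path(out_path).write_text(json.dumps(config, indent=4), encoding="utf-8")


if __name__ == "__main__":
    # All values below are made-up placeholders.
    config = build_scene_config(
        task_description="a small reading room",
        layout_criteria="the chair should face the table",
        floor_vertices=[[0, 0, 0], [4, 0, 0], [4, 3, 0], [0, 3, 0]],
        wall_height=2.5,
        assets={"chair_0": ("path/to/asset.glb", (0.5, 0.5, 1.0))},
    )
    print(json.dumps(config, indent=4))
```

Calling `write_scene_config(config, "scene.json")` then produces a file whose path can be given to `--scene_json_file`.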

Output

The script will generate a layout.json file in the specified save directory containing the optimized positions and orientations of all assets in the scene.
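
To inspect the result, the file can be loaded with the standard `json` module. The exact schema of layout.json is not documented here, so this sketch deliberately assumes nothing beyond "valid JSON with a dict or list at the top level"; `load_layout` and `describe_layout` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_layout(save_dir):
    """Read layout.json from the save directory and return the parsed JSON."""
    layout_path = Path(save_dir) / "layout.json"
    with layout_path.open(encoding="utf-8") as f:
        return json.load(f)


def describe_layout(layout):
    """Return one short line per top-level entry, without assuming a schema."""
    # Assumption: the top level is either a dict or a list.
    items = layout.items() if isinstance(layout, dict) else enumerate(layout)
    # Truncate each value to 80 characters so large entries stay readable.
    return [f"{key}: {json.dumps(value)[:80]}" for key, value in items]


if __name__ == "__main__":
    # "path/to/save_dir" is a placeholder for the save directory used in the run.
    for line in describe_layout(load_layout("path/to/save_dir")):
        print(line)
```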

BibTeX

@inproceedings{sun2025layoutvlm,
  title={LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models},
  author={Sun, Fan-Yun and Liu, Weiyu and Gu, Siyi and Lim, Dylan and Bhat, Goutam and Tombari, Federico and Li, Manling and Haber, Nick and Wu, Jiajun},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={29469--29478},
  year={2025}
}