ARTDECO
[ICLR 2026] ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Guanghao Li*, Kerui Ren*, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu†, Mulin Yu†, Jiangmiao Pang
📢 News
[2026.02.19] 🚀 Training code is officially released!
[2026.01.26] 🎉 Our paper has been accepted by ICLR 2026.
🏗️ System Architecture
Frontend and Backend Modules
(a) Frontend: Images are captured from the scene and streamed into the frontend. Each incoming frame is aligned with the latest keyframe by a matching module that computes pixel correspondences. Based on the correspondence ratio and pixel displacement, the frame is classified as a keyframe, a mapper frame, or a common frame. The selected frame, along with its pose and point cloud, is then passed to the backend.

(b) Backend: For each new keyframe, a loop-detection module evaluates its similarity to previous keyframes. If a loop is detected, the most relevant candidates are refined and connected in the factor graph; otherwise, the keyframe is linked only to recent frames. Finally, global pose optimization is performed with Gauss–Newton, and the other frames are adjusted accordingly. We instantiate the matching module with MASt3R and the loop-detection module with Pi3.
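The frontend's three-way frame classification can be sketched as follows. This is only an illustration of the idea: the thresholds `KEYFRAME_RATIO` and `MAPPER_DISPLACEMENT`, the `MatchStats` container, and the exact decision rule are assumptions made for this example and are not taken from the ARTDECO code.

```python
from dataclasses import dataclass
from enum import Enum


class FrameType(Enum):
    KEYFRAME = "keyframe"
    MAPPER = "mapper"
    COMMON = "common"


@dataclass
class MatchStats:
    """Hypothetical summary of matching a frame against the latest keyframe."""
    correspondence_ratio: float  # fraction of pixels with a valid match, in [0, 1]
    median_displacement: float   # median displacement of matched pixels, in pixels


# Hypothetical thresholds, chosen only for illustration.
KEYFRAME_RATIO = 0.5
MAPPER_DISPLACEMENT = 20.0


def classify_frame(stats: MatchStats) -> FrameType:
    # Assumed rule: weak overlap with the latest keyframe starts a new keyframe.
    if stats.correspondence_ratio < KEYFRAME_RATIO:
        return FrameType.KEYFRAME
    # Assumed rule: good overlap but large image motion is still useful for mapping.
    if stats.median_displacement > MAPPER_DISPLACEMENT:
        return FrameType.MAPPER
    # Everything else is a common frame.
    return FrameType.COMMON
```

Under these assumed thresholds, `classify_frame(MatchStats(0.3, 5.0))` yields a keyframe, while a well-matched frame with little motion is treated as a common frame.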
Mapping Module
When a keyframe or mapper frame arrives from the backend, new Gaussians are added to the scene. Multi-resolution inputs are analyzed with the Laplacian of Gaussian (LoG) operator to identify regions that require refinement, and new Gaussians are initialized at the corresponding monocular depth positions in the current view. Common frames are not used to add Gaussians but contribute through gradient-based refinement. Each primitive stores position, spherical harmonics (SH), base scale, opacity, local feature, dmax, and voxel index vid. For rendering, the dmax attribute determines whether a Gaussian is included at a given viewing distance, enabling consistent level-of-detail control.
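As a rough illustration of distance-based level-of-detail selection, the sketch below keeps a primitive only when the camera is within its dmax. The `Primitive` class, the `visible_primitives` helper, and the `distance <= dmax` rule are assumptions for this example; the actual criterion used by the renderer may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Primitive:
    """Hypothetical, reduced view of a Gaussian primitive."""
    position: tuple[float, float, float]
    dmax: float  # assumed meaning: largest viewing distance at which it is rendered
    vid: int     # voxel index


def visible_primitives(primitives, camera_position):
    """Return the primitives whose dmax covers the current viewing distance."""
    selected = []
    for p in primitives:
        distance = math.dist(p.position, camera_position)
        # Assumed rule: fine-detail primitives (small dmax) drop out when far away.
        if distance <= p.dmax:
            selected.append(p)
    return selected
```

With this rule, a coarse primitive with a large dmax stays visible from far away, while a fine primitive with a small dmax is only rendered for nearby cameras.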
🛠️ Installation
Environment Setup
Our framework has been validated on Python 3.11/3.12, PyTorch 2.5.1/2.7.1, and CUDA 12.1/12.8, and should generally be compatible with recent PyTorch/CUDA releases.
- Clone the repo.
git clone https://github.com/InternRobotics/ARTDECO.git
cd ARTDECO/
- Create the environment and install PyTorch.
# python 3.11 + cuda 12.1 + pytorch 2.5.1
conda create -n artdeco python=3.11
conda activate artdeco
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
or
# python 3.12 + cuda 12.8 + pytorch 2.7.1
conda create -n artdeco python=3.12
conda activate artdeco
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
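After either install path, a short script along these lines can confirm which versions ended up in the environment. It is a convenience sketch rather than part of the repository; `environment_report` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util
import platform


def environment_report() -> dict:
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cuda_available": False,
    }
    # Skip the torch checks cleanly if PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return report
    import torch

    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the installed wheel and the local driver are likely mismatched.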
Dependency & Model Preparation
- Download Checkpoints.
Place the required MASt3R and Pi3 checkpoints in the models/ directory.
# Download MASt3R checkpoints
mkdir -p models/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P models/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_trainingfree.pth -P models/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_codebook.pkl -P models/
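Before launching a run, it can help to confirm that the three MASt3R files listed above actually landed in models/. The helper below is a small sketch for that purpose (`missing_checkpoints` is a hypothetical name, not a script shipped with the repository). It does not check the Pi3 checkpoint, since its filename is not given here.

```python
from pathlib import Path

# File names taken from the wget commands above.
MAST3R_FILES = [
    "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth",
    "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_trainingfree.pth",
    "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_codebook.pkl",
]


def missing_checkpoints(models_dir="models"):
    """Return the MASt3R files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in MAST3R_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All MASt3R checkpoints found.")
```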
- Build VSLAM Module.
# Install VSLAM thirdparty
cd VSLAM
pip install -e thirdparty/mast3r --no-build-isolation
pip install -e thirdparty/in3d --no-build-isolation
pip install -e . --no-build-isolation
# Install Pypose and GeoCalib
pip install pypose
python -m pip install -e "git+https://github.com/cvg/GeoCalib#egg=geocalib"
cd ..
- Build Reconstruct Module.
Important: Please use the diff_gaussian_rasterization module located in the on-the-fly-nvs submodule. Note that this version is intended strictly for optimizing Gaussians, not for rendering.
# Install gsplat
pip install gsplat
# Install submodules
cd Reconstruct
pip install submodules/diff_gaussian_rasterization --no-build-isolation
pip install submodules/fused-ssim --no-build-isolation
pip install submodules/simple-knn --no-build-isolation
pip install submodules/graphdecoviewer
cd ..
- Install the remaining dependencies.
🚀 Quick Start (Training)
We provide the PINGPONG dataset as a benchmark example.
Data Structure
Organize your dataset as follows:
<dataset_root>
└── pingpong
    ├── images
    │   └── ${timestamp}.png/.jpg
    └── (intr.yaml)
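To sanity-check a scene folder against this layout, the frames can be listed in timestamp order with a few lines of Python. This is a sketch under two assumptions: the file stem is a numeric timestamp, and only .png and .jpg files count as frames. `list_frames` is a hypothetical helper, and the loader in the repository may order frames differently.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg"}


def list_frames(scene_dir):
    """Return the image paths under <scene_dir>/images, ordered by timestamp."""
    images_dir = Path(scene_dir) / "images"
    frames = [p for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]

    def sort_key(path):
        # Assumption: the stem is a numeric timestamp; non-numeric names sort last.
        try:
            return (0, float(path.stem), path.stem)
        except ValueError:
            return (1, 0.0, path.stem)

    return sorted(frames, key=sort_key)
```

Sorting numerically rather than lexicographically avoids placing a frame named 10.png before 9.jpg.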
The reference intr.yaml is shown below:
width: 2592
height: 1944
# fx, fy, cx, cy ...
calibration: [1478.95393660578, 1478.95393660578, 1296.0, 972.0]
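The calibration entries map onto a standard pinhole intrinsics matrix. The sketch below builds that matrix from the reference values and derives the horizontal field of view. The helper names are hypothetical, the values are passed as a plain dictionary to avoid assuming a particular YAML loader, and any entries after the first four (hinted at by the "..." in the comment) are ignored.

```python
import math


def intrinsics_matrix(calibration):
    """Build the 3x3 pinhole matrix K from [fx, fy, cx, cy, ...]."""
    if len(calibration) < 4:
        raise ValueError("calibration needs at least fx, fy, cx, cy")
    fx, fy, cx, cy = calibration[:4]
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


def horizontal_fov_degrees(width, fx):
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(width / (2.0 * fx)))


# Values copied from the reference intr.yaml above.
intr = {
    "width": 2592,
    "height": 1944,
    "calibration": [1478.95393660578, 1478.95393660578, 1296.0, 972.0],
}
K = intrinsics_matrix(intr["calibration"])
fov = horizontal_fov_degrees(intr["width"], K[0][0])
```

In the reference file, the principal point (cx, cy) sits exactly at the image center (width / 2, height / 2), which is a quick consistency check when adapting the file to another camera.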
Run Reconstruction
Execute the following command to start the on-the-fly reconstruction:
bash run.sh
✉️ Contact & Citation
For questions, please contact Kerui Ren (renkerui@sjtu.edu.cn).
If you find ARTDECO helpful for your research, please cite our work:
@article{li2025artdeco,
  title={ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation},
  author={Li, Guanghao and Ren, Kerui and Xu, Linning and Zheng, Zhewen and Jiang, Changjian and Gao, Xin and Dai, Bo and Pu, Jian and Yu, Mulin and Pang, Jiangmiao},
  journal={arXiv preprint arXiv:2510.08551},
  year={2025}
}
