ReconViaGen
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
Alpha Demo: https://huggingface.co/spaces/Stable-X/ReconViaGen. We welcome feedback on failure cases to help improve the model.
🆕 News: We have fixed some bugs in the v0.5 branch; please re-sync with the remote repository.
🆕 News (v0.5): The inference code of ReconViaGen-v0.5 is released in the v0.5 branch of this repository. Thanks to TRELLIS.2 for the excellent work! We propose an effective multi-view fusion strategy for TRELLIS.2 and combine ReconViaGen with TRELLIS.2 to generate high-resolution meshes and PBR materials. For details, please refer to the v0.5 branch.
<div align="center">Demo of ReconViaGen-v0.5</div>
News (v0.2): The training and inference code of ReconViaGen-v0.2 is released in the main branch of this repository. We have optimized the inference process: reconstructing from 16 images without refinement (app.py) uses less than 18GB of VRAM, and with refinement (app_fine.py) less than 24GB.
News (Community): An unofficial implementation of ReconViaGen has been released. Many thanks to estheryang11!
Installation
Clone the repo:
git clone --recursive https://github.com/GAP-LAB-CUHK-SZ/ReconViaGen.git
cd ReconViaGen
Create a new conda environment named reconviagen and install the dependencies (PyTorch 2.4.0 with CUDA 12.1):
. ./setup.sh --new-env --basic --xformers --flash-attn --spconv --mipgaussian --kaolin --nvdiffrast --demo
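After setup finishes, a quick sanity check of the installed versions can save a failed first run. The sketch below is not part of the repository; `matches_expected` and `report` are hypothetical helper names, and the expected values are simply the PyTorch 2.4.0 / CUDA 12.1 pair stated above.

```python
def matches_expected(torch_version, cuda_version,
                     want_torch="2.4.0", want_cuda="12.1"):
    """Return True if the version strings match the PyTorch/CUDA pair named above."""
    # torch.__version__ may carry a local build tag such as "+cu121";
    # only the part before the "+" is compared.
    return torch_version.split("+", 1)[0] == want_torch and cuda_version == want_cuda


def report():
    """Print the installed versions; run this inside the reconviagen environment."""
    import torch  # imported lazily so matches_expected works without PyTorch

    cuda = torch.version.cuda or ""  # None on CPU-only builds
    print("torch:", torch.__version__)
    print("CUDA build:", cuda or "none")
    print("GPU visible:", torch.cuda.is_available())
    print("matches expected versions:", matches_expected(torch.__version__, cuda))
```

Calling `report()` from a Python shell in the new environment prints the versions and whether a GPU is visible.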
Local Demo 🤗
To reconstruct an object without refinement, run:
python app.py
To reconstruct an object with refinement, run:
python app_fine.py
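Which script fits depends on available GPU memory. As a rough guide, the sketch below encodes the figures from the v0.2 note (16 input images: under 18GB for app.py, under 24GB for app_fine.py). The helper names are hypothetical, and treating those figures as hard thresholds is an assumption; actual usage will vary with the number of input images.

```python
def pick_demo_script(free_vram_gb):
    """Suggest a demo script for about 16 input views, given free VRAM in GB."""
    if free_vram_gb >= 24:
        return "app_fine.py"  # with refinement: reported under 24GB
    if free_vram_gb >= 18:
        return "app.py"       # without refinement: reported under 18GB
    return None               # below both reported figures


def free_vram_gb(device=0):
    """Query free memory on one CUDA device (requires PyTorch and a GPU)."""
    import torch  # imported lazily so pick_demo_script works without PyTorch

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**3
```

For example, `pick_demo_script(free_vram_gb())` returns the script name to launch, or `None` when neither reported budget fits.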
Training
0. Data Preparation
The processed dataset can be downloaded here. The dataset is organized as follows:
ProObjaverse-300K/
├── renders_random_env/
│   ├── shard-0000/
│   │   ├── {uid}.tar                    # per-object archive
│   │   │   ├── {uid}/000.json           # camera metadata (extrinsic 4×4, intrinsic 3×3)
│   │   │   ├── {uid}/000.rgba.webp      # RGBA render, 1024×1024
│   │   │   ├── {uid}/001.json
│   │   │   ├── {uid}/001.rgba.webp
│   │   │   └── ...                      # up to ~80 views per object
│   │   └── ...
│   ├── shard-0001/
│   └── ...
└── lh-slats/
    ├── shard-0000/
    │   ├── {uid}.npz                    # structured latent for the object
    │   │   ├── feats:  float32 (N, 8)   # latent features per voxel
    │   │   └── coords: uint8   (N, 3)   # voxel coordinates in [0, 63]
    │   └── ...
    ├── shard-0001/
    └── ...
Each .tar contains all rendered views for one object. The uid is shared between the render tar and the slat npz and is used to pair them at training time. The .json camera files contain the camera poses of the rendered views.
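The pairing described above can be sketched in a few lines of Python. This is an illustration of the documented layout, not the repository's data loader: all function names are hypothetical, the camera JSON is returned as a plain dict because its key names are not documented here, and the WebP bytes are left undecoded.

```python
import json
import tarfile
from pathlib import Path

import numpy as np


def index_by_uid(root, subdir, suffix):
    """Map uid -> path for every `{uid}{suffix}` under root/subdir/shard-*/."""
    paths = sorted(Path(root, subdir).glob(f"shard-*/*{suffix}"))
    return {p.name[: -len(suffix)]: p for p in paths}


def pair_objects(root):
    """Return {uid: (tar_path, npz_path)} for uids present in both subtrees."""
    tars = index_by_uid(root, "renders_random_env", ".tar")
    slats = index_by_uid(root, "lh-slats", ".npz")
    return {uid: (tars[uid], slats[uid]) for uid in sorted(tars.keys() & slats.keys())}


def read_views(tar_path):
    """Return {view_id: (camera_dict, rgba_webp_bytes)} from one per-object tar."""
    cams, imgs = {}, {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = Path(member.name).name        # "000.json" or "000.rgba.webp"
            view_id = name.split(".", 1)[0]      # "000"
            data = tar.extractfile(member).read()
            if name.endswith(".json"):
                cams[view_id] = json.loads(data)
            elif name.endswith(".rgba.webp"):
                imgs[view_id] = data             # raw bytes, not decoded
    # Keep only views that have both a camera file and an image.
    return {v: (cams[v], imgs[v]) for v in sorted(cams.keys() & imgs.keys())}


def load_slat(npz_path):
    """Load one structured latent and check it against the documented layout."""
    with np.load(npz_path) as z:
        feats, coords = z["feats"], z["coords"]
    assert feats.dtype == np.float32 and feats.ndim == 2 and feats.shape[1] == 8
    assert coords.dtype == np.uint8 and coords.shape == (feats.shape[0], 3)
    assert coords.size == 0 or coords.max() <= 63
    return feats, coords
```

A typical use is to call `pair_objects("ProObjaverse-300K")` once, then `read_views` and `load_slat` on each pair. Decoding the images requires a library with WebP support, which is left out here.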
1. Training DiT of SS Stage.
Run the following command to train the flow model of the SS Stage on the ProObjaverse-300K dataset:
. ./train_ss.sh
Note that we trained the model on 8 A100 GPUs (80GB).
2. Training DiT of SLat Stage.
Run the following command to train the flow model of the SLat Stage on the ProObjaverse-300K dataset:
. ./train_slat.sh
Note that we trained the model on 8 A100 GPUs (80GB).
3. Try the checkpoints with Gradio:
Run the following command to try your trained checkpoints with Gradio:
python app_try.py --ss_ckpt /path_to_your_trained_ss_ckpt --slat_ckpt /path_to_your_trained_slat_ckpt
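A mistyped checkpoint path is easier to catch before the demo starts loading models. The wrapper below is a minimal sketch, not part of the repository: `build_try_command` is a hypothetical helper that only checks that both paths exist and then assembles the command shown above, using the two flags as written there.

```python
import sys
from pathlib import Path


def build_try_command(ss_ckpt, slat_ckpt, python=sys.executable):
    """Validate both checkpoint paths and return the app_try.py command as a list."""
    for label, path in (("ss_ckpt", ss_ckpt), ("slat_ckpt", slat_ckpt)):
        if not Path(path).exists():
            raise FileNotFoundError(f"{label} not found: {path}")
    return [python, "app_try.py",
            "--ss_ckpt", str(ss_ckpt),
            "--slat_ckpt", str(slat_ckpt)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.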
Citation
@article{chang2025reconviagen,
title={ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation},
author={Chang, Jiahao and Ye, Chongjie and Wu, Yushuang and Chen, Yuantao and Zhang, Yidan and Luo, Zhongjin and Li, Chenghong and Zhi, Yihao and Han, Xiaoguang},
journal={arXiv preprint arXiv:2510.23306},
year={2025}
}