ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation

Alpha Demo: https://huggingface.co/spaces/Stable-X/ReconViaGen. We welcome feedback on failure cases to help improve the model.

🆕 News: We've fixed some bugs in the v0.5 branch; please resynchronize with the remote repository.

🆕 News (v0.5): The inference code of ReconViaGen-v0.5 is now available in the v0.5 branch of this repository. Thanks to TRELLIS.2 for the excellent work! We propose an effective multi-view fusion strategy for TRELLIS.2 and combine ReconViaGen with TRELLIS.2 to generate high-resolution meshes and PBR materials. For details, see the v0.5 branch.

Demo of ReconViaGen-v0.5

</div>

News (v0.2): The training and inference code of ReconViaGen-v0.2 is now available in the main branch of this repository. We have optimized the inference process: reconstructing from 16 images without refinement (app.py) consumes less than 18GB of VRAM, and reconstructing from 16 images with refinement (app_fine.py) consumes less than 24GB of VRAM.
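As a rough guide to the VRAM figures above, the following sketch picks which demo script to launch for a given GPU. The helper `pick_demo_script` and the table `VRAM_NEEDED_GB` are hypothetical and not part of the repository; the thresholds are the 16-image figures quoted above, treated as upper bounds.

```python
from typing import Optional

# Upper bounds (in GB) quoted above for reconstructing from 16 input images.
# These are assumptions taken from the release note, not measured here.
VRAM_NEEDED_GB = {
    "app_fine.py": 24,  # with refinement
    "app.py": 18,       # without refinement
}


def pick_demo_script(available_vram_gb: float) -> Optional[str]:
    """Return the most demanding demo script expected to fit, or None.

    Hypothetical helper: the caller supplies the GPU memory size in GB.
    """
    # Try the script with the largest requirement first.
    for script, needed_gb in sorted(VRAM_NEEDED_GB.items(), key=lambda kv: -kv[1]):
        if available_vram_gb >= needed_gb:
            return script
    return None


if __name__ == "__main__":
    for vram in (16, 20, 24, 80):
        print(f"{vram} GB -> {pick_demo_script(vram)}")
```

Fewer input images will likely need less memory, so a `None` result here only means the 16-image setting is not expected to fit.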

News (Community): An unofficial implementation of ReconViaGen has been released. Many thanks to estheryang11!

Installation

Clone the repo:

git clone --recursive https://github.com/GAP-LAB-CUHK-SZ/ReconViaGen.git
cd ReconViaGen

Create a new conda environment named reconviagen and install the dependencies (PyTorch 2.4.0 with CUDA 12.1):

. ./setup.sh --new-env --basic --xformers --flash-attn --spconv --mipgaussian --kaolin --nvdiffrast --demo
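After setup finishes, a quick check like the one below can confirm that the environment matches the versions named above. This is a minimal sketch, not a script shipped with the repository; `matches_expected` and `check_environment` are hypothetical names, and the check only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def matches_expected(torch_version, cuda_version,
                     expected_torch="2.4.0", expected_cuda="12.1"):
    """Compare installed versions against the ones named in the setup step."""
    # Drop a local build suffix such as "+cu121" before comparing.
    return torch_version.split("+")[0] == expected_torch and cuda_version == expected_cuda


def check_environment():
    """Return a one-line summary of the PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    ok = matches_expected(torch.__version__, torch.version.cuda)
    return (
        f"torch={torch.__version__} cuda={torch.version.cuda} "
        f"gpu_visible={torch.cuda.is_available()} matches_expected={ok}"
    )


if __name__ == "__main__":
    print(check_environment())
```

Run it inside the activated reconviagen environment; a mismatch does not necessarily mean the demo will fail, only that the environment differs from the documented one.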

Local Demo 🤗

To reconstruct an object without refinement, run:

python app.py

To reconstruct an object with refinement, run:

python app_fine.py

Training

0. Data Preparation

The processed dataset can be downloaded here. The dataset is organized as follows:

ProObjaverse-300K/
├── renders_random_env/
│   ├── shard-0000/
│   │   ├── {uid}.tar          # per-object archive
│   │   │   ├── {uid}/000.json          # camera metadata (extrinsic 4×4, intrinsic 3×3)
│   │   │   ├── {uid}/000.rgba.webp     # RGBA render, 1024×1024
│   │   │   ├── {uid}/001.json
│   │   │   ├── {uid}/001.rgba.webp
│   │   │   └── ...                     # up to ~80 views per object
│   │   └── ...
│   ├── shard-0001/
│   └── ...
└── lh-slats/
    ├── shard-0000/
    │   ├── {uid}.npz          # structured latent for the object
    │   │   ├── feats:  float32 (N, 8)       # latent features per voxel
    │   │   └── coords: uint8   (N, 3)        # voxel coordinates in [0, 63]
    │   └── ...
    ├── shard-0001/
    └── ...

Each .tar contains all rendered views for one object. The uid is shared between the render tar and the slat npz and is used to pair them at training time. The .json camera files store the camera poses of the rendered views.
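The uid-based pairing described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's data loader: the helper names are hypothetical, uids are matched across all shards (the layout above does not say whether shard numbers line up), and the camera JSON is returned as parsed without assuming any key names.

```python
import json
import tarfile
from pathlib import Path


def index_by_uid(folder: Path, suffix: str) -> dict:
    """Map uid -> file path for every shard-*/{uid}{suffix} under folder."""
    return {p.name[: -len(suffix)]: p for p in sorted(folder.glob(f"shard-*/*{suffix}"))}


def pair_renders_and_slats(root: Path) -> list:
    """Return (uid, render_tar, slat_npz) for uids present in both folders."""
    renders = index_by_uid(root / "renders_random_env", ".tar")
    slats = index_by_uid(root / "lh-slats", ".npz")
    return [(uid, renders[uid], slats[uid]) for uid in sorted(renders.keys() & slats.keys())]


def read_cameras(render_tar: Path) -> dict:
    """Return {view_id: parsed camera JSON} for one object's archive."""
    cameras = {}
    with tarfile.open(render_tar) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                view_id = Path(member.name).stem  # e.g. "000"
                cameras[view_id] = json.load(tar.extractfile(member))
    return cameras
```

For example, `pair_renders_and_slats(Path("ProObjaverse-300K"))` lists the objects that have both renders and a structured latent, and `read_cameras` can then be called on each render tar. Per the layout above, each npz is expected to hold the `feats` and `coords` arrays.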

1. Training DiT of SS Stage.

Run the following script to train the flow model of the SS stage on the ProObjaverse-300K dataset:

. ./train_ss.sh

Note that we trained the model on 8 A100 GPUs (80GB).

2. Training DiT of SLat Stage.

Run the following script to train the flow model of the SLat stage on the ProObjaverse-300K dataset:

. ./train_slat.sh

Note that we trained the model on 8 A100 GPUs (80GB).

3. Try the checkpoints with Gradio.

Run the following command to try your trained checkpoints with Gradio:

python app_try.py --ss_ckpt /path_to_your_trained_ss_ckpt --slat_ckpt /path_to_your_trained_slat_ckpt
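If you launch the demo from another script, it can help to verify both checkpoint paths before starting Gradio. The sketch below only builds the command shown above; `build_try_command` is a hypothetical helper, and the only flags it uses are `--ss_ckpt` and `--slat_ckpt` from that command.

```python
import sys
from pathlib import Path


def build_try_command(ss_ckpt: str, slat_ckpt: str) -> list:
    """Build the app_try.py command line after checking both paths exist."""
    for label, path in (("ss_ckpt", ss_ckpt), ("slat_ckpt", slat_ckpt)):
        if not Path(path).exists():
            raise FileNotFoundError(f"{label} not found: {path}")
    # Use the current interpreter so the active conda environment is reused.
    return [sys.executable, "app_try.py",
            "--ss_ckpt", str(ss_ckpt),
            "--slat_ckpt", str(slat_ckpt)]


# Usage, from the repository root:
#   import subprocess
#   subprocess.run(build_try_command(ss_path, slat_path), check=True)
```

Failing early on a missing path avoids waiting for the models to load before discovering a typo in a checkpoint location.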

Citation

@article{chang2025reconviagen,
        title={ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation},
        author={Chang, Jiahao and Ye, Chongjie and Wu, Yushuang and Chen, Yuantao and Zhang, Yidan and Luo, Zhongjin and Li, Chenghong and Zhi, Yihao and Han, Xiaoguang},
        journal={arXiv preprint arXiv:2510.23306},
        year={2025}
}
