
Sap3d

[CVPR 2024 Highlight] Code release for "The More You See in 2D, the More You Perceive in 3D"

SAP3D: The More You See in 2D, the More You Perceive in 3D

We present SAP3D, which reconstructs the 3D shape and texture of an object from a variable number of real input images. The quality of the reconstructed shape and texture improves with more views.

<p align="center"> <img src='docs/teaser.jpg' align="center" style="width: 80%; height: auto;"> </p>

The More You See in 2D, the More You Perceive in 3D
Xinyang Han*<sup>1</sup>, Zelin Gao*<sup>2</sup>, Angjoo Kanazawa<sup>1</sup>, Shubham Goel<sup>†3</sup>, Yossi Gandelsman<sup>†1</sup><br> <sup>1</sup> UC Berkeley, <sup>2</sup> Zhejiang University, <sup>3</sup> Avataar
CVPR 2024 (Highlight)

project page | arXiv | bibtex


Installation

See installation instructions.

Dataset Preparation

See Preparing Datasets for SAP3D.

Method Overview

<p align="center"> <img src="docs/approach.jpg" width=90%> </p>

Overview of SAP3D. We first compute coarse relative camera poses using an off-the-shelf model. We fine-tune a view-conditioned 2D diffusion model on the input images and simultaneously refine the camera poses via optimization. The resulting instance-specific diffusion model and camera poses enable 3D reconstruction and novel view synthesis from an arbitrary number of input images.

Pipeline

The pipeline comprises three stages for pose estimation and reconstruction:

  1. Pose Estimation Initialization: A scaled-up RelposePP model initializes the camera poses for the input images.
  2. Pose Refinement and Diffusion Model TTT: The estimated poses are refined while the view-conditioned diffusion model is personalized to the input images via test-time training (TTT).
  3. 3D Reconstruction: The 3D object is reconstructed from the refined poses and the fine-tuned diffusion model.
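The three stages above can be sketched as a simple data flow. This is an illustrative sketch only: the function names and data structures are hypothetical stand-ins, not the repository's actual API.

```python
# Hypothetical sketch of the three-stage SAP3D pipeline.

def estimate_initial_poses(images):
    # Stage 1: coarse relative poses from the scaled-up RelposePP model.
    return [{"view": i, "pose": "coarse"} for i, _ in enumerate(images)]

def refine_and_personalize(images, poses):
    # Stage 2: jointly refine the poses and fine-tune (test-time train)
    # the view-conditioned diffusion model on the input images.
    refined = [dict(p, pose="refined") for p in poses]
    model = "instance-specific diffusion model"
    return refined, model

def reconstruct_3d(images, poses, model):
    # Stage 3: reconstruct the object and synthesize novel views using
    # the refined poses and the personalized diffusion prior.
    return {"views_used": len(images), "poses": poses, "prior": model}

images = [f"view_{i}.png" for i in range(5)]
poses = estimate_initial_poses(images)
poses, model = refine_and_personalize(images, poses)
recon = reconstruct_3d(images, poses, model)
```

The key property the sketch captures is that stage 2 produces both refined poses and an instance-specific model, and stage 3 consumes both.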

System Requirements

  • Memory Considerations: For smooth operation, your system should have at least 38 GB of available memory.
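On Linux, the 38 GB requirement can be checked by reading `MemAvailable` from `/proc/meminfo`. The helper below is not part of the SAP3D codebase; it just illustrates the check by parsing `/proc/meminfo`-style text.

```python
# Parse MemAvailable (reported in kB) from /proc/meminfo-style text.
def available_gb(meminfo_text):
    """Return available memory in GB, or None if the field is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])   # value is reported in kB
            return kb / 1024 ** 2       # kB -> GB
    return None

sample = "MemTotal:       65536000 kB\nMemAvailable:   39845888 kB\n"
print(available_gb(sample))  # -> 38.0
```

To check the real machine, pass `open("/proc/meminfo").read()` and verify the result is at least 38.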

Initial Setup

  • Configuring the Working Directory: Set the ROOT_DIR environment variable before launching the pipeline, for example with echo 'export ROOT_DIR=Your_ROOT_DIR' >> ~/.bashrc.

Reconstruction and Evaluation

Reconstructing Individual Objects: To process a specific object, run:

sh run_pipeline.sh GSO_demo OBJECT_NAME INPUT_VIEWS GPU_INDEX

For instance:

sh run_pipeline.sh GSO_demo Crosley_Alarm_Clock_Vintage_Metal 5 0

Batch Processing: To run the pipeline on every example in the dataset/data/train/GSO_demo directory, run:

python run_pipeline.py --object_type GSO_demo
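A batch driver like this plausibly builds one run_pipeline.sh invocation per object. The helper below is a hypothetical sketch, not the script's real code; it only assumes the argument layout shown in the single-object command above.

```python
# Hypothetical: build one run_pipeline.sh command per object name,
# mirroring the "sh run_pipeline.sh OBJECT_TYPE NAME VIEWS GPU" layout.
def build_commands(object_type, object_names, input_views=5, gpu_index=0):
    return [
        f"sh run_pipeline.sh {object_type} {name} {input_views} {gpu_index}"
        for name in object_names
    ]

cmds = build_commands("GSO_demo", ["Crosley_Alarm_Clock_Vintage_Metal"])
print(cmds[0])  # -> sh run_pipeline.sh GSO_demo Crosley_Alarm_Clock_Vintage_Metal 5 0
```

In a real driver, the object names would come from listing dataset/data/train/GSO_demo and each command would be dispatched to a GPU.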

Results and Numbers

The pipeline produces the following outputs:

  • 2D NVS Outputs: Stored in camerabooth/experiments_nvs/GSO_demo.
  • 3D NVS Outputs: Stored in folders such as 3D_Recon/threestudio/experiments_GSO_demo_view_5_nerf.
  • Evaluation Metrics: Quantitative results are stored in the results folder.

For replicability, we provide results for all test objects in results_standard/GSO_demo. Because reproducing them requires substantial compute (8 A100 GPUs for 1-2 days), we suggest processing a subset of the data, which is enough to confirm your setup and compare against our numbers.

To generate tables that summarize the numbers for the different settings, run:

python results_standard/run/summarize.py
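The aggregation such a summarizer performs can be sketched as averaging per-object metrics into one row per input-view setting. The metric names and record layout below are assumptions for illustration, not summarize.py's actual format.

```python
from collections import defaultdict

def summarize(records):
    """records: iterable of {"views": int, "psnr": float, "ssim": float}."""
    by_views = defaultdict(list)
    for r in records:
        by_views[r["views"]].append(r)
    # One averaged row per number of input views.
    return {
        views: {
            "psnr": sum(r["psnr"] for r in rows) / len(rows),
            "ssim": sum(r["ssim"] for r in rows) / len(rows),
        }
        for views, rows in sorted(by_views.items())
    }

table = summarize([
    {"views": 3, "psnr": 20.0, "ssim": 0.80},
    {"views": 3, "psnr": 22.0, "ssim": 0.90},
    {"views": 5, "psnr": 24.0, "ssim": 0.92},
])
```

Each key of `table` is a view count and each value is the averaged metrics for that setting, which maps directly onto one table row.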

Gradio Demo

To reconstruct in-the-wild objects through a Gradio interface, run gradio demo/sap3d/app.py. (Producing results can take up to an hour.)

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{han2024more,
  title={The More You See in 2D the More You Perceive in 3D},
  author={Han, Xinyang and Gao, Zelin and Kanazawa, Angjoo and Goel, Shubham and Gandelsman, Yossi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20912--20922},
  year={2024}
}
