
Sap3d

[CVPR 2024 Highlight] Code release for "The More You See in 2D, the More You Perceive in 3D"

SAP3D: The More You See in 2D, the More You Perceive in 3D

We present SAP3D, which reconstructs the 3D shape and texture of an object from a variable number of real input images. The quality of the reconstructed shape and texture improves with more views.

<p align="center"> <img src='docs/teaser.jpg' align="center" style="width: 80%; height: auto;"> </p>

The More You See in 2D, the More You Perceive in 3D
Xinyang Han*<sup>1</sup>, Zelin Gao*<sup>2</sup>, Angjoo Kanazawa<sup>1</sup>, Shubham Goel<sup>†3</sup>, Yossi Gandelsman<sup>†1</sup><br> <sup>1</sup> UC Berkeley, <sup>2</sup> Zhejiang University, <sup>3</sup> Avataar
CVPR 2024 (Highlight)

project page | arXiv | bibtex


Installation

See installation instructions.

Dataset Preparation

See Preparing Datasets for SAP3D.

Method Overview

<p align="center"> <img src="docs/approach.jpg" width=90%> </p>

Overview of SAP3D. We first compute coarse relative camera poses using an off-the-shelf model. We fine-tune a view-conditioned 2D diffusion model on the input images and simultaneously refine the camera poses via optimization. The resulting instance-specific diffusion model and camera poses enable 3D reconstruction and novel view synthesis from an arbitrary number of input images.

Pipeline

The pipeline comprises three stages for pose estimation and reconstruction:

  1. Pose Estimation Initialization: A scaled-up RelposePP model initializes the camera poses for the input images.
  2. Pose Refinement and Diffusion Model TTT: The estimated poses are refined while the view-conditioned diffusion model is personalized to the input images via test-time training (TTT).
  3. 3D Reconstruction: The 3D object is reconstructed from the refined poses and the fine-tuned diffusion model.
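The three stages above can be sketched as a simple data flow. This is an illustrative sketch only: the function names and data structures are hypothetical stand-ins, not the repository's actual API.

```python
# Hypothetical sketch of the three-stage SAP3D pipeline.

def estimate_initial_poses(images):
    # Stage 1: coarse relative poses from the scaled-up RelposePP model.
    return [{"view": i, "pose": "coarse"} for i, _ in enumerate(images)]

def refine_and_personalize(images, poses):
    # Stage 2: jointly refine the poses and fine-tune (test-time train)
    # the view-conditioned diffusion model on the input images.
    refined = [dict(p, pose="refined") for p in poses]
    model = "instance-specific diffusion model"
    return refined, model

def reconstruct_3d(images, poses, model):
    # Stage 3: reconstruct the object and synthesize novel views using
    # the refined poses and the personalized diffusion prior.
    return {"views_used": len(images), "poses": poses, "prior": model}

images = [f"view_{i}.png" for i in range(5)]
poses = estimate_initial_poses(images)
poses, model = refine_and_personalize(images, poses)
recon = reconstruct_3d(images, poses, model)
```

The key property the sketch captures is that stage 2 produces both refined poses and an instance-specific model, and stage 3 consumes both.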

System Requirements

  • Memory Considerations: For smooth operation, your system should have at least 38 GB of available memory.
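On Linux, the 38 GB requirement can be checked by reading `MemAvailable` from `/proc/meminfo`. The helper below is not part of the SAP3D codebase; it just illustrates the check by parsing `/proc/meminfo`-style text.

```python
# Parse MemAvailable (reported in kB) from /proc/meminfo-style text.
def available_gb(meminfo_text):
    """Return available memory in GB, or None if the field is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])   # value is reported in kB
            return kb / 1024 ** 2       # kB -> GB
    return None

sample = "MemTotal:       65536000 kB\nMemAvailable:   39845888 kB\n"
print(available_gb(sample))  # -> 38.0
```

To check the real machine, pass `open("/proc/meminfo").read()` and verify the result is at least 38.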

Initial Setup

  • Configuring the Working Directory: Set the ROOT_DIR environment variable before launching the pipeline, for example with echo 'export ROOT_DIR=Your_ROOT_DIR' >> ~/.bashrc.

Reconstruction and Evaluation

Reconstructing Individual Objects: To process a specific object, run:

sh run_pipeline.sh GSO_demo OBJECT_NAME INPUT_VIEWS GPU_INDEX

For instance:

sh run_pipeline.sh GSO_demo Crosley_Alarm_Clock_Vintage_Metal 5 0

Batch Processing: To run the pipeline on every example in the dataset/data/train/GSO_demo directory, run:

python run_pipeline.py --object_type GSO_demo
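A batch driver like this plausibly builds one run_pipeline.sh invocation per object. The helper below is a hypothetical sketch, not the script's real code; it only assumes the argument layout shown in the single-object command above.

```python
# Hypothetical: build one run_pipeline.sh command per object name,
# mirroring the "sh run_pipeline.sh OBJECT_TYPE NAME VIEWS GPU" layout.
def build_commands(object_type, object_names, input_views=5, gpu_index=0):
    return [
        f"sh run_pipeline.sh {object_type} {name} {input_views} {gpu_index}"
        for name in object_names
    ]

cmds = build_commands("GSO_demo", ["Crosley_Alarm_Clock_Vintage_Metal"])
print(cmds[0])  # -> sh run_pipeline.sh GSO_demo Crosley_Alarm_Clock_Vintage_Metal 5 0
```

In a real driver, the object names would come from listing dataset/data/train/GSO_demo and each command would be dispatched to a GPU.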

Results and Numbers

The pipeline produces the following outputs:

  • 2D NVS Outputs: Stored in camerabooth/experiments_nvs/GSO_demo.
  • 3D NVS Outputs: Stored in folders such as 3D_Recon/threestudio/experiments_GSO_demo_view_5_nerf.
  • Evaluation Metrics: Quantitative results are stored in the results folder.

For replicability, we provide results for all test objects in results_standard/GSO_demo. Because reproducing them requires substantial compute (8 A100 GPUs for 1-2 days), we suggest processing a subset of the data, which is enough to confirm your setup and compare against our numbers.

To generate tables that summarize the numbers for the different settings, run:

python results_standard/run/summarize.py
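The aggregation such a summarizer performs can be sketched as averaging per-object metrics into one row per input-view setting. The metric names and record layout below are assumptions for illustration, not summarize.py's actual format.

```python
from collections import defaultdict

def summarize(records):
    """records: iterable of {"views": int, "psnr": float, "ssim": float}."""
    by_views = defaultdict(list)
    for r in records:
        by_views[r["views"]].append(r)
    # One averaged row per number of input views.
    return {
        views: {
            "psnr": sum(r["psnr"] for r in rows) / len(rows),
            "ssim": sum(r["ssim"] for r in rows) / len(rows),
        }
        for views, rows in sorted(by_views.items())
    }

table = summarize([
    {"views": 3, "psnr": 20.0, "ssim": 0.80},
    {"views": 3, "psnr": 22.0, "ssim": 0.90},
    {"views": 5, "psnr": 24.0, "ssim": 0.92},
])
```

Each key of `table` is a view count and each value is the averaged metrics for that setting, which maps directly onto one table row.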

Gradio Demo

To reconstruct in-the-wild objects through a Gradio interface, run gradio demo/sap3d/app.py. (Producing results can take up to an hour.)

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{han2024more,
  title={The More You See in 2D the More You Perceive in 3D},
  author={Han, Xinyang and Gao, Zelin and Kanazawa, Angjoo and Goel, Shubham and Gandelsman, Yossi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20912--20922},
  year={2024}
}
