
3DObjectReconstruction

The 3D Object Reconstruction project is a workflow that takes a set of stereo images plus camera information and outputs a textured mesh (an .OBJ file). Its purpose is to translate physical items into the digital world in a photorealistic way.

Install / Use

/learn @NVIDIA/3DObjectReconstruction

NVIDIA 3D Object Reconstruction Workflow

Deploy this example to create a 3D object reconstruction workflow that transforms stereo video input into high-quality 3D assets using state-of-the-art computer vision and neural rendering techniques.

NVIDIA's 3D Object Reconstruction workflow represents a significant advancement in automated 3D asset creation. Real-world tests have demonstrated the ability to generate production-ready 3D meshes with photorealistic textures in under 30 minutes, enabling rapid digital twin creation and synthetic data generation workflows.

The purpose of this workflow is:

  1. To provide a reference implementation of 3D object reconstruction using NVIDIA's AI stack.
  2. To accelerate adoption of 3D AI workflows in computer vision, robotics, and synthetic data generation.

You can get started quickly and achieve high-quality results using your own stereo data by following the Quickstart guide.

What is 3D Object Reconstruction?

3D Object Reconstruction is the process of creating complete three-dimensional digital representations of real-world objects from 2D image sequences. This example implements a state-of-the-art workflow that combines stereo vision, object segmentation, bundle adjustment, and neural implicit surface reconstruction to produce high-quality 3D meshes with photorealistic textures.

<div align="center"> <img src="data/docs/pipeline_overview.png" alt="Workflow Overview" title="3D Object Reconstruction Workflow"> </div>

The reconstruction workflow processes stereo image pairs through four main stages: depth estimation using transformer-based FoundationStereo, object segmentation with SAM2, camera pose tracking via BundleSDF, and neural implicit surface reconstruction using NeRF. The result is production-ready 3D assets compatible with Isaac Sim, Omniverse, and game engines.

The workflow comprises the following components, each handling a different task:

  • FoundationStereo: Transformer-based stereo depth estimation with sub-pixel accuracy
  • SAM2: Video object segmentation for consistent mask generation
  • Pose Estimation: CUDA-accelerated pose estimation and optimization
  • Neural SDF: GPU-optimized neural implicit surface reconstruction
  • RoMa: Robust feature matching for correspondence establishment
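FoundationStereo produces a disparity map; metric depth then follows from the standard pinhole stereo relation Z = f·b/d. A minimal sketch of that conversion (not the repo's code — the focal length here is an illustrative placeholder, while the baseline matches the sample config):

```python
def disparity_to_depth(disparity_px: float, fx_px: float, baseline_m: float) -> float:
    """Metric depth (meters) from stereo disparity via Z = f * b / d.

    fx_px: focal length in pixels; baseline_m: stereo baseline in meters;
    disparity_px: horizontal pixel offset between left/right views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx_px * baseline_m / disparity_px

# Baseline 0.065 m as in the sample config; fx of 700 px is a placeholder.
depth = disparity_to_depth(disparity_px=42.0, fx_px=700.0, baseline_m=0.065)
print(f"{depth:.3f} m")  # 700 * 0.065 / 42 ≈ 1.083 m
```

Larger disparities map to nearer surfaces, which is why sub-pixel disparity accuracy translates directly into depth precision.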

How to Use This Workflow

This reference implementation demonstrates proven techniques for high-quality 3D reconstruction. Key capabilities include:

  • Direct stereo video processing without extensive preprocessing
  • Automated camera pose estimation and bundle adjustment
  • Neural implicit surface representation for smooth geometry
  • Photorealistic texture generation through view synthesis
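The neural implicit surface representation encodes geometry as a signed distance function whose zero level set is the object surface. A toy analytic stand-in (a sphere SDF, not the learned network) illustrates the sign convention:

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=0.1):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside -- the same convention a learned neural SDF follows."""
    return math.dist(p, center) - radius

print(sphere_sdf((0.1, 0.0, 0.0)))  # ~0.0: on the surface
print(sphere_sdf((0.0, 0.0, 0.0)))  # -0.1: at the center, inside
print(sphere_sdf((0.2, 0.0, 0.0)))  # +0.1: outside
```

In the actual workflow a neural network replaces this analytic function, and the mesh is extracted from its zero level set, which is what yields smooth, watertight geometry.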

To effectively use this workflow:

  1. Learn from the reference implementation

    • Deploy the stack: Follow the Docker Compose setup to experience the complete workflow
    • Study the notebook: Walk through the interactive Jupyter tutorial for hands-on learning
    • Understand the architecture: Review the code to see how FoundationStereo, SAM2, and BundleSDF integrate
  2. Prepare your stereo data

    • Capture stereo sequences: Record synchronized left/right camera pairs of your target objects
    • Calibrate cameras: Ensure accurate intrinsic and extrinsic camera parameters
    • Organize data: Structure input files according to the expected schema
  3. Choose your deployment method

    • Interactive development: Use Jupyter notebooks for experimentation and parameter tuning
    • Automated processing: Deploy CLI tools for batch processing workflows
    • Production deployment: Scale using Docker containers in cloud or edge environments
  4. Run reconstruction

    • Load stereo data and run the complete workflow end-to-end
    • Monitor processing stages and adjust parameters as needed
    • Export results as textured OBJ meshes for downstream applications
  5. Integrate results

    • Import meshes into Isaac Sim for robotics simulation
    • Use assets in Omniverse for collaborative 3D workflows
    • Generate synthetic training data for computer vision models
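Step 4 exports results as textured OBJ meshes. The OBJ format itself is plain text, so a minimal geometry-only writer (a hypothetical helper, omitting the MTL/texture side the workflow also produces) looks like:

```python
def write_obj(path, vertices, faces):
    """Write a minimal Wavefront OBJ file.

    vertices: list of (x, y, z) tuples.
    faces: list of vertex-index triples, 0-based here and converted to
    OBJ's 1-based indexing on write.
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")

# A single triangle as a smoke test:
write_obj("triangle.obj", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

Because the format is this simple, exported meshes import directly into Isaac Sim, Omniverse, and game engines without a conversion step.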

Preparing your data

The workflow processes stereo image sequences as the primary input. The system expects synchronized left and right camera views with known calibration parameters.

1 – Input data schema

Input data should be organized in the following structure:

data/
├── left/           # Left camera images
│   ├── left000000.png
│   ├── left000001.png
│   └── ...
├── right/          # Right camera images
│   ├── right000000.png
│   ├── right000001.png
│   └── ...

Each image pair must be:

| Requirement | Description |
|------------|-------------|
| Synchronized | Left and right images captured simultaneously |
| Calibrated | Known intrinsic parameters and baseline |
| Sequential | Numbered frames showing the object from multiple viewpoints |
| Sufficient overlap | 60-80% overlap between consecutive viewpoints |
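A quick sanity check that your capture follows the schema above is to verify every `leftNNNNNN.png` has a matching `rightNNNNNN.png`. A small illustrative checker (not part of the workflow's tooling):

```python
from pathlib import Path

def check_stereo_pairs(root):
    """Verify the input layout: every data/left/leftNNNNNN.png must have a
    matching data/right/rightNNNNNN.png. Returns the number of pairs."""
    root = Path(root)
    left = {p.name[len("left"):] for p in (root / "left").glob("left*.png")}
    right = {p.name[len("right"):] for p in (root / "right").glob("right*.png")}
    unpaired = left ^ right  # frame indices present on only one side
    if unpaired:
        raise FileNotFoundError(f"unpaired frames: {sorted(unpaired)}")
    return len(left)

# Example (assumes the layout shown above exists under ./data):
# print(check_stereo_pairs("data"), "synchronized pairs found")
```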

2 – Camera calibration

Camera parameters are specified in the configuration file:

# data/configs/base.yaml
camera_config:
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]  # 3x3 camera matrix
foundation_stereo:
  baseline: 0.065  # Stereo baseline in meters
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
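The intrinsic is a 3x3 camera matrix flattened row-major, so fx, fy sit at indices 0 and 4 and cx, cy at indices 2 and 5. A small helper for unpacking it (illustrative, not the repo's loader; the numeric values below are placeholders):

```python
def parse_intrinsic(flat):
    """Unpack a flattened row-major 3x3 camera matrix
    [fx, 0, cx, 0, fy, cy, 0, 0, 1] into rows and named parameters."""
    if len(flat) != 9:
        raise ValueError("expected 9 elements")
    K = [flat[0:3], flat[3:6], flat[6:9]]          # 3x3 row-major
    fx, fy = K[0][0], K[1][1]                      # focal lengths (pixels)
    cx, cy = K[0][2], K[1][2]                      # principal point (pixels)
    return K, (fx, fy, cx, cy)

K, (fx, fy, cx, cy) = parse_intrinsic([700.0, 0, 640.0, 0, 700.0, 360.0, 0, 0, 1])
print(fx, fy, cx, cy)  # 700.0 700.0 640.0 360.0
```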

3 – Data organization

For best results, capture data following these guidelines:

  • Object isolation: Single object against contrasting background
  • Multiple viewpoints: 360-degree coverage with 15-30 degree increments
  • Consistent lighting: Avoid shadows and specular reflections
  • Sharp images: Minimize motion blur and depth of field effects
  • Texture variety: Include surfaces with visual features for tracking
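The 15-30 degree increments translate directly into a frame budget for a full orbit of the object; a one-line calculation makes the tradeoff concrete:

```python
import math

def frames_for_orbit(increment_deg: float) -> int:
    """Captures needed for full 360-degree coverage at a given
    angular increment between consecutive viewpoints."""
    return math.ceil(360.0 / increment_deg)

print(frames_for_orbit(15))  # 24 viewpoints at 15-degree steps
print(frames_for_orbit(30))  # 12 viewpoints at 30-degree steps
```

Finer increments give more overlap for tracking at the cost of more frames to capture and process.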

Real-World Results and What to Expect

The NVIDIA 3D Object Reconstruction workflow represents a reference implementation of advanced neural rendering techniques. Real-world deployments have demonstrated:

  • High geometric accuracy: Sub-millimeter precision for objects 10-50cm in size
  • Photorealistic textures: 2048x2048 UV-mapped texture generation
  • Fast processing: Complete reconstruction in approximately 30 minutes on an RTX A6000
  • Production integration: Direct compatibility with USD, OBJ, and game engine formats

The included retail item example demonstrates reconstruction of a consumer product with complex geometry and varied surface materials, achieving high-quality results suitable for product visualization and synthetic data generation.

Additional Reading

Learn more about the underlying technologies:

Quick Start (Recommended)

Get up and running in under 30 minutes with our Docker Compose deployment:

Prerequisites

Before getting started, ensure your system meets the minimum system requirements.

🎬 Complete Setup

# Clone the repository
git clone <repository-url>
cd 3d-object-reconstruction-github

# One-command setup: downloads weights, builds image
./deploy/compose/deploy.sh setup

# Start the container and launch the jupyter notebook server
./deploy/compose/deploy.sh start

That's it! 🎉 Your Jupyter notebook server will be available at http://localhost:8888; if port 8888 is taken, it automatically moves to the next available port.

🎯 Interactive Experience

Once your container is running:

  1. Open Jupyter: Navigate to http://localhost:8888 in your browser
  2. Start the Demo: Open notebooks/3d_object_reconstruction_demo.ipynb
  3. Follow the Guide: Interactive step-by-step reconstruction workflow
  4. Create 3D Assets: Complete workflow from stereo images to textured meshes

Technical Details

Software Components

The workflow consists of the components described above: FoundationStereo for stereo depth estimation, SAM2 for segmentation, BundleSDF for camera pose tracking, neural SDF surface reconstruction, and RoMa for feature matching.
