
3DObjectReconstruction

The 3D Object Reconstruction project is a workflow that takes a set of stereo images plus camera information and outputs a textured mesh (an .OBJ file). Its purpose is to translate physical items into the digital world in a photorealistic way.

Install / Use

/learn @NVIDIA/3DObjectReconstruction

NVIDIA 3D Object Reconstruction Workflow

Deploy this example to create a 3D object reconstruction workflow that transforms stereo video input into high-quality 3D assets using state-of-the-art computer vision and neural rendering techniques.

NVIDIA's 3D Object Reconstruction workflow represents a significant advancement in automated 3D asset creation. Real-world tests have demonstrated the ability to generate production-ready 3D meshes with photorealistic textures in under 30 minutes, enabling rapid digital twin creation and synthetic data generation workflows.

The purpose of this workflow is:

  1. To provide a reference implementation of 3D object reconstruction using NVIDIA's AI stack.
  2. To accelerate adoption of 3D AI workflows in computer vision, robotics, and synthetic data generation.

You can get started quickly and achieve high-quality results using your own stereo data by following the Quickstart guide.

What is 3D Object Reconstruction?

3D Object Reconstruction is the process of creating complete three-dimensional digital representations of real-world objects from 2D image sequences. This example implements a state-of-the-art workflow that combines stereo vision, object segmentation, bundle adjustment, and neural implicit surface reconstruction to produce high-quality 3D meshes with photorealistic textures.

<div align="center"> <img src="data/docs/pipeline_overview.png" alt="Workflow Overview" title="3D Object Reconstruction Workflow"> </div>

The reconstruction workflow processes stereo image pairs through four main stages: depth estimation using transformer-based FoundationStereo, object segmentation with SAM2, camera pose tracking via BundleSDF, and neural implicit surface reconstruction using NeRF. The result is production-ready 3D assets compatible with Isaac Sim, Omniverse, and game engines.

The workflow comprises the following components, each handling a different task:

  • FoundationStereo: Transformer-based stereo depth estimation with sub-pixel accuracy
  • SAM2: Video object segmentation for consistent mask generation
  • Pose Estimation: CUDA-accelerated pose estimation and optimization
  • Neural SDF: GPU-optimized neural implicit surface reconstruction
  • RoMa: Robust feature matching for correspondence establishment
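FoundationStereo produces a disparity map; metric depth then follows from the standard pinhole stereo relation Z = f·b/d. A minimal sketch of that conversion (not the repo's code — the focal length here is an illustrative placeholder, while the baseline matches the sample config):

```python
def disparity_to_depth(disparity_px: float, fx_px: float, baseline_m: float) -> float:
    """Metric depth (meters) from stereo disparity via Z = f * b / d.

    fx_px: focal length in pixels; baseline_m: stereo baseline in meters;
    disparity_px: horizontal pixel offset between left/right views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx_px * baseline_m / disparity_px

# Baseline 0.065 m as in the sample config; fx of 700 px is a placeholder.
depth = disparity_to_depth(disparity_px=42.0, fx_px=700.0, baseline_m=0.065)
print(f"{depth:.3f} m")  # 700 * 0.065 / 42 ≈ 1.083 m
```

Larger disparities map to nearer surfaces, which is why sub-pixel disparity accuracy translates directly into depth precision.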

How to Use This Workflow

This reference implementation demonstrates proven techniques for high-quality 3D reconstruction. Key capabilities include:

  • Direct stereo video processing without extensive preprocessing
  • Automated camera pose estimation and bundle adjustment
  • Neural implicit surface representation for smooth geometry
  • Photorealistic texture generation through view synthesis
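The neural implicit surface representation encodes geometry as a signed distance function whose zero level set is the object surface. A toy analytic stand-in (a sphere SDF, not the learned network) illustrates the sign convention:

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=0.1):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside -- the same convention a learned neural SDF follows."""
    return math.dist(p, center) - radius

print(sphere_sdf((0.1, 0.0, 0.0)))  # ~0.0: on the surface
print(sphere_sdf((0.0, 0.0, 0.0)))  # -0.1: at the center, inside
print(sphere_sdf((0.2, 0.0, 0.0)))  # +0.1: outside
```

In the actual workflow a neural network replaces this analytic function, and the mesh is extracted from its zero level set, which is what yields smooth, watertight geometry.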

To effectively use this workflow:

  1. Learn from the reference implementation

    • Deploy the stack: Follow the Docker Compose setup to experience the complete workflow
    • Study the notebook: Walk through the interactive Jupyter tutorial for hands-on learning
    • Understand the architecture: Review the code to see how FoundationStereo, SAM2, and BundleSDF integrate
  2. Prepare your stereo data

    • Capture stereo sequences: Record synchronized left/right camera pairs of your target objects
    • Calibrate cameras: Ensure accurate intrinsic and extrinsic camera parameters
    • Organize data: Structure input files according to the expected schema
  3. Choose your deployment method

    • Interactive development: Use Jupyter notebooks for experimentation and parameter tuning
    • Automated processing: Deploy CLI tools for batch processing workflows
    • Production deployment: Scale using Docker containers in cloud or edge environments
  4. Run reconstruction

    • Load stereo data and run the complete workflow end-to-end
    • Monitor processing stages and adjust parameters as needed
    • Export results as textured OBJ meshes for downstream applications
  5. Integrate results

    • Import meshes into Isaac Sim for robotics simulation
    • Use assets in Omniverse for collaborative 3D workflows
    • Generate synthetic training data for computer vision models
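Step 4 exports results as textured OBJ meshes. The OBJ format itself is plain text, so a minimal geometry-only writer (a hypothetical helper, omitting the MTL/texture side the workflow also produces) looks like:

```python
def write_obj(path, vertices, faces):
    """Write a minimal Wavefront OBJ file.

    vertices: list of (x, y, z) tuples.
    faces: list of vertex-index triples, 0-based here and converted to
    OBJ's 1-based indexing on write.
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")

# A single triangle as a smoke test:
write_obj("triangle.obj", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

Because the format is this simple, exported meshes import directly into Isaac Sim, Omniverse, and game engines without a conversion step.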

Preparing your data

The workflow processes stereo image sequences as the primary input. The system expects synchronized left and right camera views with known calibration parameters.

1 – Input data schema

Input data should be organized in the following structure:

data/
├── left/           # Left camera images
│   ├── left000000.png
│   ├── left000001.png
│   └── ...
├── right/          # Right camera images
│   ├── right000000.png
│   ├── right000001.png
│   └── ...

Each image pair must be:

| Requirement | Description |
|------------|-------------|
| Synchronized | Left and right images captured simultaneously |
| Calibrated | Known intrinsic parameters and baseline |
| Sequential | Numbered frames showing the object from multiple viewpoints |
| Sufficient overlap | 60-80% overlap between consecutive viewpoints |
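A quick sanity check that your capture follows the schema above is to verify every `leftNNNNNN.png` has a matching `rightNNNNNN.png`. A small illustrative checker (not part of the workflow's tooling):

```python
from pathlib import Path

def check_stereo_pairs(root):
    """Verify the input layout: every data/left/leftNNNNNN.png must have a
    matching data/right/rightNNNNNN.png. Returns the number of pairs."""
    root = Path(root)
    left = {p.name[len("left"):] for p in (root / "left").glob("left*.png")}
    right = {p.name[len("right"):] for p in (root / "right").glob("right*.png")}
    unpaired = left ^ right  # frame indices present on only one side
    if unpaired:
        raise FileNotFoundError(f"unpaired frames: {sorted(unpaired)}")
    return len(left)

# Example (assumes the layout shown above exists under ./data):
# print(check_stereo_pairs("data"), "synchronized pairs found")
```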

2 – Camera calibration

Camera parameters are specified in the configuration file:

# data/configs/base.yaml
camera_config:
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]  # 3x3 camera matrix
foundation_stereo:
  baseline: 0.065  # Stereo baseline in meters
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
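The intrinsic is a 3x3 camera matrix flattened row-major, so fx, fy sit at indices 0 and 4 and cx, cy at indices 2 and 5. A small helper for unpacking it (illustrative, not the repo's loader; the numeric values below are placeholders):

```python
def parse_intrinsic(flat):
    """Unpack a flattened row-major 3x3 camera matrix
    [fx, 0, cx, 0, fy, cy, 0, 0, 1] into rows and named parameters."""
    if len(flat) != 9:
        raise ValueError("expected 9 elements")
    K = [flat[0:3], flat[3:6], flat[6:9]]          # 3x3 row-major
    fx, fy = K[0][0], K[1][1]                      # focal lengths (pixels)
    cx, cy = K[0][2], K[1][2]                      # principal point (pixels)
    return K, (fx, fy, cx, cy)

K, (fx, fy, cx, cy) = parse_intrinsic([700.0, 0, 640.0, 0, 700.0, 360.0, 0, 0, 1])
print(fx, fy, cx, cy)  # 700.0 700.0 640.0 360.0
```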

3 – Data organization

For best results, capture data following these guidelines:

  • Object isolation: Single object against contrasting background
  • Multiple viewpoints: 360-degree coverage with 15-30 degree increments
  • Consistent lighting: Avoid shadows and specular reflections
  • Sharp images: Minimize motion blur and depth of field effects
  • Texture variety: Include surfaces with visual features for tracking
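The 15-30 degree increments translate directly into a frame budget for a full orbit of the object; a one-line calculation makes the tradeoff concrete:

```python
import math

def frames_for_orbit(increment_deg: float) -> int:
    """Captures needed for full 360-degree coverage at a given
    angular increment between consecutive viewpoints."""
    return math.ceil(360.0 / increment_deg)

print(frames_for_orbit(15))  # 24 viewpoints at 15-degree steps
print(frames_for_orbit(30))  # 12 viewpoints at 30-degree steps
```

Finer increments give more overlap for tracking at the cost of more frames to capture and process.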

Real-World Results and What to Expect

The NVIDIA 3D Object Reconstruction workflow represents a reference implementation of advanced neural rendering techniques. Real-world deployments have demonstrated:

  • High geometric accuracy: Sub-millimeter precision for objects 10-50cm in size
  • Photorealistic textures: 2048x2048 UV-mapped texture generation
  • Fast processing: Complete reconstruction in approximately 30 minutes on an RTX A6000
  • Production integration: Direct compatibility with USD, OBJ, and game engine formats

The included retail item example demonstrates reconstruction of a consumer product with complex geometry and varied surface materials, achieving high-quality results suitable for product visualization and synthetic data generation.

Additional Reading

Learn more about the underlying technologies:

Quick Start (Recommended)

Get up and running in under 30 minutes with our Docker Compose deployment:

Prerequisites

Before getting started, ensure your system meets the minimum system requirements.

🎬 Complete Setup

# Clone the repository
git clone <repository-url>
cd 3d-object-reconstruction-github

# One-command setup: downloads weights, builds image
./deploy/compose/deploy.sh setup

# Start the container and launch the jupyter notebook server
./deploy/compose/deploy.sh start

That's it! 🎉 Your Jupyter notebook server will be available at http://localhost:8888; if port 8888 is taken, it automatically moves to the next available port.

🎯 Interactive Experience

Once your container is running:

  1. Open Jupyter: Navigate to http://localhost:8888 in your browser
  2. Start the Demo: Open notebooks/3d_object_reconstruction_demo.ipynb
  3. Follow the Guide: Interactive step-by-step reconstruction workflow
  4. Create 3D Assets: Complete workflow from stereo images to textured meshes

Technical Details

Software Components

The workflow consists of the components described above: FoundationStereo for stereo depth estimation, SAM2 for segmentation, BundleSDF for camera pose tracking, neural SDF surface reconstruction, and RoMa for feature matching.
