# NVIDIA 3D Object Reconstruction Workflow
Deploy this example to create a 3D object reconstruction workflow that transforms stereo video input into high-quality 3D assets using state-of-the-art computer vision and neural rendering techniques.
NVIDIA's 3D Object Reconstruction workflow represents a significant advancement in automated 3D asset creation. Real-world tests have demonstrated the ability to generate production-ready 3D meshes with photorealistic textures in under 30 minutes, enabling rapid digital twin creation and synthetic data generation workflows.
The purpose of this workflow is:
- To provide a reference implementation of 3D object reconstruction using NVIDIA's AI stack.
- To accelerate adoption of 3D AI workflows in computer vision, robotics, and synthetic data generation.
You can get started quickly and achieve high-quality results using your own stereo data by following the Quickstart guide.
## What is 3D Object Reconstruction?
3D Object Reconstruction is the process of creating complete three-dimensional digital representations of real-world objects from 2D image sequences. This example implements a state-of-the-art workflow that combines stereo vision, object segmentation, bundle adjustment, and neural implicit surface reconstruction to produce high-quality 3D meshes with photorealistic textures.
<div align="center"> <img src="data/docs/pipeline_overview.png" alt="Workflow Overview" title="3D Object Reconstruction Workflow"> </div>

The reconstruction workflow processes stereo image pairs through four main stages: depth estimation using transformer-based FoundationStereo, object segmentation with SAM2, camera pose tracking via BundleSDF, and neural implicit surface reconstruction using NeRF. The result is production-ready 3D assets compatible with Isaac Sim, Omniverse, and game engines.
The workflow comprises the following components, each with a distinct task:
- FoundationStereo: Transformer-based stereo depth estimation with sub-pixel accuracy
- SAM2: Video object segmentation for consistent mask generation
- Pose Estimation: CUDA-accelerated pose estimation and optimization
- Neural SDF: GPU-optimized neural implicit surface reconstruction
- RoMa: Robust feature matching for correspondence establishment
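To make the data flow between these components concrete, here is a minimal sketch of how the stages chain together. All function names and signatures below are illustrative placeholders, not the actual package API; each stub merely stands in for the component named above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    left: str   # path to the left camera image
    right: str  # path to the right camera image

# Placeholder stage functions; real implementations are the components above.

def estimate_depth(frame: Frame) -> str:
    return f"depth({frame.left})"        # FoundationStereo: stereo depth map

def segment_object(frame: Frame) -> str:
    return f"mask({frame.left})"         # SAM2: per-frame object mask

def track_pose(depth: str, mask: str) -> str:
    return f"pose({depth},{mask})"       # BundleSDF: camera/object pose

def fuse_neural_sdf(poses: List[str]) -> str:
    return f"mesh[{len(poses)} views]"   # Neural SDF: implicit surface -> mesh

def reconstruct(frames: List[Frame]) -> str:
    """Run depth, segmentation, and pose tracking per frame, then fuse."""
    poses = [track_pose(estimate_depth(f), segment_object(f)) for f in frames]
    return fuse_neural_sdf(poses)
```

The point of the sketch is the ordering: per-frame depth and masks feed pose tracking, and only the fused set of posed observations drives the neural surface reconstruction.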
## How to Use This Workflow
This reference implementation demonstrates proven techniques for high-quality 3D reconstruction. Key capabilities include:
- Direct stereo video processing without extensive preprocessing
- Automated camera pose estimation and bundle adjustment
- Neural implicit surface representation for smooth geometry
- Photorealistic texture generation through view synthesis
To effectively use this workflow:
1. Learn from the reference implementation
   - Deploy the stack: Follow the Docker Compose setup to experience the complete workflow
   - Study the notebook: Walk through the interactive Jupyter tutorial for hands-on learning
   - Understand the architecture: Review the code to see how FoundationStereo, SAM2, and BundleSDF integrate
2. Prepare your stereo data
   - Capture stereo sequences: Record synchronized left/right camera pairs of your target objects
   - Calibrate cameras: Ensure accurate intrinsic and extrinsic camera parameters
   - Organize data: Structure input files according to the expected schema
3. Choose your deployment method
   - Interactive development: Use Jupyter notebooks for experimentation and parameter tuning
   - Automated processing: Deploy CLI tools for batch processing workflows
   - Production deployment: Scale using Docker containers in cloud or edge environments
4. Run reconstruction
   - Load stereo data and run the complete workflow end-to-end
   - Monitor processing stages and adjust parameters as needed
   - Export results as textured OBJ meshes for downstream applications
5. Integrate results
   - Import meshes into Isaac Sim for robotics simulation
   - Use assets in Omniverse for collaborative 3D workflows
   - Generate synthetic training data for computer vision models
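The reconstruction step exports results as textured OBJ meshes. As a rough illustration of what downstream tooling consumes, here is a minimal OBJ reader covering only `v` (vertex) and `f` (face) records; real exported meshes also carry `vt`/`vn` records and material references, which this sketch deliberately skips.

```python
def parse_obj(text: str):
    """Minimal OBJ parser: collects vertices and triangulated face indices.

    Only 'v' and 'f' records are handled; this is a simplification of the
    full Wavefront OBJ format used by the workflow's exported meshes.
    """
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            # vertex position: "v x y z"
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # face indices are 1-based and may appear as "i/ti/ni";
            # keep only the position index, converted to 0-based
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces
```

For production use, a dedicated mesh library is the better choice; the sketch just shows the shape of the exported data.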
## Preparing your data
The workflow processes stereo image sequences as the primary input. The system expects synchronized left and right camera views with known calibration parameters.
### 1 – Input data schema

Input data should be organized in the following structure:

```
data/
├── left/                # Left camera images
│   ├── left000000.png
│   ├── left000001.png
│   └── ...
├── right/               # Right camera images
│   ├── right000000.png
│   ├── right000001.png
│   └── ...
```
Each image pair must be:

| Requirement | Description |
|-------------|-------------|
| Synchronized | Left and right images captured simultaneously |
| Calibrated | Known intrinsic parameters and baseline |
| Sequential | Numbered frames showing object from multiple viewpoints |
| Sufficient overlap | 60-80% overlap between consecutive viewpoints |
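A quick sanity check of the "synchronized" and "sequential" requirements can be done before launching the workflow. This helper is an illustrative sketch (not part of the project) that assumes the naming scheme shown in the schema above: six-digit, zero-based frame numbers with `left`/`right` prefixes.

```python
import re

def frame_indices(names, prefix):
    """Extract 6-digit frame numbers from names like 'left000123.png'."""
    pat = re.compile(rf"^{prefix}(\d{{6}})\.png$")
    return sorted(int(m.group(1)) for n in names if (m := pat.match(n)))

def check_synchronized(left_names, right_names):
    """Verify left/ and right/ hold matching, gap-free frame sequences.

    Returns the sorted frame indices, or raises if the folders disagree.
    """
    left = frame_indices(left_names, "left")
    right = frame_indices(right_names, "right")
    if left != right:
        raise ValueError("left/right frame sets differ")
    if left != list(range(len(left))):
        raise ValueError("frame numbering has gaps")
    return left
```

Catching an unpaired or missing frame here is much cheaper than discovering it mid-reconstruction.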
### 2 – Camera calibration

Camera parameters are specified in the configuration file:

```yaml
# data/configs/base.yaml
camera_config:
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]  # 3x3 camera matrix (row-major)
foundation_stereo:
  baseline: 0.065  # Stereo baseline in meters
  intrinsic: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
```
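The focal length `fx` from the intrinsic matrix and the stereo `baseline` together determine how disparity converts to metric depth, via the standard pinhole-stereo relation depth = fx · baseline / disparity. The sketch below illustrates that relation with example numbers (fx = 600 px, the 0.065 m baseline from the config); it is not code from the project.

```python
def disparity_to_depth(disparity_px: float, fx: float, baseline_m: float) -> float:
    """Standard pinhole-stereo relation: depth = fx * baseline / disparity.

    fx comes from the intrinsic matrix; baseline_m from foundation_stereo.
    Disparity is measured in pixels, depth is returned in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx * baseline_m / disparity_px

# Example: a 65 px disparity with fx=600 px and a 6.5 cm baseline
# corresponds to a point 0.6 m from the camera.
depth_m = disparity_to_depth(65.0, fx=600.0, baseline_m=0.065)
```

This is also why accurate calibration matters: an error in `fx` or `baseline` scales every reconstructed depth proportionally.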
### 3 – Data organization
For best results, capture data following these guidelines:
- Object isolation: Single object against contrasting background
- Multiple viewpoints: 360-degree coverage with 15-30 degree increments
- Consistent lighting: Avoid shadows and specular reflections
- Sharp images: Minimize motion blur and depth of field effects
- Texture variety: Include surfaces with visual features for tracking
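The viewpoint guideline above translates directly into a frame budget: 360-degree coverage at 15-30 degree increments means roughly 12 to 24 distinct viewpoints. A trivial planning helper (illustrative only):

```python
import math

def views_for_full_coverage(increment_deg: float) -> int:
    """Number of viewpoints needed for 360-degree coverage at a given
    angular increment (guideline: 15-30 degree steps)."""
    if not 0 < increment_deg <= 360:
        raise ValueError("increment must be in (0, 360]")
    return math.ceil(360 / increment_deg)
```

Finer increments increase overlap between consecutive viewpoints, which helps pose tracking at the cost of longer capture and processing time.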
## Real-World Results and What to Expect
The NVIDIA 3D Object Reconstruction workflow represents a reference implementation of advanced neural rendering techniques. Real-world deployments have demonstrated:
- High geometric accuracy: Sub-millimeter precision for objects 10-50cm in size
- Photorealistic textures: 2048x2048 UV-mapped texture generation
- Fast processing: Complete reconstruction in approximately 30 minutes on an RTX A6000
- Production integration: Direct compatibility with USD, OBJ, and game engine formats
The included retail item example demonstrates reconstruction of a consumer product with complex geometry and varied surface materials, achieving high-quality results suitable for product visualization and synthetic data generation.
## Additional Reading
Learn more about the underlying technologies:
- FoundationStereo: Transformer-based Stereo Depth Estimation
- SAM 2: Segment Anything in Images and Videos
- RoMa: Robust Dense Feature Matching
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
## Quick Start (Recommended)
Get up and running in under 30 minutes with our Docker Compose deployment:
### Prerequisites

Before getting started, ensure your system meets the minimum requirements.

### 🎬 Complete Setup

```shell
# Clone the repository
git clone <repository-url>
cd 3d-object-reconstruction-github

# One-command setup: downloads weights, builds the image
./deploy/compose/deploy.sh setup

# Start the container and launch the Jupyter notebook server
./deploy/compose/deploy.sh start
```
That's it! 🎉 Your Jupyter notebook server will be available at http://localhost:8888; if port 8888 is unavailable, the next free port is used automatically.
### 🎯 Interactive Experience

Once your container is running:

- Open Jupyter: Navigate to http://localhost:8888 in your browser
- Start the Demo: Open notebooks/3d_object_reconstruction_demo.ipynb
- Follow the Guide: Interactive step-by-step reconstruction workflow
- Create 3D Assets: Complete workflow from stereo images to textured meshes
## Technical Details

### Software Components

The workflow consists of the following implemented components: FoundationStereo for stereo depth estimation, SAM2 for object segmentation, BundleSDF for pose tracking, a neural SDF module for surface reconstruction, and RoMa for feature matching.