[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions
This is the reference PyTorch implementation for training and testing MVS depth estimation models using the method described in
<p align="center"> <img src="media/teaser.jpeg" alt="example output" width="720" /> </p>
Mohamed Sayed, John Gibson, Jamie Watson, Victor Adrian Prisacariu, Michael Firman, and Clément Godard
Paper, ECCV 2022 (arXiv pdf), Supplemental Material, Project Page, Video
https://github.com/nianticlabs/simplerecon/assets/14994206/ae5074c2-6537-45f1-9f5e-0b3646a96dcb
https://user-images.githubusercontent.com/14994206/189788536-5fa8a1b5-ae8b-4f64-92d6-1ff1abb03eaf.mp4
This code is for non-commercial use; please see the license file for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTex below and link this repo. Thanks!
🆕 Updates
25/05/2023: Fixed package versions for llvm-openmp, clang, and protobuf. Use this new environment file if you have trouble running the code and/or if dataloading is being limited to a single thread.
09/03/2023: Added kornia version to the environments file to fix kornia typing issue. (thanks @natesimon!)
26/01/2023: The license has been modified to make running the model for academic reasons easier. Please see the LICENSE file for the exact details.
There is an update as of 31/12/2022 that fixes slightly wrong intrinsics, flip augmentation for the cost volume, and a numerical precision bug in projection. All scores improve. You will need to update your forks and use new weights. See Bug Fixes.
Precomputed scans for online default frames are here: https://drive.google.com/drive/folders/1dSOFI9GayYHQjsx4I_NG0-3ebCAfWXjV?usp=share_link
Table of Contents
- 🗺️ Overview
- ⚙️ Setup
- 📦 Models
- 🚀 Speed
- 📝 TODOs:
- 🏃 Running out of the box!
- 💾 ScanNetv2 Dataset
- 🖼️🖼️🖼️ Frame Tuples
- 📊 Testing and Evaluation
- 👉☁️ Point Cloud Fusion
- 📊 Mesh Metrics
- ⏳ Training
- 🔧 Other training and testing options
- ✨ Visualization
- 📝🧮👩💻 Notation for Transformation Matrices
- 🗺️ World Coordinate System
- 🐜🔧 Bug Fixes
- 🗺️💾 COLMAP Dataset
- 🙏 Acknowledgements
- 📜 BibTeX
- 👩⚖️ License
🗺️ Overview
SimpleRecon takes as input posed RGB images, and outputs a depth map for a target image.
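At a glance, the contract is: a target RGB frame, a set of posed source frames, and camera intrinsics go in; one depth map aligned with the target comes out. Below is a minimal sketch of that interface only — `predict_depth`, the frame count, and all shapes are illustrative assumptions, not the repo's actual API:

```python
import numpy as np

def predict_depth(target_rgb, source_rgbs, source_poses, K):
    """Placeholder for the network: returns a dummy depth map at the
    target's resolution. The real model warps source-frame features into
    a cost volume using the poses and intrinsics before regressing depth."""
    h, w, _ = target_rgb.shape
    assert len(source_rgbs) == len(source_poses)  # one 4x4 pose per source
    assert K.shape == (3, 3)
    return np.ones((h, w), dtype=np.float32)

target = np.zeros((192, 256, 3), dtype=np.uint8)     # target RGB frame
sources = [np.zeros_like(target) for _ in range(7)]  # posed source frames
poses = [np.eye(4, dtype=np.float32) for _ in range(7)]
K = np.array([[600.0, 0.0, 128.0],
              [0.0, 600.0, 96.0],
              [0.0, 0.0, 1.0]])

depth = predict_depth(target, sources, poses, K)
print(depth.shape)  # (192, 256)
```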
⚙️ Setup
Assuming a fresh Anaconda distribution, you can install dependencies with:
```shell
conda env create -f simplerecon_env.yml
```
We ran our experiments with PyTorch 1.10, CUDA 11.3, Python 3.9.7 and Debian GNU/Linux 10.
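As a quick sanity check (illustrative only, not part of the repo), you can confirm the interpreter roughly matches the tested setup before creating the environment — the experiments used Python 3.9.7:

```python
import sys

# The authors ran experiments with Python 3.9.7, so warn if the
# interpreter is older than 3.9.
ok = sys.version_info >= (3, 9)
print("Python version OK" if ok else "Python too old; need >= 3.9")
```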
📦 Models
Download a pretrained model into the weights/ folder.
We provide the following models (scores are with online default keyframes):
| --config | Model | Abs Diff↓| Sq Rel↓ | delta < 1.05↑| Chamfer↓ | F-Score↑ |
|-------------|----------|--------------------|---------|---------|--------------|----------|
| hero_model.yaml | Metadata + Resnet Matching | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |
| dot_product_model.yaml | Dot Product + Resnet Matching | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |
hero_model is the model we report in the paper as Ours.
🚀 Speed
| --config | Model | Inference Speed (--batch_size 1) | Inference GPU memory | Approximate training time |
|------------|------------|------------|-------------------------|-----------------------------|
| hero_model | Hero, Metadata + Resnet | 130ms / 70ms (speed optimized) | 2.6GB / 5.7GB (speed optimized) | 36 hours |
| dot_product_model | Dot Product + Resnet | 80ms | 2.6GB | 36 hours |
With larger batches speed increases considerably. With batch size 8 on the non-speed optimized model, the latency drops to ~40ms.
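For context, amortized per-frame latency is typically measured by timing a batched forward pass and dividing by the number of frames. The sketch below uses a dummy stand-in for the model (everything here is illustrative, not the repo's benchmarking code; real GPU timing should also synchronize the device around the timed region):

```python
import time

def dummy_forward(batch_size):
    # Stand-in for one batched model forward pass; sleeps ~1 ms
    # regardless of batch size, mimicking how a GPU amortizes work
    # across a batch.
    time.sleep(0.001)

def amortized_latency_ms(batch_size, iters=5):
    """Time several forward passes and return milliseconds per frame."""
    start = time.perf_counter()
    for _ in range(iters):
        dummy_forward(batch_size)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (iters * batch_size)

# Larger batches shrink the per-frame figure even though each pass
# takes similar wall time.
print(amortized_latency_ms(1) > amortized_latency_ms(8))  # True
```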
📝 TODOs:
- [x] Simple scan for folks to quickly try the code, instead of downloading the ScanNetv2 test scenes. DONE
- [x] ScanNetv2 extraction, ~~ETA 10th October~~ DONE
- [ ] FPN model weights.
- ~~[ ] Tutorial on how to use Scanniverse data, ETA 5th October 10th October 20th October~~ At present there is no publicly available way of exporting scans from Scanniverse. You'll have to use ios-logger; NeuralRecon has a good tutorial on this, and a dataloader that accepts the processed format is at datasets/arkit_dataset.py. UPDATE: There is now a quick readme at data_scripts/IOS_LOGGER_ARKIT_README.md on how to process and run inference on an ios-logger scan using the script at data_scripts/ios_logger_preprocessing.py.
🏃 Running out of the box!
We've now included two scans for people to try out immediately with the code. You can download these scans from here.
Steps:
- Download weights for the hero_model into the weights directory.
- Download the scans and unzip them to a directory of your choosing.
- Modify the value for the option dataset_path in configs/data/vdr_dense.yaml to the base path of the unzipped vdr folder.
- You should be able to run it! Something like this will work:
```shell
CUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/hero_model.yaml \
    --load_weights_from_checkpoint weights/hero_model.ckpt \
    --data_config configs/data/vdr_dense.yaml \
    --num_workers 8 \
    --batch_size 2 \
    --fast_cost_volume \
    --run_fusion \
    --depth_fuser open3d \
    --fuse_color \
    --dump_depth_visualization;
```
This will output meshes, quick depth visualizations, and scores when benchmarked against LiDAR depth under OUTPUT_PATH.
This command uses vdr_dense.yaml which will generate depths for every frame and fuse them into a mesh. In the paper we report scores with fused keyframes instead, and you can run those using vdr_default.yaml. You can also use dense_offline tuples by instead using vdr_dense_offline.yaml.
See the section below on testing and evaluation. Make sure to use the correct config flags for datasets.
💾 ScanNetv2 Dataset
~~Please follow the instructions here to download the dataset. This dataset is quite big (>2TB), so make sure you have enough space, especially for extracting files.~~
~~Once downloaded, use this script to export raw sensor data to images and depth files.~~
We've written a quick tutorial and included modified scripts to help you with downloading and extracting ScanNetv2. You can find them at data_scripts/scannet_wrangling_scripts/
You should change the dataset_path config argument for ScanNetv2 data configs at configs/data/ to match where your dataset is.
The codebase expects ScanNetv2 to be in the following format:
```
dataset_path
    scans_test (test scans)
        scene0707
            scene0707_00_vh_clean_2.ply (gt mesh)
            sensor_data
                frame-000261.pose.txt
                frame-000261.color.jpg
                frame-000261.color.512.png (optional, image at 512x384)
                frame-000261.color.640.png (optional, image at 640x480)
                frame-000261.depth.png (full res depth, stored scale *1000)
                frame-000261.depth.256.png (optional, depth at 256x192, also scaled)
            scene0707.txt (scan metadata and image sizes)
            intrinsic
                intrinsic_depth.txt
                intrinsic_color.txt
        ...
    scans (val and train scans)
        scene0000_00
            (see above)
        scene0000_01
        ....
```
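As a worked example of the depth encoding noted above ("stored scale *1000"): the depth PNGs hold 16-bit millimetre values, so divide by 1000 to recover metres. The array below is synthetic, standing in for an actual depth PNG read, e.g. via OpenCV with cv2.IMREAD_ANYDEPTH:

```python
import numpy as np

# Synthetic 16-bit depth values in millimetres, standing in for a real
# read such as cv2.imread(path, cv2.IMREAD_ANYDEPTH).
depth_png = np.array([[1500, 0],
                      [2750, 500]], dtype=np.uint16)

depth_m = depth_png.astype(np.float32) / 1000.0  # millimetres -> metres
valid = depth_png > 0  # zeros typically mark invalid pixels

print(depth_m[0, 0])     # 1.5 (metres)
print(int(valid.sum()))  # 3 (valid pixels)
```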
In this example scene0707.txt should contain the scan's metadata:
```
colorHeight = 968
colorToDepthExtrinsics = 0.999263 -0.010031 0.037048 ........
colorWidth = 1296
depthHeight = 480
depthWidth = 640
fx_color = 1170.187988
fx_depth = 570.924255
fy_color = 1170.187988
fy_depth = 570.924316
mx_color = 647.750000
mx_depth = 319.500000
my_color = 483.750000
my_depth = 239.500000
numColorFrames = 784
numDepthFrames = 784
numIMUmeasurements = 1632
```
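For illustration, the intrinsics fields above assemble into a standard 3x3 pinhole matrix (fx/fy are focal lengths, mx/my the principal point); here using the colour-camera entries from the example scene0707.txt:

```python
import numpy as np

# Colour-camera values from the example scene0707.txt metadata.
fx_color, fy_color = 1170.187988, 1170.187988
mx_color, my_color = 647.750000, 483.750000

# Standard 3x3 pinhole intrinsics matrix.
K_color = np.array([[fx_color, 0.0,      mx_color],
                    [0.0,      fy_color, my_color],
                    [0.0,      0.0,      1.0]])
print(K_color[0, 2])  # 647.75
```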
frame-000261.pose.txt should contain pose in the form:
```
-0.384739 0.271466 -0.882203 4.98152
0.921157 0.0521417 -0.385682 1
```
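The pose file holds a 4x4 transformation matrix as whitespace-separated rows (the example above shows only the first rows). A minimal parsing sketch with synthetic values:

```python
import numpy as np

# Synthetic pose text; a real frame-XXXXXX.pose.txt holds a full
# 4x4 matrix, one row per line.
pose_text = """\
1 0 0 0.5
0 1 0 1.0
0 0 1 2.0
0 0 0 1
"""

pose = np.array([[float(v) for v in line.split()]
                 for line in pose_text.strip().splitlines()])
assert pose.shape == (4, 4)
print(pose[:3, 3])  # translation column
```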