DUSt3R: Geometric 3D Vision Made Easy

Official implementation of DUSt3R: Geometric 3D Vision Made Easy
[Project page], [DUSt3R arxiv]

Make sure to also check our other works:

- Grounding Image Matching in 3D with MASt3R: DUSt3R with a local feature head, metric pointmaps, and a more scalable global alignment!
- Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors: DUSt3R with known depth / focal length / poses.
- MUSt3R: Multi-view Network for Stereo 3D Reconstruction: multi-view predictions (RGB SLAM/SfM) without any global alignment.

Example of reconstruction from two images

High level overview of DUSt3R capabilities

@inproceedings{dust3r_cvpr24,
      title={DUSt3R: Geometric 3D Vision Made Easy}, 
      author={Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud},
      booktitle = {CVPR},
      year = {2024}
}

@misc{dust3r_arxiv23,
      title={DUSt3R: Geometric 3D Vision Made Easy}, 
      author={Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud},
      year={2023},
      eprint={2312.14132},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The code is distributed under the CC BY-NC-SA 4.0 License. See LICENSE for more information.

# Copyright (C) 2024-present Naver Corporation. All rights reserved.
# Licensed under CC BY-NC-SA 4.0 (non-commercial use only).

Get Started

Installation

  1. Clone DUSt3R.
git clone --recursive https://github.com/naver/dust3r
cd dust3r
# if you have already cloned dust3r:
# git submodule update --init --recursive
  2. Create the environment; here we show an example using conda.
conda create -n dust3r python=3.11 cmake=3.14.0
conda activate dust3r 
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the correct version of cuda for your system
pip install -r requirements.txt
# Optional: you can also install additional packages to:
# - add support for HEIC images
# - add pyrender, used to render depthmap in some datasets preprocessing
# - add required packages for visloc.py
pip install -r requirements_optional.txt
  3. Optional: compile the CUDA kernels for RoPE (as in CroCo v2).
# DUST3R relies on RoPE positional embeddings for which you can compile some cuda kernels for faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
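
As a quick sanity check (a minimal sketch, not part of the official instructions), you can verify that PyTorch sees your GPU and that dust3r imports cleanly:

python -c "import torch; print(torch.cuda.is_available())"
python -c "from dust3r.model import AsymmetricCroCo3DStereo; print('dust3r ok')"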

Checkpoints

You can obtain the checkpoints in two ways:

  1. You can use our huggingface_hub integration: the models will be downloaded automatically.

  2. Otherwise, we provide several pre-trained models:

| Modelname | Training resolutions | Head | Encoder | Decoder |
|-----------|----------------------|------|---------|---------|
| DUSt3R_ViTLarge_BaseDecoder_224_linear.pth | 224x224 | Linear | ViT-L | ViT-B |
| DUSt3R_ViTLarge_BaseDecoder_512_linear.pth | 512x384, 512x336, 512x288, 512x256, 512x160 | Linear | ViT-L | ViT-B |
| DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth | 512x384, 512x336, 512x288, 512x256, 512x160 | DPT | ViT-L | ViT-B |

You can check the hyperparameters we used to train these models in the section: Our Hyperparameters.
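
For the first option, here is a minimal sketch of the huggingface_hub route, using the same model identifier as in the Usage section below; the weights are downloaded and cached automatically on first use:

from dust3r.model import AsymmetricCroCo3DStereo

# downloads the checkpoint from the Hugging Face Hub and caches it locally
model = AsymmetricCroCo3DStereo.from_pretrained("naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt")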

To download a specific model, for example DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth:

mkdir -p checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth -P checkpoints/

For the checkpoints, make sure to agree to the license of all the public training datasets and base checkpoints we used, in addition to CC-BY-NC-SA 4.0. Again, see section: Our Hyperparameters for details.

Interactive demo

In this demo, you should be able to run DUSt3R on your machine to reconstruct a scene. First, select images that depict the same scene.

You can adjust the global alignment schedule and its number of iterations.

[!NOTE] If you selected one or two images, the global alignment procedure will be skipped (mode=GlobalAlignerMode.PairViewer)

Hit "Run" and wait. When the global alignment ends, the reconstruction appears. Use the slider "min_conf_thr" to show or remove low confidence areas.

python3 demo.py --model_name DUSt3R_ViTLarge_BaseDecoder_512_dpt

# Use --weights to load a checkpoint from a local file, eg --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
# Use --image_size to select the correct resolution for the selected checkpoint. 512 (default) or 224
# Use --local_network to make it accessible on the local network, or --server_name to specify the url manually
# Use --server_port to change the port, by default it will search for an available port starting at 7860
# Use --device to use a different device, by default it's "cuda"
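
For example, to serve a local checkpoint at its matching resolution on your local network (a sketch combining the flags documented above):

python3 demo.py --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth --image_size 512 --local_network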

Interactive demo with docker

To run DUSt3R using Docker, including with NVIDIA CUDA support, follow these instructions:

  1. Install Docker: If not already installed, download and install Docker and Docker Compose from the Docker website.

  2. Install NVIDIA Docker Toolkit: For GPU support, install the NVIDIA Docker toolkit from the Nvidia website.

  3. Build the Docker image and run it: cd into the ./docker directory and run the following commands:

cd docker
bash run.sh --with-cuda --model_name="DUSt3R_ViTLarge_BaseDecoder_512_dpt"

Or if you want to run the demo without CUDA support, run the following command:

cd docker
bash run.sh --model_name="DUSt3R_ViTLarge_BaseDecoder_512_dpt"

By default, demo.py is launched with the option --local_network.
Visit http://localhost:7860/ to access the web UI (or replace localhost with the machine's name to access it from the network).

run.sh will launch docker-compose using either the docker-compose-cuda.yml or docker-compose-cpu.yml config file, then start the demo using entrypoint.sh.
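
If the web UI does not come up, one way to inspect the demo container's output, from the ./docker directory (a sketch assuming the CUDA variant; substitute docker-compose-cpu.yml otherwise):

docker compose -f docker-compose-cuda.yml logs -f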

Usage

from dust3r.inference import inference
from dust3r.model import AsymmetricCroCo3DStereo
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

if __name__ == '__main__':
    device = 'cuda'
    batch_size = 1
    schedule = 'cosine'
    lr = 0.01
    niter = 300

    model_name = "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
    # you can put the path to a local checkpoint in model_name if needed
    model = AsymmetricCroCo3DStereo.from_pretrained(model_name).to(device)
    # load_images can take a list of images or a directory
    images = load_images(['croco/assets/Chateau1.png', 'croco/assets/Chateau2.png'], size=512)
    pairs = make_pairs(images, scene_graph='complete', prefilter=None, symmetrize=True)
    output = inference(pairs, model, device, batch_size=batch_size)

    # at this stage, you have the raw dust3r predictions
    view1, pred1 = output['view1'], output['pred1']
    view2, pred2 = output['view2'], output['pred2']
    # here, view1, pred1, view2, pred2 are dicts of lists of len(2)
    #  -> because we symmetrize we have (im1, im2) and (im2, im1) pairs
    # in each view you have:
    # an integer image identifier: view1['idx'] and view2['idx']
    # the img: view1['img'] and view2['img']
    # the image shape: view1['true_shape'] and view2['true_shape']
    # an instance string output by the dataloader: view1['instance'] and view2['instance']
    # pred1 and pred2 contain the confidence values: pred1['conf'] and pred2['conf']
    # pred1 contains 3D points for view1['img'] in view1['img'] space: pred1['pts3d']
    # pred2 contains 3D points for view2['img'] in view1['img'] space: pred2['pts3d_in_other_view']

    # next we'll use the global_aligner to align the predictions
    # depending on your task, you may be fine with the raw output and not need it
    # with only two input images, you could use GlobalAlignerMode.PairViewer: it would just convert the output
    # if using GlobalAlignerMode.PairViewer, no need to run compute_global_alignment
    scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
    loss = scene.compute_global_alignment(init="mst", niter=niter, schedule=schedule, lr=lr)

    # retrieve useful values from scene:
    imgs = scene.imgs
    focals = scene.get_focals()
    poses = scene.get_im_poses()
    pts3d = scene.get_pts3d()
    confidence_masks = scene.get_masks()

    # visualize reconstruction
    scene.show()

    # find 2D-2D matches between the two images
    from dust3r.utils.geometry import find_reciprocal_matches, xy_grid
    pts2d_list, pts3d_list = [], []
    for i in range(2):
        conf_i = confidence_masks[i].cpu().numpy()
        pts2d_list.append(xy_grid(*imgs[i].shape[:2][::-1])[conf_i])  # imgs[i].shape[:2] = (H, W)
        pts3d_list.append(pts3d[i].detach().cpu().numpy()[conf_i])
    reciprocal_in_P2, nn2_in_P1, num_matches = find_reciprocal_matches(*pts3d_list)
    print(f'found {num_matches} matches')
    matches_im1 = pts2d_list[1][reciprocal_in_P2]
    matches_im0 = pts2d_list[0][nn2_in_P1][reciprocal_in_P2]
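
As a small follow-on sketch, continuing the script above (and assuming its variables: each entry of pts3d is an HxWx3 tensor and each confidence mask is an HxW boolean tensor), you can merge the confident 3D points of both views into a single array, e.g. for export:

    import numpy as np
    # keep only the confident points of each view and stack them into one (N, 3) array
    pts = np.concatenate([p[m].detach().cpu().numpy()
                          for p, m in zip(pts3d, confidence_masks)])
    print(pts.shape)  # (num_confident_points, 3)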