MUSt3R: Multi-view Network for Stereo 3D Reconstruction

Official implementation of MUSt3R: Multi-view Network for Stereo 3D Reconstruction
[Project page], [MUSt3R arxiv]


@inproceedings{must3r_cvpr25,
      title={MUSt3R: Multi-view Network for Stereo 3D Reconstruction}, 
      author={Yohann Cabon and Lucas Stoffl and Leonid Antsfeld and Gabriela Csurka and Boris Chidlovskii and Jerome Revaud and Vincent Leroy},
      booktitle = {CVPR},
      year = {2025}
}

@misc{must3r_arxiv25,
      title={MUSt3R: Multi-view Network for Stereo 3D Reconstruction}, 
      author={Yohann Cabon and Lucas Stoffl and Leonid Antsfeld and Gabriela Csurka and Boris Chidlovskii and Jerome Revaud and Vincent Leroy},
      year={2025},
      eprint={2503.01661},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

MUSt3R is released under the MUSt3R Non-Commercial License. See LICENSE and NOTICE for more information.
NOTICE also contains information about the datasets used to train the checkpoints. The mapfree dataset in particular, which was used to train all models, has a very restrictive license.

Get Started

MUSt3R extends the DUSt3R architecture with several modifications: it makes the network symmetric and enables online prediction of the camera poses and 3D structure of a collection of images through a multi-layer memory mechanism.
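The memory mechanism can be pictured with a toy sketch. Everything below is illustrative only (the class, its names, and the "attention" stand-in are not the actual MUSt3R code): each new view is decoded against tokens stored from earlier views at every decoder layer, then appends its own tokens so later views can condition on it.

```python
# Toy illustration of an online multi-layer memory (NOT the MUSt3R API):
# each incoming view attends to tokens stored by earlier views at every
# decoder layer, then appends its own tokens for later views to use.
class ToyMemoryDecoder:
    def __init__(self, num_layers):
        # one token list per decoder layer
        self.memory = [[] for _ in range(num_layers)]

    def process_view(self, tokens):
        for layer_mem in self.memory:
            if layer_mem:
                # stand-in for cross-attention: pull each token toward the
                # mean of the memory tokens stored at this layer
                mean = sum(layer_mem) / len(layer_mem)
                tokens = [0.5 * t + 0.5 * mean for t in tokens]
            layer_mem.extend(tokens)
        return tokens  # stand-in for this view's pointmap prediction

decoder = ToyMemoryDecoder(num_layers=2)
first = decoder.process_view([1.0, 2.0])  # empty memory: tokens pass through
second = decoder.process_view([3.0])      # now conditioned on the first view
```

The point of the sketch is the online property: views are processed one at a time, and the memory grows as views arrive, so no pairwise all-to-all pass over the image collection is needed.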

Installation

using setup.py

micromamba create -n must3r python=3.11 cmake=3.14.0
micromamba activate must3r 
pip3 install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126 # use the correct version of cuda for your system

# (recommended) if you can, install xFormers for memory-efficient attention
pip3 install -U xformers==0.0.30 --index-url https://download.pytorch.org/whl/cu126
pip3 install must3r@git+https://github.com/naver/must3r.git
# pip3 install must3r[optional]@git+https://github.com/naver/must3r.git # adds pillow-heif
# pip3 install --no-build-isolation must3r[curope]@git+https://github.com/naver/must3r.git # adds curope
# pip3 install --no-build-isolation must3r[all]@git+https://github.com/naver/must3r.git # adds all optional dependencies

development (no installation)

micromamba create -n must3r python=3.11 cmake=3.14.0
micromamba activate must3r 
pip3 install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126 # use the correct version of cuda for your system

# (recommended) if you can, install xFormers for memory-efficient attention
pip3 install -U xformers==0.0.30 --index-url https://download.pytorch.org/whl/cu126

git clone --recursive https://github.com/naver/must3r.git
cd must3r
# if you have already cloned must3r:
# git submodule update --init --recursive

pip install -r dust3r/requirements.txt
pip install -r dust3r/requirements_optional.txt
pip install -r requirements.txt

# install asmk
pip install faiss-cpu  # or the officially supported way (not tested): micromamba install -c pytorch faiss-cpu=1.11.0  # faiss-gpu=1.11.0 
mkdir build
cd build
git clone https://github.com/jenicek/asmk.git
cd asmk/cython/
cythonize *.pyx
cd ..
pip install .
cd ../..

# Optional step: MUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for faster runtime.
cd dust3r/croco/models/curope/
pip install .
cd ../../../../

Checkpoints

We provide several pre-trained models. To use these checkpoints, make sure you agree to the licenses of all the training datasets we used, in addition to the MUSt3R license. For more information, check NOTICE.

| Model name | Training resolutions | Head | Encoder | Decoder |
|------------|----------------------|------|---------|---------|
| MUSt3R_224_cvpr.pth | 224x224 | Linear | ViT-L | ViT-B |
| MUSt3R_512_cvpr.pth | 512x384, 512x336, 512x288, 512x256, 512x160 | Linear | ViT-L | ViT-B |
| MUSt3R_512.pth | 512x384, 512x336, 512x288, 512x256, 512x160 | Linear | ViT-L | ViT-B |
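For the 512 models, one way to think about the list of training resolutions is as a set of fixed aspect ratios to snap an input image to. The helper below is illustrative only (it is not part of MUSt3R, whose actual preprocessing may differ): it picks the listed resolution whose aspect ratio is closest to the input image's.

```python
# Illustrative only: choose, among the 512-wide training resolutions listed
# above, the one whose aspect ratio best matches the input image.
RES_512 = [(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)]

def closest_resolution(width, height, candidates=RES_512):
    target = width / height
    return min(candidates, key=lambda wh: abs(wh[0] / wh[1] - target))

print(closest_resolution(1920, 1080))  # a 16:9 image maps to (512, 288)
```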

MUSt3R_224_cvpr and MUSt3R_512_cvpr are the same checkpoints that we evaluated for CVPR.
MUSt3R_512 was finetuned from MUSt3R_512_cvpr with updated hyperparameters (20 views instead of 10, bf16, less token dropout; also see the training section) and additional datasets (a higher-resolution version of ARKitScenes from the depth upsampling subset, ScanNet++ updated to v2, Virtual Kitti 2 added back, some scenes from Hypersim, and scenes generated with InfiniGen). It outperforms MUSt3R_512_cvpr in most evaluations (see the updated evaluations).

We also provide the trainingfree.pth and codebook.pkl files necessary to run image retrieval. MUSt3R_512_cvpr and MUSt3R_512 share the same encoder, so there is only one set of files for both.
MUSt3R_224_retrieval_trainingfree.pth
MUSt3R_224_retrieval_codebook.pkl

MUSt3R_512_retrieval_trainingfree.pth
MUSt3R_512_retrieval_codebook.pkl

MD5 checksums (https://download.europe.naverlabs.com/ComputerVision/MUSt3R/checksums.txt):

ac176abd2b2c3bc5f2aea664d82e9ffa  MUSt3R_224_cvpr.pth
2a82597c3317efac40657d4f881c71f0  MUSt3R_224_retrieval_trainingfree.pth
e675ec36c7c40d512ef321fdd289bdbe  MUSt3R_224_retrieval_codebook.pkl

43808705f381a8724aafcd27c88ece35  MUSt3R_512_cvpr.pth
8854f948a8674fb1740258c1872f80dc  MUSt3R_512.pth
f7c133906bcfd4fe6ee157a9ffa85a23  MUSt3R_512_retrieval_trainingfree.pth
1125d80b9de940de2655d19b3ff78bb5  MUSt3R_512_retrieval_codebook.pkl
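A downloaded checkpoint can be checked against the values above with a few lines of Python (the filename in the commented assertion is just the example from the table):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so multi-gigabyte checkpoints don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. compare against the value listed in checksums.txt:
# assert md5sum("MUSt3R_512.pth") == "8854f948a8674fb1740258c1872f80dc"
```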

Demo

Offline Gradio (+viser) Demo

By default, demo.py will open a gradio instance on localhost:7860. If you launch the demo with --viser, it will also launch a viser instance on localhost:8080. Load the images with gradio, hit run, and visualize the reconstruction as it is being built in the viser tab.

[!NOTE] demo.py is installed as must3r_demo (or must3r_demo.exe) when must3r is installed to site-packages.

python demo.py --weights /path/to/MUSt3R_512.pth --retrieval /path/to/MUSt3R_512_retrieval_trainingfree.pth --image_size 512 --viser --embed_viser

# use --amp bf16 if your gpu supports it
# Use --local_network to make it accessible on the local network, or --server_name to specify the url manually
# Use --server_port to change the port, by default it will search for an available port starting at 7860
# Use --device to use a different device, by default it's "cuda"
# --allow_local_files adds a second tab to load images from a local directory
# --viser is used to launch the viser server at the same time as gradio (for real-time updates). 
# Two options:
# 1) use --embed_viser to replace the gradio "Model3D" component with the embedded viser page (recommended) 
# 2) Open a new tab and access the viser url, typically http://localhost:8080/.
# Note: only one instance of viser is launched so all clients will see the same reconstructions.
# Viser's viewer will try to target a fixed framerate and lower the quality when the framerate is low.
# To disable this behaviour, open viser with http://localhost:8080/?fixedDpr=1 (recommended)

# other examples
# 512 resolution bf16, allow local files
python demo.py --weights /path/to/MUSt3R_512.pth --retrieval /path/to/MUSt3R_512_retrieval_trainingfree.pth --image_size 512 --amp bf16 --viser --embed_viser --allow_local_files

# 224 resolution, fp16, allow local files
python3 demo.py --weights /path/to/MUSt3R_224_cvpr.pth --retrieval /path/to/MUSt3R_224_retrieval_trainingfree.pth --image_size 224 --viser --embed_viser --allow_local_files --amp fp16
# 768 resolution (will use interpolated positional embeddings)
python demo.py --weights /path/to/MUSt3R_512.pth --retrieval /path/to/MUSt3R_512_retrieval_trainingfree.pth --image_size 768 --amp bf16 --viser --embed_viser

[!NOTE] IMPORTANT: Explanation of the demo parameters

  1. select images:

    • you can upload images using the gradio.File component.
    • if you use --allow_local_files, a second tab will appear: local_path. In this tab, you can paste a directory path from your local machine and hit load to quickly select all the images inside this directory (not recursive).
  2. select global parameters

    • "Number of refinement iterations": increase it to 1 or 2 to make multiple passes over the keyframes (useful for loop closure).
    • "Maximum batch size": if you are using a small GPU or have a lot of images, set it to 1 to limit VRAM usage (IMPORTANT).
  3. select the inference algorithm
