Reloc3r
[CVPR 2025] Relative camera pose estimation and visual localization with Reloc3r
Table of Contents
- TODO List
- Installation
- Usage
- Evaluation on Relative Camera Pose Estimation
- Evaluation on Visual Localization
- Training
- Citation
- Acknowledgments
TODO List
- [x] Release pre-trained weights and inference code.
- [x] Release evaluation code for ScanNet1500 and MegaDepth1500 datasets.
- [x] Release evaluation code for 7Scenes and Cambridge datasets.
- [x] Release sample code for self-captured images and videos.
- [x] Release training code and data.
- [x] Gradio demo for relative camera pose regression, and its visualization.
- [ ] Evaluation code for other datasets.
- [ ] Accelerated version for visual localization.
Installation
- Clone Reloc3r
git clone --recursive https://github.com/ffrivera0/reloc3r.git
cd reloc3r
# if you have already cloned reloc3r:
# git submodule update --init --recursive
- Create the environment using conda
conda create -n reloc3r python=3.11 cmake=3.14.0
conda activate reloc3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia # use the CUDA version that matches your system
pip install -r requirements.txt
# optional: you can also install additional packages to:
# - add support for HEIC images
pip install -r requirements_optional.txt
- Optional: Compile the CUDA kernels for RoPE
# Reloc3r relies on RoPE positional embeddings, for which you can compile CUDA kernels for faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
- Optional: Download the Reloc3r-224/Reloc3r-512 checkpoints. The pre-trained weights are downloaded automatically when you run the evaluation and demo code below.
Usage
With Reloc3r, you can estimate camera poses for your own images and videos.
For relative pose estimation, try the demo code in wild_relpose.py. We provide some image pairs used in our paper.
# replace the args with your paths
python wild_relpose.py --v1_path ./data/wild_images/zurich0.jpg --v2_path ./data/wild_images/zurich1.jpg --output_folder ./data/wild_images/
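To process more than one pair, a small driver can call the demo repeatedly. The sketch below uses a hypothetical helper, `build_relpose_commands`, which is not part of the repository: it pairs consecutive images in a folder and builds one wild_relpose.py invocation per pair, using only the three flags shown above. Each pair gets its own output subfolder, on the assumption that the demo writes a fixed file name (such as pose2to1.txt) into `--output_folder`. Nothing is executed unless `run=True` is passed.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_relpose_commands(image_dir, output_root):
    """Hypothetical helper: one wild_relpose.py command per consecutive image pair."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    commands = []
    for first, second in zip(images, images[1:]):
        # One folder per pair, so outputs with a fixed file name do not overwrite each other.
        out_dir = Path(output_root) / f"{first.stem}_{second.stem}"
        commands.append([
            sys.executable, "wild_relpose.py",
            "--v1_path", str(first),
            "--v2_path", str(second),
            "--output_folder", str(out_dir),
        ])
    return commands


def run_all(commands, run=False):
    """Print every command; execute them only when run=True."""
    for cmd in commands:
        print(" ".join(cmd))
        if run:
            # The last argument is the output folder; create it in case the demo does not.
            Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
```

From the reloc3r root, `run_all(build_relpose_commands("./data/wild_images", "./data/wild_images/pairs"))` prints the commands for review, and adding `run=True` executes them.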
Visualize the relative pose
# replace the args with your paths
python visualization.py --mode relpose --pose_path ./data/wild_images/pose2to1.txt
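To inspect the estimated pose numerically instead of visually, the sketch below reads a pose file and reports its rotation angle and translation direction. It assumes, without confirmation from the repository, that pose2to1.txt stores a 3x4 or 4x4 rigid transform as whitespace-separated numbers in row-major order, so check the file before relying on it. Only the translation direction is reported, because two views alone do not determine the metric scale of the translation.

```python
import math
from pathlib import Path


def parse_pose(text):
    """Parse a 3x4 or 4x4 row-major rigid transform (assumed file layout)."""
    values = [float(v) for v in text.split()]
    if len(values) not in (12, 16):
        raise ValueError(f"expected 12 or 16 numbers, got {len(values)}")
    # Keep the top three rows; a fourth [0 0 0 1] row, if present, is ignored.
    rows = [values[i:i + 4] for i in range(0, 12, 4)]
    rotation = [row[:3] for row in rows]
    translation = [row[3] for row in rows]
    return rotation, translation


def rotation_angle_deg(rotation):
    # Angle of a rotation matrix: cos(theta) = (trace(R) - 1) / 2, clamped for round-off.
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def translation_direction(translation):
    # Unit vector of the translation; the scale itself is not meaningful for two views.
    norm = math.sqrt(sum(t * t for t in translation))
    if norm == 0.0:
        return [0.0, 0.0, 0.0]
    return [t / norm for t in translation]


def summarize_pose_file(path):
    rotation, translation = parse_pose(Path(path).read_text())
    return rotation_angle_deg(rotation), translation_direction(translation)
```

For example, `summarize_pose_file("./data/wild_images/pose2to1.txt")` returns the rotation angle in degrees and a unit translation vector, provided the file matches the assumed layout.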
For visual localization, the demo code in wild_visloc.py estimates absolute camera poses from sampled frames in self-captured videos.
[!IMPORTANT] The demo simply uses the first and last frames as the database, which **requires** overlapping regions among all images. This demo does **not** support linear motion. We provide some videos as examples.
# replace the args with your paths
python wild_visloc.py --video_path ./data/wild_video/ids.MOV --output_folder ./data/wild_video
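Overlap is needed so that a relative pose to each database frame can be estimated at all. Why linear motion fails can be illustrated with a simplified geometric sketch; the README does not describe how wild_visloc.py combines the two database frames, so treat this as one plausible reading rather than the demo's method. It assumes each database frame, with a known camera center, yields a direction toward the query camera that has already been expressed in world coordinates (a relative translation carries no scale), and that the query center is recovered by intersecting the two rays. `triangulate_center` is a hypothetical helper, not repository code. When the query lies on the line through the two database centers, as it does under purely linear motion, both rays are parallel to that line and their intersection is undefined, which the helper reports by returning None.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_center(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3,
                       eps: float = 1e-9) -> Optional[Vec3]:
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2.

    Hypothetical helper: c1, c2 are database camera centers and d1, d2 are
    world-frame directions from each center toward the query camera.
    Returns None when the rays are (nearly) parallel, i.e. the center is unobservable.
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b  # equals a * c * sin^2(angle between the rays)
    if denom <= eps * a * c:
        return None
    # Least-squares ray parameters from the 2x2 normal equations.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

With database centers at (0, 0, 0) and (2, 0, 0), directions (1, 1, 0) and (-1, 1, 0) meet at (1, 1, 0); directions (1, 0, 0) and (-1, 0, 0), as produced when the query sits on the same line, give None.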
Visualize the absolute poses
# replace the args with your paths
python visualization.py --mode visloc --pose_folder ./data/wild_video/ids_poses/
Evaluation on Relative Camera Pose Estimation
To reproduce our evaluation on ScanNet1500, download the dataset and unzip it to ./data/scannet1500.
Then run the following script.
bash scripts/eval_scannet1500.sh
To reproduce our evaluation on MegaDepth1500, download the dataset and unzip it to ./data/megadepth1500.
Then run the following script.
bash scripts/eval_megadepth1500.sh
[!NOTE] For faster inference, set --amp=1. This enables fp16 evaluation, which increases speed from **24 FPS** to **40 FPS** on an RTX 4090 with Reloc3r-512, without any loss in accuracy.
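The quoted throughput can also be read as latency. The short calculation below converts the two FPS figures from the note into milliseconds per processed item and a speedup ratio; it uses only the numbers quoted above and measures nothing itself. Treating one "frame" as one processed image pair is an assumption.

```python
def fps_to_ms(fps: float) -> float:
    """Latency per processed item, in milliseconds, for a given throughput."""
    return 1000.0 / fps


DEFAULT_FPS = 24.0  # Reloc3r-512 on an RTX 4090, as quoted in the note
AMP_FPS = 40.0      # same setup with --amp=1 (fp16)

default_ms = fps_to_ms(DEFAULT_FPS)
amp_ms = fps_to_ms(AMP_FPS)
speedup = AMP_FPS / DEFAULT_FPS

print(f"default: {default_ms:.1f} ms, amp: {amp_ms:.1f} ms, speedup: {speedup:.2f}x")
# default: 41.7 ms, amp: 25.0 ms, speedup: 1.67x
```

If those rates held for a 1500-pair benchmark, model time would drop from roughly 62.5 s to 37.5 s, ignoring data loading.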
Evaluation on Visual Localization
To reproduce our evaluation on 7Scenes, download the dataset and unzip it to ./data/7scenes.
Then run the following script.
bash scripts/eval_7scenes.sh
To reproduce our evaluation on Cambridge, download the dataset and unzip it to ./data/cambridge.
Then run the following script.
bash scripts/eval_cambridge.sh
Training
We follow DUSt3R to process the training data. Download the datasets: CO3Dv2, ScanNet++, ARKitScenes, BlendedMVS, MegaDepth, DL3DV, RealEstate10K.
For each dataset, we provide a preprocessing script in the datasets_preprocess directory and an archive containing the list of pairs when needed. You have to download the datasets yourself from their official sources, agree to their license, and run the preprocessing script.
We provide a sample script to train Reloc3r on ScanNet++ with a single RTX 3090 GPU:
bash scripts/train_small.sh
To reproduce our training of Reloc3r-512 on 8 H800 GPUs, run the following script:
bash scripts/train.sh
[!NOTE] These scripts are not strictly equivalent to those used to train Reloc3r, but they should be close enough.
Citation
If you find our work helpful in your research, please consider citing:
@article{reloc3r,
title={Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization},
author={Dong, Siyan and Wang, Shuzhe and Liu, Shaohui and Cai, Lulu and Fan, Qingnan and Kannala, Juho and Yang, Yanchao},
journal={arXiv preprint arXiv:2412.08376},
year={2024}
}
Acknowledgments
Thanks to these great repositories: CroCo, DUSt3R, Marepo, and many other inspiring works in the community.
