Mdrp
[ICCV 2025 Oral] RePoseD: Efficient Relative Pose Estimation With Known Depth Information
<a href="https://colab.research.google.com/github/kocurvik/mdrp/blob/main/demo/reposed_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<a href="https://colab.research.google.com/github/kocurvik/mdrp/blob/main/demo/reposed_demo_roma.ipynb" target="_parent"><img src="https://img.shields.io/badge/Open_In_Colab_(RoMa)-blue" alt="Open In Colab"/></a>
https://github.com/user-attachments/assets/aea6ac6d-9f4c-42d1-8c49-fb0462b40d41
Demos
We provide demos for estimating the relative pose between two images and the subsequent dense two-view reconstruction using MoGev2 + SuperPoint + LightGlue: <a href="https://colab.research.google.com/github/kocurvik/mdrp/blob/main/demo/reposed_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> or MoGev2 + RoMa: <a href="https://colab.research.google.com/github/kocurvik/mdrp/blob/main/demo/reposed_demo_roma.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
We also provide a script that creates a nice visualization of the resulting point cloud:
https://github.com/user-attachments/assets/7cae9b8e-2008-4b1b-97cc-9e07f0ee03d0
You can also try a script that performs dynamic scene reconstruction. Note that this script does not currently include foreground/background segmentation, so it works only for videos with a static background that has sufficient features for matching. Below is an example.
https://github.com/user-attachments/assets/7825f13a-d1e3-4572-a16c-a7827cfd04ea
To run the last two scripts you need to install MoGe, LightGlue and Open3D. You will also need to follow the instructions in the next section.
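In outline, the SP+LG demo prepares the estimator inputs roughly as follows. This is a hedged sketch: the checkpoint name and API calls are assumptions based on the public LightGlue and MoGe repositories, so consult the notebooks for the exact code.

import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd
from moge.model.v2 import MoGeModel

device = "cuda" if torch.cuda.is_available() else "cpu"
extractor = SuperPoint(max_num_keypoints=2048).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)
# checkpoint name is an assumption - see the MoGe repo for available weights
moge = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

image1 = load_image("img1.jpg").to(device)  # (3, H, W) tensor in [0, 1]
image2 = load_image("img2.jpg").to(device)

# detect and match keypoints
feats1, feats2 = extractor.extract(image1), extractor.extract(image2)
matches12 = matcher({"image0": feats1, "image1": feats2})
feats1, feats2, matches12 = [rbd(x) for x in (feats1, feats2, matches12)]  # drop batch dim
m = matches12["matches"]
points1 = feats1["keypoints"][m[:, 0]].cpu().numpy()  # (N, 2) pixel coordinates
points2 = feats2["keypoints"][m[:, 1]].cpu().numpy()

# monocular depth maps, sampled at the matched keypoint locations
depth1 = moge.infer(image1)["depth"].cpu().numpy()
depth2 = moge.infer(image2)["depth"].cpu().numpy()
depths1 = depth1[points1[:, 1].round().astype(int), points1[:, 0].round().astype(int)]
depths2 = depth2[points2[:, 1].round().astype(int), points2[:, 0].round().astype(int)]
# points1, points2, depths1, depths2 are the inputs for the poselib estimators below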
Use in your own project
To use RePoseD in your own project you must first install PoseLib with our PR merged. Currently, you need to compile the master branch of PoseLib yourself; running pip install poselib will not work until a new version with wheels is released.
pip install git+https://github.com/PoseLib/PoseLib.git
If this is not sufficient, you may need to first install some extra packages and/or clone the repo manually:
git clone https://github.com/PoseLib/PoseLib.git
cd PoseLib
apt-get install libeigen3-dev  # Eigen headers are needed to build PoseLib (may require sudo)
pip install pybind11-stubgen
pip install .
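After installing, you can quickly check that your build actually contains the new estimators (a minimal sanity check using the function name from the example below):

import poselib
assert hasattr(poselib, "estimate_monodepth_relative_pose"), "PoseLib was built without the RePoseD estimators"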
Once installed, you can use the new methods added to poselib for relative pose estimation.
import poselib
# extract keypoints and their corresponding depths
# make sure you remove any nans or infs
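# illustrative pre-filtering, assuming points1/points2 are Nx2 numpy arrays and
# depths1/depths2 the corresponding length-N depth arrays
import numpy as np
valid = np.isfinite(depths1) & np.isfinite(depths2)
points1, points2 = points1[valid], points2[valid]
depths1, depths2 = depths1[valid], depths2[valid]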
# set your thresholds
ransac_dict = {'max_epipolar_error': 2.0, 'max_reproj_error': 16.0}
# set this to True if you also want to estimate the depth shift (calibrated case only)
ransac_dict['monodepth_estimate_shift'] = False
# use this loss for better estimation in final optimization
bundle_dict = {'loss_type': 'TRUNCATED_CAUCHY'}
# if you know the intrinsics you can use this (SIMPLE_PINHOLE params are [focal, px, py])
camera1 = {'model': 'SIMPLE_PINHOLE', 'width': -1, 'height': -1, 'params': [f1, px1, py1]}
camera2 = {'model': 'SIMPLE_PINHOLE', 'width': -1, 'height': -1, 'params': [f2, px2, py2]}
geometry, info = poselib.estimate_monodepth_relative_pose(points1, points2, depths1, depths2, camera1, camera2, ransac_dict, bundle_dict)
pose = geometry.pose
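# info is a dict with RANSAC statistics; the key names below follow PoseLib's
# usual output (an assumption worth checking), e.g. inlier count and inlier mask
print(info["num_inliers"])
inlier_mask = info["inliers"]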
# for unknown but shared focal lengths you can use the following (pp is the principal point, usually the image center)
image_pair, info = poselib.estimate_monodepth_shared_focal_relative_pose(points1 - pp1, points2 - pp2, depths1, depths2, ransac_dict, bundle_dict)
f = image_pair.camera1.focal()
geometry = image_pair.geometry
pose = geometry.pose
# for unknown and different focal lengths you can use the following
image_pair, info = poselib.estimate_monodepth_varying_focal_relative_pose(points1 - pp1, points2 - pp2, depths1, depths2, ransac_dict, bundle_dict)
f1 = image_pair.camera1.focal()
f2 = image_pair.camera2.focal()
geometry = image_pair.geometry
pose = geometry.pose
# to transform the point cloud from the first image into the coordinate frame of the second image you can use:
xyz1_in_camera2_frame = (1/geometry.scale) * ((pose.R @ xyz1.T).T + pose.t)
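In the line above, xyz1 is the point cloud of the first image in its own camera frame. Below is a minimal sketch of how it could be back-projected from a depth map under a SIMPLE_PINHOLE model, ignoring any estimated depth shift; the function and array names here are illustrative, not part of the PoseLib API.

import numpy as np

def backproject(depth, f, px, py):
    # lift an HxW depth map to an Nx3 point cloud in the camera frame
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - px) * depth / f
    y = (v - py) * depth / f
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

xyz1 = backproject(depth_map1, f1, px1, py1)  # depth_map1: monocular depth for image 1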
Citation
If you find our work useful please consider citing:
@inproceedings{ding2025reposed,
title={RePoseD: Efficient Relative Pose Estimation With Known Depth Information},
author={Ding, Yaqing and Kocur, Viktor and V{\'a}vra, V{\'a}clav and Haladov{\'a}, Zuzana Berger and Yang, Jian and Sattler, Torsten and Kukelova, Zuzana},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
Extended Results
After finishing the camera-ready version for ICCV, we implemented an improved local optimization (LO) for PoseLib which optimizes the Sampson and reprojection errors jointly. This significantly improves accuracy, often surpassing MADPose, while remaining significantly faster.
Benchmark results are presented in EXTENDED_RESULTS.md. Below is a preview of the results on the PhotoTourism dataset.
PhotoTourism (Calibrated)
<table>
<tr><td rowspan="2" style="vertical-align : middle;text-align:center;">Depth</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Method</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Scale</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Shift</td><td colspan="3" align="center">SP+LG</td><td colspan="3" align="center">RoMa</td></tr>
<tr><td>$\epsilon(^\circ)\downarrow$</td><td>mAA $\uparrow$</td><td>Runtime (ms)</td><td>$\epsilon(^\circ)\downarrow$</td><td>mAA $\uparrow$</td><td>Runtime (ms)</td></tr>
<tr><td rowspan="1" style="vertical-align : middle;text-align:center;">-</td><td>5-Point</td><td></td><td></td><td>1.42</td><td>76.56</td><td>63.79</td><td>0.78</td><td>86.18</td><td>264.61</td></tr>
<tr><td rowspan="5" style="vertical-align : middle;text-align:center;">MoGe</td><td>3P-RelDepth</td><td></td><td></td><td>8.12</td><td>53.40</td><td>55.85</td><td>1.69</td><td>67.22</td><td>221.06</td></tr>
<tr><td>P3P</td><td></td><td></td><td>1.40</td><td>77.37</td><td>32.95</td><td>0.78</td><td>86.42</td><td>148.76</td></tr>
<tr><td>MADPose</td><td>✔</td><td>✔</td><td>1.27</td><td>80.28</td><td>788.18</td><td>0.87</td><td>86.85</td><td>1753.49</td></tr>
<tr><td>Ours</td><td>✔</td><td>✔</td><td><strong>1.24</strong></td><td><strong>81.34</strong></td><td><strong>28.93</strong></td><td><strong>0.74</strong></td><td><strong>88.58</strong></td><td><strong>125.66</strong></td></tr>
<tr><td>Ours*</td><td>✔</td><td></td><td>1.75</td><td>80.29</td><td>30.11</td><td>1.03</td><td>88.02</td><td>135.95</td></tr>
<tr><td rowspan="5" style="vertical-align : middle;text-align:center;">UniDepth</td><td>3P-RelDepth</td><td></td><td></td><td>4.07</td><td>51.60</td><td>52.49</td><td>1.33</td><td>67.56</td><td>214.73</td></tr>
<tr><td>P3P</td><td></td><td></td><td>1.40</td><td>77.47</td><td>34.30</td><td>0.78</td><td>86.43</td><td>150.95</td></tr>
<tr><td>MADPose</td><td>✔</td><td>✔</td><td>1.15</td><td>82.09</td><td>720.34</td><td>0.78</td><td>87.60</td><td>1695.57</td></tr>
<tr><td>Ours</td><td>✔</td><td>✔</td><td><strong>1.04</strong></td><td>83.71</td><td><strong>30.88</strong></td><td><strong>0.69</strong></td><td>89.27</td><td><strong>131.52</strong></td></tr>
<tr><td>Ours*</td><td>✔</td><td></td><td>1.16</td><td><strong>84.56</strong></td><td>31.19</td><td>0.81</td><td><strong>90.18</strong></td><td>137.26</td></tr>
</table>

<table>
<tr><td rowspan="2" style="vertical-align : middle;text-align:center;">Depth</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Method</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Scale</td><td rowspan="2" style="vertical-align : middle;text-align:center;">Shift</td><td colspan="3" align="center">MASt3R</td></tr>
<tr><td>$\epsilon(^\circ)\downarrow$</td><td>mAA $\uparrow$</td><td>Runtime (ms)</td></tr>
<tr><td rowspan="1" style="vertical-align : middle;text-align:center;">-</td><td>5-Point</td><td></td><td></td><td>1.14</td><td>81.66</td><td>137.75</td></tr>
</table>

\* Denotes the use of P3P + our new optimization strategy.
ICCV (2025) Evaluation
To run the experiments from the paper, you can use the provided evaluation code. We are currently working on improvements to the methods; for reproducibility, we keep the original code available in the iccv-eval branches.
Setting up the eval repo, PoseLib and MADPose
You will need to clone this repo and install our PoseLib and MADPose forks, which include all evaluated variants.
Note that the PoseLib fork used in the evaluation scripts differs from the one described in the previous sections, which includes only our method.
# create a conda environment
conda create -n mdrp
conda activate mdrp
conda install numpy scipy tqdm h5py tectonic prettytable matplotlib seaborn eigen=3.4
# clone this repo
git clone https://github.com/kocurvik/mdrp.git
cd mdrp
