UniDepth
Universal Monocular Metric Depth Estimation
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler,
Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, Luc Van Gool,
under submission,
Paper at arXiv 2502.20110
UniDepth: Universal Monocular Metric Depth Estimation

UniDepth: Universal Monocular Metric Depth Estimation,
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu,
CVPR 2024,
Paper at arXiv 2403.18913
News and ToDo
- [ ] HuggingFace/Gradio demo.
- [x] 28.02.2025: Release UniDepthV2.
- [x] 15.10.2024: Release training code.
- [x] 02.04.2024: Release UniDepth as python package.
- [x] 01.04.2024: Inference code and V1 models are released.
- [x] 26.02.2024: UniDepth is accepted at CVPR 2024! (Highlight :star:)
Zero-Shot Visualization
YouTube (The Office - Parkour)
<p align="center"> <img src="assets/docs/theoffice.gif" alt="animated" /> </p>
NuScenes (stitched cameras)
<p align="center"> <img src="assets/docs/nuscenes_surround.gif" alt="animated" /> </p>
Installation
The requirements below are not strict, but other setups are untested and may behave differently:
- Linux
- Python 3.10+
- CUDA 11.8+
Install the environment needed to run UniDepth with:
export VENV_DIR=<YOUR-VENVS-DIR>
export NAME=Unidepth
python -m venv $VENV_DIR/$NAME
source $VENV_DIR/$NAME/bin/activate
# Install UniDepth and its dependencies; CUDA versions newer than 11.8 work fine, too.
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118
# Install Pillow-SIMD (Optional)
pip uninstall pillow
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
# Install KNN (for evaluation only)
cd unidepth/ops/knn;bash compile.sh;cd ../../../
If you use conda, you should change the following:
python -m venv $VENV_DIR/$NAME -> conda create -n $NAME python=3.11
source $VENV_DIR/$NAME/bin/activate -> conda activate $NAME
Note: Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the PyTorch website.
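One quick way to compare the two versions is to ask PyTorch which CUDA toolkit its wheels were built against and set that beside the output of `nvcc --version`. The sketch below is only illustrative: `torch_cuda_summary` is a hypothetical helper name, not part of UniDepth, and it falls back to a plain message when PyTorch cannot be imported.

```python
def torch_cuda_summary() -> str:
    """Describe the CUDA version PyTorch was built with, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    built_with = torch.version.cuda  # None for CPU-only builds
    gpu_available = torch.cuda.is_available()
    return (
        f"torch {torch.__version__}, built with CUDA {built_with}, "
        f"GPU available: {gpu_available}"
    )


print(torch_cuda_summary())
```

If the reported CUDA version differs from the toolkit used to compile the custom ops, that mismatch is the first thing to rule out.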
Note: xFormers may raise the runtime "error" Triton Error [CUDA]: device kernel image is invalid.
This happens when the system-wide CUDA used by xFormers does not match the CUDA shipped with torch.
It may considerably slow down inference.
Run UniDepth on the given assets to test your installation (you can use this script as a guideline for further usage):
python ./scripts/demo.py
If everything runs correctly, demo.py should print: ARel: 7.45%.
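ARel here presumably denotes the absolute relative error, that is, the mean of |pred - gt| / gt over valid pixels. The NumPy sketch below follows that common definition. The function name `abs_rel_error` and the `gt > 0` validity mask are assumptions made for illustration and may differ from what `demo.py` actually computes.

```python
import numpy as np


def abs_rel_error(pred, gt):
    """Mean absolute relative error over pixels with positive ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumption: non-positive ground truth marks missing depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy 2x2 example in metres; the pixel with gt == 0 is ignored.
gt = np.array([[2.0, 4.0], [0.0, 10.0]])
pred = np.array([[2.2, 3.6], [5.0, 10.0]])
print(f"ARel: {abs_rel_error(pred, gt):.2%}")
```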
If you encounter a segmentation fault after running the demo, you may need to uninstall torch via pip (pip uninstall torch) and install the torch version listed in the requirements with conda.
Get Started
After installing the dependencies, you can load the pre-trained models easily from Hugging Face as follows:
from unidepth.models import UniDepthV1
model = UniDepthV1.from_pretrained("lpiccinelli/unidepth-v1-vitl14") # or "lpiccinelli/unidepth-v1-cnvnxtl" for the ConvNext backbone
You can then predict metric depth and camera intrinsics directly from a single RGB image:
import numpy as np
import torch
from PIL import Image
# Move to CUDA, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
# Load the RGB image; the model takes care of normalization
rgb = torch.from_numpy(np.array(Image.open(image_path))).permute(2, 0, 1) # C, H, W
predictions = model.infer(rgb)
# Metric Depth Estimation
depth = predictions["depth"]
# Point Cloud in Camera Coordinate
xyz = predictions["points"]
# Intrinsics Prediction
intrinsics = predictions["intrinsics"]
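To make the relation between these three outputs concrete, the sketch below back-projects a depth map into camera-frame points using a 3x3 intrinsics matrix. It uses NumPy only and assumes an ideal pinhole camera with pixel centres at integer grid positions. `unproject_depth` is a hypothetical helper, not a UniDepth function, and the model's own point output may be computed differently.

```python
import numpy as np


def unproject_depth(depth, K):
    """Back-project a (H, W) depth map to (3, H, W) camera-frame points.

    Assumes a pinhole camera: x = (u - cx) / fx * z, y = (v - cy) / fy * z.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # u indexes columns, v indexes rows; both have shape (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=0)


# Toy example: a flat wall 2 m away seen by a 5x3 pixel camera.
K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)
xyz = unproject_depth(depth, K)
print(xyz.shape)
```

The pixel at the principal point maps to a point on the optical axis, which is a handy sanity check when mixing predicted and ground-truth intrinsics.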
You can use ground truth intrinsics as input to the model as well:
intrinsics_path = "assets/demo/intrinsics.npy"
# Load the intrinsics if available
intrinsics = torch.from_numpy(np.load(intrinsics_path)) # 3 x 3
# For V2, we defined camera classes. If you pass a 3x3 tensor (as above),
# it is converted to Pinhole, but you can also pass classes from camera.py.
# The `Camera` class is meant to be abstract; use only its child classes, e.g.:
from unidepth.utils.camera import Pinhole, Fisheye624
camera = Pinhole(K=intrinsics) # pinhole
# fill in fisheye, params: fx,fy,cx,cy,d1,d2,d3,d4,d5,d6,t1,t2,s1,s2,s3,s4
camera = Fisheye624(params=torch.tensor([...]))
predictions = model.infer(rgb, camera)
To use the forward method for your custom training, you should:
- Take care of the dataloading:
a) ImageNet-normalization
b) Long-edge based resizing (and padding) with input shape provided in image_shape under configs
c) BxCxHxW format
d) If any intrinsics are given, adapt them accordingly to your resizing
- Format the input data structure as:
data = {"image": rgb, "K": intrinsics}
predictions = model(data, {})
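The data-loading steps (a) to (d) above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the repository's pipeline: `preprocess` is a hypothetical helper, the resize is nearest-neighbour to stay dependency-free, padding is added on the bottom and right only, "long-edge based" is read as an aspect-preserving resize that fits the image inside the target, and the target shape used in the example is a placeholder for the `image_shape` value in your config.

```python
import numpy as np

# Standard ImageNet statistics for RGB images scaled to [0, 1]
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image, K, target_hw):
    """image: (H, W, 3) uint8, K: 3x3 intrinsics, target_hw: (Ht, Wt).

    Returns a (1, 3, Ht, Wt) float32 batch and the intrinsics after resizing.
    """
    h, w = image.shape[:2]
    th, tw = target_hw
    scale = min(th / h, tw / w)  # aspect-preserving fit inside the target
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))

    # (b) nearest-neighbour resize via index lookup
    rows = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    resized = image[rows][:, cols].astype(np.float32) / 255.0

    # (a) ImageNet normalization
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD

    # (b) zero padding on the bottom/right, so the principal point is not shifted
    padded = np.zeros((th, tw, 3), dtype=np.float32)
    padded[:nh, :nw] = normalized

    # (d) scale the intrinsics by the actual resize factors
    K_new = np.asarray(K, dtype=np.float64).copy()
    K_new[0, :] *= nw / w  # fx, skew, cx
    K_new[1, :] *= nh / h  # fy, cy

    # (c) HxWxC -> 1xCxHxW
    return padded.transpose(2, 0, 1)[None], K_new


image = np.zeros((100, 200, 3), dtype=np.uint8)
K = np.array([[200.0, 0.0, 100.0], [0.0, 200.0, 50.0], [0.0, 0.0, 1.0]])
batch, K_resized = preprocess(image, K, target_hw=(128, 128))  # placeholder shape
print(batch.shape, K_resized[0, 0], K_resized[0, 2])
```

If you pad symmetrically instead, the principal point must also be offset by the left and top padding.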
Model Zoo
The available models are the following:
<table border="0">
 <tr> <th>Model</th> <th>Backbone</th> <th>Name</th> </tr>
 <tr> <td rowspan="2"><b>UnidepthV1</b></td> <td>ConvNext-L</td> <td><a href="https://huggingface.co/lpiccinelli/unidepth-v1-cnvnxtl">unidepth-v1-cnvnxtl</a></td> </tr>
 <tr> <td>ViT-L</td> <td><a href="https://huggingface.co/lpiccinelli/unidepth-v1-vitl14">unidepth-v1-vitl14</a></td> </tr>
 <hr style="border: 2px solid black;">
 <tr> <td rowspan="3"><b>UnidepthV2</b></td> <td>ViT-S</td> <td><a href="https://huggingface.co/lpiccinelli/unidepth-v2-vits14">unidepth-v2-vits14</a></td> </tr>
 <tr> <td>ViT-B</td> <td><a href="https://huggingface.co/lpiccinelli/unidepth-v2-vitb14">unidepth-v2-vitb14</a></td> </tr>
 <tr> <td>ViT-L</td> <td><a href="https://huggingface.co/lpiccinelli/unidepth-v2-vitl14">unidepth-v2-vitl14</a></td> </tr>
</table>
Please visit Hugging Face or click on the links above to access the model repos with weights.
You can load UniDepth as follows, with the name variable set to one of the names in the table above:
from unidepth.models import UniDepthV1, UniDepthV2
model_v1 = UniDepthV1.from_pretrained(f"lpiccinelli/{name}")
model_v2 = UniDepthV2.from_pretrained(f"lpiccinelli/{name}")
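Because the version is encoded in each checkpoint name, the matching class can be picked programmatically. The helpers below are hypothetical conveniences (`model_class_name` and `repo_id` are not part of the package). They only parse the names listed in the table and return strings, so the snippet runs without UniDepth installed.

```python
# Checkpoint names as listed in the model zoo table
MODEL_NAMES = (
    "unidepth-v1-cnvnxtl",
    "unidepth-v1-vitl14",
    "unidepth-v2-vits14",
    "unidepth-v2-vitb14",
    "unidepth-v2-vitl14",
)


def model_class_name(name: str) -> str:
    """Return the class name ("UniDepthV1" or "UniDepthV2") for a checkpoint name."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {name!r}")
    version = name.split("-")[1]  # "v1" or "v2"
    return {"v1": "UniDepthV1", "v2": "UniDepthV2"}[version]


def repo_id(name: str) -> str:
    """Build the Hugging Face repo id used by from_pretrained."""
    return f"lpiccinelli/{name}"


for name in MODEL_NAMES:
    print(model_class_name(name), repo_id(name))
```

With the package installed, the returned class name can be resolved with `getattr` on `unidepth.models` and then loaded through `from_pretrained(repo_id(name))`.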
In addition, we provide loading from TorchHub as:
import torch
version = "v2"
backbone = "vitl14"
model = torch.hub.load("lpiccinelli-eth/UniDepth", "UniDepth", version=version, backbone=backbone, pretrained=True, trust_repo=True, force_reload=True)
See the UniDepth function in hubconf.py for how to instantiate the model from a local file: provide a local path at line 34.
UniDepthV2
Visit the UniDepthV2 ReadMe for a more detailed changelog. In summary, the main differences are:
- Improved performance and edge sharpness (EdgeGuidedLocalSSI).
- Input shape and ratio flexibility (self.resolution_level).
- Confidence output.
- Faster inference.
- ONNX support.
- New cameras support (see camera.py).
UnidepthV2old is actually the V1 version, updated to compensate for wave artifacts caused by wrong LiDAR accumulation.
Training
Please visit the training README for more information.
Results
Metric Depth Estimation
The reported performance is for the UniDepthV1 model, and the metric is d1 (higher is better) on zero-shot evaluation. The common split between SUN-RGBD and NYUv2 is removed from SUN-RGBD validation set for evaluat
