🚀 Metric3D Project 🚀

Official PyTorch implementation of Metric3Dv1 and Metric3Dv2:

[1] Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

[2] Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

🏆 Champion in CVPR2023 Monocular Depth Estimation Challenge

News

[2024/8] Metric3Dv2 is accepted by TPAMI!
[2024/7/5] Our stable-diffusion alternative GeoWizard has now been accepted by ECCV 2024! Check NOW the repository and paper for the finest-grained geometry ever! 🎉🎉🎉
[2024/6/25] Json files for KITTI datasets now available! Refer to Training for more details
[2024/6/3] ONNX is supported! We appreciate @xenova for their remarkable efforts!
[2024/4/25] Weights for ViT-giant2 model released!
[2024/4/11] Training codes are released!
[2024/3/18] HuggingFace 🤗 GPU version updated!
[2024/3/18] Project page released!
[2024/3/18] Metric3D V2 models released, supporting metric depth and surface normal now!
[2023/8/10] Inference codes, pre-trained weights, and demo released.
[2023/7] Metric3D accepted by ICCV 2023!
[2023/4] The Champion of 2nd Monocular Depth Estimation Challenge in CVPR 2023

🌼 Abstract

Metric3D is a strong and robust geometry foundation model for high-quality and zero-shot metric depth and surface normal estimation from a single image. It excels at solving in-the-wild scene reconstruction. It can directly help you measure the size of structures from a single image. Now it achieves SOTA performance on over 10 depth and normal benchmarks.

depth_normal

metrology

📝 Benchmarks

Metric Depth

Our models rank 1st on the routing KITTI and NYU benchmarks.

| | Backbone | KITTI δ1 ↑ | KITTI δ2 ↑ | KITTI AbsRel ↓ | KITTI RMSE ↓ | KITTI RMS_log ↓ | NYU δ1 ↑ | NYU δ2 ↑ | NYU AbsRel ↓ | NYU RMSE ↓ | NYU log10 ↓ | |---------------|-------------|------------|-------------|-----------------|---------------|------------------|----------|----------|---------------|-------------|--------------| | ZoeDepth | ViT-Large | 0.971 | 0.995 | 0.053 | 2.281 | 0.082 | 0.953 | 0.995 | 0.077 | 0.277 | 0.033 | | ZeroDepth | ResNet-18 | 0.968 | 0.996 | 0.057 | 2.087 | 0.083 | 0.954 | 0.995 | 0.074 | 0.269 | 0.103 | | IEBins | SwinT-Large | 0.978 | 0.998 | 0.050 | 2.011 | 0.075 | 0.936 | 0.992 | 0.087 | 0.314 | 0.031 | | DepthAnything | ViT-Large | 0.982 | 0.998 | 0.046 | 1.985 | 0.069 | 0.984 | 0.998 | 0.056 | 0.206 | 0.024 | | Ours | ViT-Large | 0.985 | 0.998 | 0.044 | 1.985 | 0.064 | 0.989 | 0.998 | 0.047 | 0.183 | 0.020 | | Ours | ViT-giant2 | 0.989 | 0.998 | 0.039 | 1.766 | 0.060 | 0.987 | 0.997 | 0.045 | 0.187 | 0.015 |

Affine-invariant Depth

Even compared to recent affine-invariant depth methods (Marigold and Depth Anything), our metric-depth (and normal) models still show superior performance.

| | #Data for Pretrain and Train | KITTI Absrel ↓ | KITTI δ1 ↑ | NYUv2 AbsRel ↓ | NYUv2 δ1 ↑ | DIODE-Full AbsRel ↓ | DIODE-Full δ1 ↑ | Eth3d AbsRel ↓ | Eth3d δ1 ↑ | |-----------------------|----------------------------------------------|----------------|------------|-----------------|------------|---------------------|-----------------|----------------------|------------| | OmniData (v2, ViT-L) | 1.3M + 12.2M | 0.069 | 0.948 | 0.074 | 0.945 | 0.149 | 0.835 | 0.166 | 0.778 | | MariGold (LDMv2) | 5B + 74K | 0.099 | 0.916 | 0.055 | 0.961 | 0.308 | 0.773 | 0.127 | 0.960 | | DepthAnything (ViT-L) | 142M + 63M | 0.076 | 0.947 | 0.043 | 0.981 | 0.277 | 0.759 | 0.065 | 0.882 | | Ours (ViT-L) | 142M + 16M | 0.042 | 0.979 | 0.042 | 0.980 | 0.141 | 0.882 | 0.042 | 0.987 | | Ours (ViT-g) | 142M + 16M | 0.043 | 0.982 | 0.043 | 0.981 | 0.136 | 0.895 |

Metric3D

Install / Use

README