InfiniDepth

[CVPR 2026] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields


<div align="center"> <h1> <img src="https://raw.githubusercontent.com/Tarikul-Islam-Anik/Animated-Fluent-Emojis/master/Emojis/Objects/Telescope.png" alt="Telescope" width="40" height="40" /> InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields </h1> <div align="center"> <a href="https://zju3dv.github.io/InfiniDepth/"> <img src="https://img.shields.io/badge/Project-Page-red?logo=googlechrome&logoColor=red"> </a> <a href="https://arxiv.org/abs/2601.03252"> <img src="https://img.shields.io/badge/arXiv-Paper-blue?logo=arxiv&logoColor=blue"> </a> <a href="https://huggingface.co/spaces/ritianyu/InfiniDepth"> <img src="https://img.shields.io/badge/HuggingFace-Demo-yellow?logo=huggingface&logoColor=yellow"> </a> <a href="https://huggingface.co/datasets/ritianyu/game_4k_data"> <img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?logo=huggingface&logoColor=orange"> </a> <a href="assets/wechat.jpg"> <img src="https://img.shields.io/badge/微信-WeChat-green?logo=wechat&logoColor=green"> </a> </div> <p align="center"> <a href="https://ritianyu.github.io/">Hao Yu*</a> • <a href="https://haotongl.github.io/">Haotong Lin*</a> • <a href="https://github.com/PLUS-WAVE">Jiawei Wang*</a> • <a href="https://github.com/Dustbin-Li">Jiaxin Li</a> • <a href="https://wangyida.github.io/">Yida Wang</a> • <a href="#">Xueyang Zhang</a> • <a href="https://ywang-zju.github.io/">Yue Wang</a> <br> <a href="https://www.xzhou.me/">Xiaowei Zhou</a> • <a href="https://csse.szu.edu.cn/staff/ruizhenhu/">Ruizhen Hu</a> • <a href="https://pengsida.net/">Sida Peng</a> </p> </div> <div align="center"> <img src="assets/demo.gif" alt="InfiniDepth Demo" width="90%" /> </div>

📢 News

[2026-04] 🎉 Training and evaluation code of InfiniDepth (RGB Only & Depth Sensor Augmentation) is available now!

[2026-03] 🎉 Inference code of InfiniDepth (RGB Only & Depth Sensor Augmentation) is available now!

[2026-02] 🎉 InfiniDepth has been accepted to CVPR 2026! Code coming soon!

✨ What can InfiniDepth do?

InfiniDepth supports three practical capabilities for single-image 3D perception and reconstruction:

| Capability | Input | Output |
| --- | --- | --- |
| Monocular & Arbitrary-Resolution Depth Estimation | RGB Image | Arbitrary-Resolution Depth Map |
| Monocular View Synthesis | RGB Image | 3D Gaussian Splatting (3DGS) |
| Depth Sensor Augmentation (Monocular Metric Depth Estimation) | RGB Image + Depth Sensor | Metric Depth + 3D Gaussian Splatting (3DGS) |

⚙️ Installation

Please see INSTALL.md for manual installation.

🤗 Hugging Face Space Demo

If you want to test InfiniDepth before running local CLI inference, start with the hosted demo:

Hugging Face Space: https://huggingface.co/spaces/ritianyu/InfiniDepth

This repo also includes a Gradio Space entrypoint at app.py:

Input: RGB image (required), depth map (optional)
Task Switch: Depth / 3DGS
Model Switch: InfiniDepth / InfiniDepth_DepthSensor

Local run

python app.py

Notes

In this demo, InfiniDepth_DepthSensor requires a depth map input; RGB-only inference should use InfiniDepth.
Supported depth formats in the demo upload: .png, .npy, .npz, .h5, .hdf5, .exr.
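
As a quick pre-upload check, the extension list above can be encoded in a small helper. This is a minimal sketch: the function name `is_supported_depth_file` is hypothetical and not part of the repository, and the case-insensitive matching is an assumption rather than documented demo behavior.

```python
from pathlib import Path

# Extensions listed in the demo notes above.
SUPPORTED_DEPTH_EXTENSIONS = {".png", ".npy", ".npz", ".h5", ".hdf5", ".exr"}


def is_supported_depth_file(path: str) -> bool:
    """Return True if the file extension appears in the demo's supported list.

    Hypothetical helper; lowercasing the suffix is an assumption.
    """
    return Path(path).suffix.lower() in SUPPORTED_DEPTH_EXTENSIONS


print(is_supported_depth_file("example_data/depth/eth3d_office.npz"))
print(is_supported_depth_file("scan.tiff"))
```

The check only looks at the file name; it does not verify that the file contents are a valid depth map.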

🚀 Inference

Quick Command Index

| If you want ... | Recommended command |
| --- | --- |
| Relative Depth from Single RGB Image | bash example_scripts/infer_depth/courtyard_infinidepth.sh |
| 3D Gaussian from Single RGB Image | bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh |
| Metric Depth from RGB + Depth Sensor | bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh |
| 3D Gaussian from RGB + Depth Sensor | bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh |
| Multi-View / Video Depth + Global Point Cloud | bash example_scripts/infer_depth/waymo_multi_view_infinidepth.sh |

<details> <summary><strong>1. Relative Depth from Single RGB Image</strong> (<code>inference_depth.py</code>)</summary>

Use this when you want a relative depth map from a single RGB image and, optionally, a point cloud export.

Required input

RGB image

Required checkpoints

checkpoints/depth/infinidepth.ckpt
checkpoints/moge-2-vitl-normal/model.pt (recovers metric scale for point cloud export)

Optional checkpoint

checkpoints/sky/skyseg.onnx (additional sky filtering)

Recommended command

python inference_depth.py \
  --input_image_path=example_data/image/courtyard.jpg \
  --model_type=InfiniDepth \
  --depth_model_path=checkpoints/depth/infinidepth.ckpt \
  --output_resolution_mode=upsample \
  --upsample_ratio=2

Replace example_data/image/courtyard.jpg with your own image path.

For the example above, outputs are written to

example_data/pred_depth/ for the colorized depth map
example_data/pred_pcd/ for the exported point cloud when --save_pcd=True
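
When running the recommended command over your own images, it can be convenient to assemble the argument list in Python. The sketch below reuses only the flags shown in the command above; the helper name `build_depth_command` and the image path `my_photos/garden.jpg` are hypothetical.

```python
import shlex


def build_depth_command(image_path, upsample_ratio=2,
                        depth_model_path="checkpoints/depth/infinidepth.ckpt"):
    """Assemble the argv list for the RGB-only depth command shown above.

    Hypothetical helper: it only formats flags that appear in this README.
    """
    return [
        "python", "inference_depth.py",
        f"--input_image_path={image_path}",
        "--model_type=InfiniDepth",
        f"--depth_model_path={depth_model_path}",
        "--output_resolution_mode=upsample",
        f"--upsample_ratio={upsample_ratio}",
    ]


cmd = build_depth_command("my_photos/garden.jpg")
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.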

Example scripts

bash example_scripts/infer_depth/courtyard_infinidepth.sh
bash example_scripts/infer_depth/camera_infinidepth.sh
bash example_scripts/infer_depth/eth3d_infinidepth.sh
bash example_scripts/infer_depth/waymo_infinidepth.sh

Most useful options

| Argument | What it controls |
| --- | --- |
| --output_resolution_mode | Choose upsample, original, or specific. |
| --upsample_ratio | Used when output_resolution_mode=upsample. |
| --output_size | Explicit output size (H,W) when output_resolution_mode=specific. |
| --save_pcd | Export a point cloud alongside the depth map. |
| --fx_org --fy_org --cx_org --cy_org | Camera intrinsics in the original image resolution. |
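
The intrinsics flags are given at the original image resolution because pinhole intrinsics depend on image size. The sketch below illustrates the general relationship using the simple convention in which focal lengths and the principal point scale linearly with width and height. It is a textbook illustration, not InfiniDepth's own code; the function name and the numeric values are hypothetical.

```python
def scale_intrinsics(fx, fy, cx, cy, orig_size, new_size):
    """Rescale pinhole intrinsics from orig_size (H, W) to new_size (H, W).

    Assumes the simple linear convention (no half-pixel offset correction).
    """
    sy = new_size[0] / orig_size[0]  # vertical scale factor
    sx = new_size[1] / orig_size[1]  # horizontal scale factor
    return fx * sx, fy * sy, cx * sx, cy * sy


# Hypothetical camera: 1080x1920 image upsampled by a factor of 2.
scaled = scale_intrinsics(1000.0, 1000.0, 960.0, 540.0,
                          orig_size=(1080, 1920), new_size=(2160, 3840))
print(scaled)
```

This is why the values passed on the command line should describe the input image as captured, not a resized copy.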

</details> <details> <summary><strong>2. 3D Gaussian + Novel-View Video from Single RGB Image</strong> (<code>inference_gs.py</code>)</summary>

Use this when you want a 3D Gaussian export from a single RGB image and an optional novel-view video.

Required input

RGB image

Required checkpoints

checkpoints/depth/infinidepth.ckpt
checkpoints/gs/infinidepth_gs.ckpt
checkpoints/moge-2-vitl-normal/model.pt (recovers metric scale for 3D Gaussian export)

Optional checkpoint

checkpoints/sky/skyseg.onnx (additional sky filtering)

Recommended command

python inference_gs.py \
  --input_image_path=example_data/image/courtyard.jpg \
  --model_type=InfiniDepth \
  --depth_model_path=checkpoints/depth/infinidepth.ckpt \
  --gs_model_path=checkpoints/gs/infinidepth_gs.ckpt

Replace example_data/image/courtyard.jpg with your own image path.

For the example above, outputs are written to

example_data/pred_gs/InfiniDepth_courtyard_gaussians.ply
example_data/pred_gs/InfiniDepth_courtyard_novel_orbit.mp4

If --render_size is omitted, the novel-view video is rendered at the original input image resolution.

Example scripts

bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh
bash example_scripts/infer_gs/camera_infinidepth_gs.sh
bash example_scripts/infer_gs/fruit_infinidepth_gs.sh
bash example_scripts/infer_gs/eth3d_infinidepth_gs.sh

Most useful options

| Argument | What it controls |
| --- | --- |
| --render_novel_video | Turn novel-view rendering on or off. |
| --render_size | Output video resolution (H,W). |
| --novel_trajectory | Camera motion type: orbit or swing. |
| --sample_point_num | Number of sampled points used for gaussian construction. |
| --enable_skyseg_model | Enable sky masking before gaussian sampling. |
| --sample_sky_mask_dilate_px | Dilate the sky mask before filtering. |

The exported .ply files can be visualized in 3D viewers such as SuperSplat.

</details> <details> <summary><strong>3. Depth Sensor Augmentation (Metric Depth and 3D Gaussian from RGB + Depth Sensor)</strong></summary>

Use this mode when you have an RGB image plus metric depth from a depth sensor.

Required inputs

RGB image
Sparse depth in .png, .npy, .npz, .h5, .hdf5, or .exr

Required checkpoints

checkpoints/depth/infinidepth_depthsensor.ckpt
checkpoints/moge-2-vitl-normal/model.pt
checkpoints/gs/infinidepth_depthsensor_gs.ckpt

Required flags

--model_type=InfiniDepth_DepthSensor
--input_depth_path=...
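
The pairing of these two flags can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not part of the repository: it encodes only the rule stated in this README that InfiniDepth_DepthSensor needs a depth input, and the two model names listed in the demo's model switch.

```python
KNOWN_MODEL_TYPES = ("InfiniDepth", "InfiniDepth_DepthSensor")


def check_inference_args(model_type, input_depth_path=None):
    """Raise ValueError if the flag combination contradicts the README.

    Hypothetical helper; the inference scripts may perform their own checks.
    """
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if model_type == "InfiniDepth_DepthSensor" and not input_depth_path:
        raise ValueError("InfiniDepth_DepthSensor requires --input_depth_path")


# Valid combinations pass silently.
check_inference_args("InfiniDepth")
check_inference_args("InfiniDepth_DepthSensor",
                     "example_data/depth/eth3d_office.npz")

# A depth-sensor run without a depth file is rejected.
try:
    check_inference_args("InfiniDepth_DepthSensor")
except ValueError as err:
    print(err)
```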

Metric Depth Inference Command

python inference_depth.py \
  --input_image_path=example_data/image/eth3d_office.png \
  --input_depth_path=example_data/depth/eth3d_office.npz \
  --model_type=InfiniDepth_DepthSensor \
  --depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
  --fx_org=866.39 \
  --fy_org=866.04 \
  --cx_org=791.5 \
  --cy_org=523.81 \
  --output_resolution_mode=upsample \
  --upsample_ratio=1

3D Gaussian Inference Command

python inference_gs.py \
  --input_image_path=example_data/image/eth3d_office.png \
  --input_depth_path=example_data/depth/eth3d_office.npz \
  --model_type=InfiniDepth_DepthSensor \
  --depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
  --gs_model_path=checkpoints/gs/infinidepth_depthsensor_gs.ckpt \
  --fx_org=866.39 \
  --fy_org=866.04 \
  --cx_org=791.5 \
  --cy_org=523.81

Example scripts

bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh
bash example_scripts/infer_depth/waymo_infinidepth_depthsensor.sh
bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh
bash example_scripts/infer_gs/waymo_infinidepth_depthsensor_gs.sh

Most useful options

| Argument | What it controls |
| --- | --- |
| --fx_org --fy_org --cx_org --cy_org | Strongly recommended when you know the sensor intrinsics. |
| --output_resolution_mode | Output behavior for inference_depth.py. |
| --render_size | Video resolution for inference_gs.py. |
| --output_ply_dir | Custom output directory for gaussian export. |
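
To see why accurate intrinsics matter for metric output, consider the standard pinhole back-projection that turns a pixel and its depth into a 3D point. The sketch below uses the ETH3D intrinsics from the commands above; the depth value of 2.0 meters and the function name are hypothetical, and this is the textbook formula rather than InfiniDepth's internal code.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


# Intrinsics from the ETH3D example commands above.
FX, FY, CX, CY = 866.39, 866.04, 791.5, 523.81

# A pixel at the principal point lies on the optical axis.
center = backproject(CX, CY, 2.0, FX, FY, CX, CY)
print(center)

# A pixel one focal length to the right maps to x roughly equal to depth.
side = backproject(CX + FX, CY, 2.0, FX, FY, CX, CY)
print(side)
```

An error in fx or fy stretches the reconstructed geometry laterally, and an error in cx or cy shifts it, which is why known sensor intrinsics are preferable to defaults.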

</details> <details> <summary><strong>4. Multi-View / Vide
