PromptDA

[CVPR 2025] Prompt Depth Anything


Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Project Page | Paper | Hugging Face Demo | Interactive Results | Data

Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
CVPR 2025


📰 News

- The PromptDA++ paper is available. Code and models will be released; please stay tuned.
- Releasing ScanNet++ ZipNeRF reconstruction depth results.

🛠️ Installation

<details> <summary> Setting up the environment </summary>

git clone https://github.com/DepthAnything/PromptDA.git
cd PromptDA
pip install -r requirements.txt
pip install -e .
sudo apt install ffmpeg  # for video generation

</details> <details> <summary> Pre-trained Models </summary>

Only Prompt-Depth-Anything-Large is used for the benchmarks in our paper. Prompt-Depth-Anything-Small-Transparent is further fine-tuned for 10K steps on the HAMMER dataset, using our iPhone LiDAR simulation method, to improve performance on transparent objects.

</details>
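After the installation steps above, a quick check from Python can confirm that everything is visible before running inference. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes that `pip install -e .` exposes a top-level `promptda` package (as the usage example's imports suggest) and that PyTorch is among the installed requirements.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the pieces installed above are visible from Python.

    Hypothetical helper for illustration; it only looks things up and
    does not import the packages or run ffmpeg.
    """
    return {
        # True if the `promptda` package can be located on the import path.
        "promptda": importlib.util.find_spec("promptda") is not None,
        # True if PyTorch can be located (assumed to be a requirement).
        "torch": importlib.util.find_spec("torch") is not None,
        # True if an `ffmpeg` executable is on PATH (only needed for videos).
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A missing `ffmpeg` entry only matters for the video generation step; single-image inference does not need it.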

🚀 Usage

<details> <summary> Example usage </summary>

from promptda.promptda import PromptDA
from promptda.utils.io_wrapper import load_image, load_depth, save_depth

DEVICE = 'cuda'
image_path = "assets/example_images/image.jpg"
prompt_depth_path = "assets/example_images/arkit_depth.png"
image = load_image(image_path).to(DEVICE)
prompt_depth = load_depth(prompt_depth_path).to(DEVICE) # 192x256, ARKit LiDAR depth in meters

model = PromptDA.from_pretrained("depth-anything/prompt-depth-anything-vitl").to(DEVICE).eval()
depth = model.predict(image, prompt_depth) # HxW, depth in meters

save_depth(depth, prompt_depth=prompt_depth, image=image)

</details>
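The example above expects the prompt depth in meters. If you prepare depth maps yourself, a units mistake (for instance, passing raw millimeter values) is easy to make and hard to spot. The sketch below shows one way to convert and sanity-check a depth map with NumPy before turning it into a tensor. It is an illustration under assumptions, not the repository's loader: `millimeters_to_meters` and `check_prompt_depth` are hypothetical helpers, and the 10 m `max_range_m` threshold is an arbitrary guess at a plausible indoor LiDAR range.

```python
import numpy as np


def millimeters_to_meters(depth_mm):
    """Convert an integer millimeter depth map to float32 meters."""
    return np.asarray(depth_mm).astype(np.float32) / 1000.0


def check_prompt_depth(depth_m, max_range_m=10.0):
    """Raise ValueError if a depth map does not look like HxW meters.

    `max_range_m` is an assumed upper bound, not a value from the paper.
    """
    depth_m = np.asarray(depth_m, dtype=np.float32)
    if depth_m.ndim != 2:
        raise ValueError(f"expected an HxW depth map, got shape {depth_m.shape}")
    if not np.isfinite(depth_m).all():
        raise ValueError("depth map contains NaN or inf values")
    if depth_m.min() < 0:
        raise ValueError("depth map contains negative values")
    if depth_m.max() > max_range_m:
        raise ValueError(
            f"max depth {depth_m.max():.1f} exceeds {max_range_m} m; "
            "the values may still be in millimeters"
        )
    return depth_m


if __name__ == "__main__":
    # Synthetic 192x256 depth map: every pixel at 1500 mm.
    raw = np.full((192, 256), 1500, dtype=np.uint16)
    meters = check_prompt_depth(millimeters_to_meters(raw))
    print(meters.shape, meters.dtype, float(meters.max()))
```

Running the check on the raw millimeter array instead raises a `ValueError`, which is the failure this sketch is meant to catch early.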

📸 Running on your own capture

You can use the Stray Scanner App to capture your own data. It requires an iPhone 12 Pro or later Pro model, or an iPad Pro (2020) or later Pro model. We have set up a Hugging Face Space for you to quickly test our model. If you want to obtain video results, please follow the steps below.

<details> <summary> Testing steps </summary>

1. Capture a scene with the Stray Scanner App. (Preferably hold the device so that the charging port faces downward or to the right.)
2. Use the iPhone Files App to compress the capture into a zip file and transfer it to your computer. Here is an example screen recording.
3. Run the following commands to run inference with our model and generate the video results.

export PATH_TO_ZIP_FILE=data/8b98276b0a.zip # Replace with your own zip file path
export PATH_TO_SAVE_FOLDER=data/8b98276b0a_results # Replace with your own save folder path
# Run inference on the capture
python3 -m promptda.scripts.infer_stray_scan --input_path "${PATH_TO_ZIP_FILE}" --output_path "${PATH_TO_SAVE_FOLDER}"
# Generate the per-frame visualization images
python3 -m promptda.scripts.generate_video process_stray_scan --input_path "${PATH_TO_ZIP_FILE}" --result_path "${PATH_TO_SAVE_FOLDER}"
# Encode the frames into an mp4 video
ffmpeg -framerate 60 -i "${PATH_TO_SAVE_FOLDER}/%06d_smooth.jpg" -c:v libx264 -pix_fmt yuv420p "${PATH_TO_SAVE_FOLDER}.mp4"

</details>
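If you process many captures, the three commands from the testing steps can be driven from Python. The sketch below is one way to do that, not a script shipped with the repository: `build_commands` and `run_pipeline` are hypothetical helpers that assemble exactly the module names and flags shown above. One deliberate difference is that it uses `sys.executable` rather than `python3`, so the commands run in the same environment as the calling script.

```python
import subprocess
import sys


def build_commands(zip_path, save_folder, framerate=60):
    """Return the three commands from the testing steps as argument lists."""
    zip_path = str(zip_path)
    save_folder = str(save_folder)
    return [
        # 1. Run inference on the Stray Scanner capture.
        [sys.executable, "-m", "promptda.scripts.infer_stray_scan",
         "--input_path", zip_path, "--output_path", save_folder],
        # 2. Generate the per-frame visualization images.
        [sys.executable, "-m", "promptda.scripts.generate_video",
         "process_stray_scan",
         "--input_path", zip_path, "--result_path", save_folder],
        # 3. Encode the frames into an mp4 next to the save folder.
        ["ffmpeg", "-framerate", str(framerate),
         "-i", f"{save_folder}/%06d_smooth.jpg",
         "-c:v", "libx264", "-pix_fmt", "yuv420p",
         f"{save_folder}.mp4"],
    ]


def run_pipeline(zip_path, save_folder):
    """Run the commands in order, stopping at the first failure."""
    for command in build_commands(zip_path, save_folder):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands for the example capture without executing them.
    for command in build_commands("data/8b98276b0a.zip",
                                  "data/8b98276b0a_results"):
        print(" ".join(command))
```

Passing argument lists to `subprocess.run` avoids shell quoting problems with paths that contain spaces, and `check=True` stops the pipeline if inference fails instead of encoding a video from stale frames.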

👏 Acknowledgements

We thank Prof. Weinan Zhang for his generous support of the robot experiments, including the space, the objects, and the Unitree H1 robot. We also thank Zhengbang Zhu, Jiahang Cao, Xinyao Li, and Wentao Dong for their help in setting up the robot platform and collecting robot data.

📚 Citation

If you find this code useful for your research, please use the following BibTeX entry:

@inproceedings{lin2024promptda,
  title={Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation},
  author={Lin, Haotong and Peng, Sida and Chen, Jingxiao and Peng, Songyou and Sun, Jiaming and Liu, Minghuan and Bao, Hujun and Feng, Jiashi and Zhou, Xiaowei and Kang, Bingyi},
  journal={arXiv},
  year={2024}
}
