🛹 RollingDepth: Video Depth without Video Models
CVPR 2025
This repository represents the official implementation of the paper titled "Video Depth without Video Models".
Bingxin Ke<sup>1</sup>, Dominik Narnhofer<sup>1</sup>, Shengyu Huang<sup>1</sup>, Lei Ke<sup>2</sup>, Torben Peters<sup>1</sup>, Katerina Fragkiadaki<sup>2</sup>, Anton Obukhov<sup>1</sup>, Konrad Schindler<sup>1</sup>
<sup>1</sup>ETH Zurich, <sup>2</sup>Carnegie Mellon University
📢 News
2025-02-26: Paper is accepted to CVPR 2025. <br> 2024-12-02: Paper is on arXiv.<br> 2024-11-28: Inference code is released.<br>
🛠️ Setup
The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090
📦 Repository
```bash
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
```
🐍 Python environment
Create a Python environment:

```bash
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
```
💻 Dependencies
Install dependencies:

```bash
pip install -r requirements.txt
bash script/install_diffusers_dev.sh  # install modified diffusers with cross-frame self-attention
```
We use pyav for video I/O, which relies on ffmpeg (tested with version 5.1.6-0+deb12u1).
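To check which ffmpeg version is installed on your system before running inference:

```shell
# Print the installed ffmpeg version (the code was tested with 5.1.6-0+deb12u1).
ffmpeg -version | head -n 1
```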
To see the modification in diffusers, search for comments "Modified in RollingDepth".
🏃 Test on your videos
All scripts are designed to run from the project root directory.
📷 Prepare input videos
- Use the sample videos:

  ```bash
  bash script/download_sample_data.sh
  ```

  These example videos are to be used only as debug/demo input together with the code and should not be distributed outside of the repo.

- Or place your videos in a directory, for example under `data/samples`.
🚀 Run with presets
```bash
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --verbose
```
- `-p` or `--preset`: preset options:
    - `fast`: fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
    - `fast1024`: fast inference at resolution 1024.
    - `full`: better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
    - `paper`: for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to the input data; can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
Passing these inference arguments will override the preset settings:
- `--res` or `--processing-resolution`: the maximum resolution (in pixels) at which image processing is performed. Set to 0 to process at the original input image resolution.
- `--refine-step`: number of refinement iterations to improve accuracy and details. Set to 0 to disable refinement.
- `--snip-len` or `--snippet-lengths`: number of frames to analyze in each snippet.
- `-d` or `--dilations`: spacing between frames for temporal analysis; can take multiple values, e.g. `-d 1 10 25`.
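These overrides can be combined with a preset. The command below is a sketch using only the flags documented above, with the sample paths from earlier; the output directory name is arbitrary:

```shell
# Start from the 'fast' preset, but process at resolution 512,
# run 5 refinement steps, and use dilations 1 and 10.
python run_video.py \
    -i data/samples \
    -o output/samples_custom \
    -p fast \
    --res 512 \
    --refine-step 5 \
    -d 1 10
```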
Clip sub-sequence to be processed:
- `--from` or `--start-frame`: the starting frame index for processing; defaults to 0.
- `--frames` or `--frame-count`: number of frames to process after the starting frame. Set to 0 (default) to process until the end of the video.
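For instance, to process a 100-frame clip starting at frame 30 (a sketch using the flags above; the output directory name is arbitrary):

```shell
# Process frames 30..129 of each input video.
python run_video.py \
    -i data/samples \
    -o output/samples_clip \
    -p fast \
    --from 30 \
    --frames 100
```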
Output settings:
- `--fps` or `--output-fps`: frame rate (FPS) of the output video. Set to 0 (default) to match the input video's frame rate.
- `--restore-res` or `--restore-resolution`: whether to restore the output to the original input resolution after processing. Default: False.
- `--save-sbs` or `--save-side-by-side`: whether to save side-by-side videos of RGB and colored depth. Default: True.
- `--save-npy`: whether to save depth maps as `.npy` files. Default: True.
- `--save-snippets`: whether to save initial snippets. Default: False.
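With `--save-npy` enabled, the saved depth maps can be read back with NumPy. The snippet below is a self-contained sketch: it first writes a dummy array as a stand-in for a real prediction, since the actual output file names depend on the input video:

```shell
python - <<'EOF'
import numpy as np

# Stand-in for a saved depth output: here assumed (frames, height, width), float32.
np.save("demo_pred.npy", np.random.rand(10, 480, 640).astype(np.float32))

depth = np.load("demo_pred.npy")
print(depth.shape, depth.dtype)
EOF
```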
Other arguments
- Run `python run_video.py --help` for details about other arguments.
- For a low GPU memory footprint, pass `--max-vae-bs 1 --unload-snippet true` and use a smaller resolution, e.g. `--res 512`.
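Putting the low-memory flags together into one command (a sketch; all flags come from the options documented above, and the output directory name is arbitrary):

```shell
# Trade speed for a smaller GPU memory footprint.
python run_video.py \
    -i data/samples \
    -o output/samples_lowmem \
    -p fast \
    --res 512 \
    --max-vae-bs 1 \
    --unload-snippet true
```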
⬇ Checkpoint cache
By default, the checkpoint is stored in the Hugging Face cache. The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```
Alternatively, use the following script to download the checkpoint weights locally, and specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:

```bash
bash script/download_weight.sh
```
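After downloading, inference can point at the local weights. This sketch combines the preset command from earlier with the `-c` flag documented above:

```shell
# Use the locally downloaded checkpoint instead of the Hugging Face cache.
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    -c checkpoint/rollingdepth-v1-0
```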
🦿 Evaluation on test datasets
Coming soon
🎓 Citation
```bibtex
@InProceedings{ke2024rollingdepth,
    title={Video Depth without Video Models},
    author={Bingxin Ke and Dominik Narnhofer and Shengyu Huang and Lei Ke and Torben Peters and Katerina Fragkiadaki and Anton Obukhov and Konrad Schindler},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2025}
}
```
🙏 Acknowledgments
We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.
We are grateful to redmond.ai (robin@redmond.ai) for providing GPU resources.
🎫 License
The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
The model is licensed under the RAIL++-M License (as defined in the LICENSE-MODEL).
By downloading and using the code and model, you agree to the terms in the LICENSE and LICENSE-MODEL, respectively.