🛹 RollingDepth: Video Depth without Video Models
CVPR 2025
This repository represents the official implementation of the paper titled "Video Depth without Video Models".
Bingxin Ke<sup>1</sup>, Dominik Narnhofer<sup>1</sup>, Shengyu Huang<sup>1</sup>, Lei Ke<sup>2</sup>, Torben Peters<sup>1</sup>, Katerina Fragkiadaki<sup>2</sup>, Anton Obukhov<sup>1</sup>, Konrad Schindler<sup>1</sup>
<sup>1</sup>ETH Zurich, <sup>2</sup>Carnegie Mellon University
📢 News
2025-02-26: Paper is accepted to CVPR 2025. <br> 2024-12-02: Paper is on arXiv.<br> 2024-11-28: Inference code is released.<br>
🛠️ Setup
The inference code was tested on: Debian 12, Python 3.12.7 (venv), CUDA 12.4, GeForce RTX 3090
📦 Repository
```bash
git clone https://github.com/prs-eth/RollingDepth.git
cd RollingDepth
```
🐍 Python environment
Create a Python environment:

```bash
# with venv
python -m venv venv/rollingdepth
source venv/rollingdepth/bin/activate

# or with conda
conda create --name rollingdepth python=3.12
conda activate rollingdepth
```
💻 Dependencies
Install dependencies:

```bash
pip install -r requirements.txt
bash script/install_diffusers_dev.sh  # install modified diffusers with cross-frame self-attention
```
We use pyav for video I/O, which relies on ffmpeg (tested with version 5.1.6-0+deb12u1).
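To check which ffmpeg version is installed on your system before running inference:

```shell
# Print the installed ffmpeg version (the code was tested with 5.1.6-0+deb12u1).
ffmpeg -version | head -n 1
```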
To see the modification in diffusers, search for comments "Modified in RollingDepth".
🏃 Test on your videos
All scripts are designed to run from the project root directory.
📷 Prepare input videos
- Use the sample videos:

  ```bash
  bash script/download_sample_data.sh
  ```

  These example videos are to be used only as debug/demo input together with the code and should not be distributed outside of the repo.

- Or place your videos in a directory, for example under `data/samples`.
🚀 Run with presets
```bash
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    --verbose
```
- `-p` or `--preset`: preset options:
    - `fast`: fast inference, with dilations [1, 25] (flexible), fp16, without refinement, at max. resolution 768.
    - `fast1024`: fast inference at resolution 1024.
    - `full`: better details, with dilations [1, 10, 25] (flexible), fp16, with 10 refinement steps, at max. resolution 1024.
    - `paper`: for reproducing paper numbers, with (fixed) dilations [1, 10, 25], fp32, with 10 refinement steps, at max. resolution 768.
- `-i` or `--input-video`: path to the input data; can be a single video file, a text file with video paths, or a directory of videos.
- `-o` or `--output-dir`: output directory.
Passing these inference arguments will override the preset settings:
- `--res` or `--processing-resolution`: the maximum resolution (in pixels) at which image processing is performed. Set to 0 to process at the original input image resolution.
- `--refine-step`: number of refinement iterations to improve accuracy and details. Set to 0 to disable refinement.
- `--snip-len` or `--snippet-lengths`: number of frames to analyze in each snippet.
- `-d` or `--dilations`: spacing between frames for temporal analysis; can take multiple values, e.g. `-d 1 10 25`.
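These overrides can be combined with a preset. The command below is a sketch using only the flags documented above, with the sample paths from earlier; the output directory name is arbitrary:

```shell
# Start from the 'fast' preset, but process at resolution 512,
# run 5 refinement steps, and use dilations 1 and 10.
python run_video.py \
    -i data/samples \
    -o output/samples_custom \
    -p fast \
    --res 512 \
    --refine-step 5 \
    -d 1 10
```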
Clip sub-sequence to be processed:
- `--from` or `--start-frame`: the starting frame index for processing; defaults to 0.
- `--frames` or `--frame-count`: number of frames to process after the starting frame. Set to 0 (default) to process until the end of the video.
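For instance, to process a 100-frame clip starting at frame 30 (a sketch using the flags above; the output directory name is arbitrary):

```shell
# Process frames 30..129 of each input video.
python run_video.py \
    -i data/samples \
    -o output/samples_clip \
    -p fast \
    --from 30 \
    --frames 100
```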
Output settings:
- `--fps` or `--output-fps`: frame rate (FPS) of the output video. Set to 0 (default) to match the input video's frame rate.
- `--restore-res` or `--restore-resolution`: whether to restore the output to the original input resolution after processing. Default: False.
- `--save-sbs` or `--save-side-by-side`: whether to save side-by-side videos of RGB and colored depth. Default: True.
- `--save-npy`: whether to save depth maps as `.npy` files. Default: True.
- `--save-snippets`: whether to save initial snippets. Default: False.
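With `--save-npy` enabled, the saved depth maps can be read back with NumPy. The snippet below is a self-contained sketch: it first writes a dummy array as a stand-in for a real prediction, since the actual output file names depend on the input video:

```shell
python - <<'EOF'
import numpy as np

# Stand-in for a saved depth output: here assumed (frames, height, width), float32.
np.save("demo_pred.npy", np.random.rand(10, 480, 640).astype(np.float32))

depth = np.load("demo_pred.npy")
print(depth.shape, depth.dtype)
EOF
```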
Other arguments
- Run `python run_video.py --help` for details about other arguments.
- For a low GPU memory footprint, pass `--max-vae-bs 1 --unload-snippet true` and use a smaller resolution, e.g. `--res 512`.
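Putting the low-memory flags together into one command (a sketch; all flags come from the options documented above, and the output directory name is arbitrary):

```shell
# Trade speed for a smaller GPU memory footprint.
python run_video.py \
    -i data/samples \
    -o output/samples_lowmem \
    -p fast \
    --res 512 \
    --max-vae-bs 1 \
    --unload-snippet true
```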
⬇ Checkpoint cache
By default, the checkpoint is stored in the Hugging Face cache. The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```
Alternatively, use the following script to download the checkpoint weights locally, and specify the checkpoint path with `-c checkpoint/rollingdepth-v1-0`:

```bash
bash script/download_weight.sh
```
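After downloading, inference can point at the local weights. This sketch combines the preset command from earlier with the `-c` flag documented above:

```shell
# Use the locally downloaded checkpoint instead of the Hugging Face cache.
python run_video.py \
    -i data/samples \
    -o output/samples_fast \
    -p fast \
    -c checkpoint/rollingdepth-v1-0
```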
🦿 Evaluation on test datasets
Coming soon
🎓 Citation
```bibtex
@InProceedings{ke2024rollingdepth,
    title={Video Depth without Video Models},
    author={Bingxin Ke and Dominik Narnhofer and Shengyu Huang and Lei Ke and Torben Peters and Katerina Fragkiadaki and Anton Obukhov and Konrad Schindler},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2025}
}
```
🙏 Acknowledgments
We thank Yue Pan, Shuchang Liu, Nando Metzger, and Nikolai Kalischek for fruitful discussions.
We are grateful to redmond.ai (robin@redmond.ai) for providing GPU resources.
🎫 License
The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).
The model is licensed under the RAIL++-M License (as defined in the LICENSE-MODEL).
By downloading and using the code and model, you agree to the terms in the LICENSE and LICENSE-MODEL, respectively.