

NVDS (ICCV 2023) & NVDS+ (TPAMI 2024) 🚀🚀🚀

🎉🎉🎉 Welcome to the NVDS GitHub repository! 🎉🎉🎉

This repository is the official PyTorch implementation of the ICCV 2023 paper "Neural Video Depth Stabilizer" (NVDS)

Authors: Yiran Wang<sup>1</sup>, Min Shi<sup>1</sup>, Jiaqi Li<sup>1</sup>, Zihao Huang<sup>1</sup>, Zhiguo Cao<sup>1</sup>, Jianming Zhang<sup>2</sup>, Ke Xian<sup>3</sup>, Guosheng Lin<sup>3</sup>

Project Page | Arxiv | Video | 视频 | Poster | Supp | VDW Dataset | VDW Toolkits

TPAMI 2024 "NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation" (NVDS+)

Authors: Yiran Wang<sup>1</sup>, Min Shi<sup>1</sup>, Jiaqi Li<sup>1</sup>, Chaoyi Hong<sup>1</sup>, Zihao Huang<sup>1</sup>, Juewen Peng<sup>3</sup>, Zhiguo Cao<sup>1</sup>, Jianming Zhang<sup>2</sup>, Ke Xian<sup>1</sup>, Guosheng Lin<sup>3</sup>

Institutes: <sup>1</sup>Huazhong University of Science and Technology, <sup>2</sup>Adobe Research, <sup>3</sup>Nanyang Technological University

Paper | Arxiv | Video | 视频 | Supp

😎 Highlights

NVDS is the first plug-and-play stabilizer that can remove flicker from the predictions of any single-image depth model without extra effort. We also introduce a large-scale dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset. Don't forget to star this repo if you find it interesting!

💦 License and Releasing Policy

  • VDW dataset.

    We have released the VDW dataset under strict conditions: the release must not violate any copyright requirements. To that end, we will not publicly release any video frames or derived data. Instead, we provide metadata and detailed toolkits, which can be used to reproduce VDW or to generate your own data. The metadata contains IMDb numbers, start times, end times, movie durations, resolutions, and cropping areas. All metadata and toolkits are licensed under CC BY-NC-SA 4.0 and may only be used for academic and research purposes. Please refer to our VDW official website and VDW Toolkits for data usage.

  • NVDS code and model.

    Following MiDaS and CVD, the NVDS code and model adopt the widely used MIT License.
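
The VDW metadata fields described above (IMDb number, start/end time, duration, resolution, cropping area) can be pictured as a simple record. This is a minimal sketch with assumed field names; the official VDW toolkits define the actual schema:

```python
from dataclasses import dataclass

@dataclass
class VDWClip:
    """One VDW metadata entry (field names are hypothetical)."""
    imdb_id: str       # IMDb number of the source movie
    start_s: float     # clip start time in seconds
    end_s: float       # clip end time in seconds
    resolution: tuple  # (width, height) of the source video
    crop: tuple        # (x, y, w, h) cropping area

    def duration_s(self) -> float:
        return self.end_s - self.start_s

clip = VDWClip("tt0000000", 12.0, 18.5, (1920, 1080), (0, 140, 1920, 800))
print(clip.duration_s())  # 6.5
```

Reproducing VDW from such records then amounts to obtaining each source movie by its IMDb number, cutting the clip between the start and end times, and applying the crop, which is what the official toolkits automate.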

⚡ Updates and Todo List

  • [2024.10.04] The TPAMI 2024 paper of NVDS+ is available on arXiv and IEEE Xplore.
  • [2024.10.02] The extension NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation is accepted by TPAMI 2024!
  • [2024.06.03] Release the official VDW toolkits to reproduce VDW and generate your own data.
  • [2024.01.22] We release the supplementary video for the journal extension from NVDS to NVDS+.
  • [2024.01.22] Release the metadata and evaluation code of the VDW test set.
  • [2023.09.17] Upload the NVDS poster for ICCV 2023.
  • [2023.09.09] Evaluation code on VDW test set is released.
  • [2023.09.09] VDW official website goes online.
  • [2023.08.11] Release evaluation code and checkpoint of NYUDV2-finetuned NVDS.
  • [2023.08.10] Update the camera ready version of NVDS paper and supplementary.
  • [2023.08.05] Update license of VDW dataset: CC BY-NC-SA 4.0.
  • [2023.07.21] We present the NVDS checkpoint and demo (inference) code.
  • [2023.07.18] Our Project Page is built and released.
  • [2023.07.18] The Arxiv version of our NVDS paper is released.
  • [2023.07.16] Our work is accepted by ICCV 2023.

🌼 Abstract

Video depth estimation aims to infer temporally consistent depth. Some methods achieve temporal consistency by finetuning a single-image depth model during test time using geometry and re-projection constraints, which is inefficient and not robust. An alternative approach is to learn how to enforce temporal consistency from data, but this requires well-designed models and sufficient video depth data. To address these challenges, we propose a plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort. We also introduce a large-scale dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset to our knowledge. We evaluate our method on the VDW dataset as well as two public benchmarks and demonstrate significant improvements in consistency, accuracy, and efficiency compared to previous approaches. Our work serves as a solid baseline and provides a data foundation for learning-based video depth models. We will release our dataset and code for future research.

<p align="center"> <img src="PDF/fig1-pipeline.PNG" width="100%"> </p>

🔨 Installation

  • Basic environment.

    Our code is based on python=3.8.13 and pytorch==1.9.0. Refer to the requirements.txt for installation.

    conda create -n NVDS python=3.8.13
    conda activate NVDS
    conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
    pip install numpy imageio opencv-python scipy tensorboard timm scikit-image tqdm h5py
    
  • Installation of GMflow.

    We utilize the state-of-the-art optical flow model GMFlow in the temporal loss and the OPW metric. The temporal loss enhances consistency during training, and the OPW metric is evaluated in our demo (inference) code to showcase quantitative improvements. Please refer to the GMFlow Official Repo for installation.

  • Installation of mmcv and mmseg.

    Cross attention in our stabilization network relies on functions from mmcv-full==1.3.0 and mmseg==0.11.0. Please refer to MMSegmentation-v0.11.0 and the official documentation for step-by-step installation instructions. The key is to match the versions of mmcv-full and mmsegmentation with the CUDA and PyTorch versions on your server. For instance, with CUDA 11.1 and PyTorch 1.9.0, mmcv-full 1.3.x and mmseg 0.11.0 (as in our installation instructions) are compatible. Since different servers use different CUDA versions, we cannot give a single installation recipe that works for everyone; check the version-matching tables in the official mmcv-full and mmseg documentation for your own setup. You can also refer to Issue #1 for some discussions.

    Besides, we suggest installing mmcv-full==1.x.x, because some APIs and functions were removed in mmcv-full==2.x.x (you would need to adjust our code to run with mmcv-full==2.x.x).
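
For intuition about what the flow-based consistency measurement mentioned above computes, here is a minimal NumPy sketch of a warping-based consistency error. This is an assumed simplification, not the repo's implementation: the exact OPW definition in the paper uses GMFlow, bilinear sampling, and occlusion masking, whereas this sketch uses nearest-neighbor warping with no mask.

```python
import numpy as np

def warp_with_flow(depth_next, flow):
    """Backward-warp the depth of frame t+1 into frame t using flow t->t+1.

    Nearest-neighbor sampling keeps the sketch dependency-free; real
    implementations use bilinear sampling and an occlusion mask.
    """
    h, w = depth_next.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return depth_next[yt, xt]

def opw_sketch(depth_t, depth_next, flow):
    """Mean absolute difference between depth_t and the warped next depth."""
    return float(np.mean(np.abs(depth_t - warp_with_flow(depth_next, flow))))

# A static scene with zero flow and identical depth is perfectly consistent.
d = np.random.rand(4, 4)
print(opw_sketch(d, d, np.zeros((4, 4, 2))))  # 0.0
```

Lower values mean less flicker between adjacent frames, which is why the demo code reports this kind of metric to quantify the stabilizer's improvement.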
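
As a rough illustration of the version-matching constraint above, the check below encodes the pairing used in this repo's setup (mmcv-full 1.3.x with PyTorch 1.9). It is a heuristic sketch only; the authoritative pairing is the compatibility table in the official mmcv documentation, and real compatibility also depends on the CUDA build.

```python
def parse_version(v):
    # Strip local build tags like "+cu111" and compare the first two fields.
    return tuple(int(x) for x in v.split("+")[0].split(".")[:2])

def mmcv_compatible(mmcv_version, torch_version):
    """Heuristic: accept mmcv-full 1.x with PyTorch >= 1.9 (this repo's setup).

    Consult the official mmcv compatibility table rather than relying on this.
    """
    mm = parse_version(mmcv_version)
    pt = parse_version(torch_version)
    # mmcv-full 2.x removed APIs this codebase uses, so require 1.x.
    return mm[0] == 1 and pt >= (1, 9)

print(mmcv_compatible("1.3.0", "1.9.0+cu111"))  # True
print(mmcv_compatible("2.0.0", "1.9.0"))        # False
```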

🔥 Demo & Inference

  • Preparing Demo Videos.

    We put 8 demo input videos in the demo_videos folder, in which bandage_1 and market_6 are examples from the [MPI Sintel dataset](http://sintel.is.tue.mpg
