ResDepth

[ISPRS Journal of Photogrammetry and Remote Sensing, 2022] ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

Generate Convert Improve

Install / Use

/learn @prs-eth/ResDepth

About this skill

Quality Score

0/100

README

ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

ResDepth

This repository provides the code to train and evaluate ResDepth, an efficient and easy-to-use neural architecture for learned DSM refinement from satellite imagery. It represents the official implementation of the paper:

ResDepth: A Deep Residual Prior For 3D Reconstruction From High-resolution Satellite Images

Corinne Stucker, Konrad Schindler

Abstract: Modern optical satellite sensors enable high-resolution stereo reconstruction from space. But the challenging imaging conditions when observing the Earth from space push stereo matching to its limits. In practice, the resulting digital surface models (DSMs) are fairly noisy and often do not attain the accuracy needed for high-resolution applications such as 3D city modeling. Arguably, stereo correspondence based on low-level image similarity is insufficient and should be complemented with a-priori knowledge about the expected surface geometry beyond basic local smoothness. To that end, we introduce ResDepth, a convolutional neural network that learns such an expressive geometric prior from example data. ResDepth refines an initial, raw stereo DSM while conditioning the refinement on the images. I.e., it acts as a smart, learned post-processing filter and can seamlessly complement any stereo matching pipeline. In a series of experiments, we find that the proposed method consistently improves stereo DSMs both quantitatively and qualitatively. We show that the prior encoded in the network weights captures meaningful geometric characteristics of urban design, which also generalize across different districts and even from one city to another. Moreover, we demonstrate that, by training on a variety of stereo pairs, ResDepth can acquire a sufficient degree of invariance against variations in imaging conditions and acquisition geometry.

Requirements

This code has been developed and tested on Ubuntu 18.04 with Python 3.7, PyTorch 1.9, and GDAL 2.2.3. It may work for other setups but has not been tested thoroughly.

On Ubuntu 18.04, gdal can be installed with apt-get:

sudo apt update
sudo apt install libgdal-dev gdal-bin

Setup

Before proceeding, make sure that GDAL is installed and set up correctly.

To create a Python virtual environment and install the required dependencies, please run:

git clone https://github.com/stuckerc/ResDepth.git
cd ResDepth
python3 -m venv tmp/resdepth
source tmp/resdepth/bin/activate
(resdepth) $ pip install --upgrade pip setuptools wheel
(resdepth) $ pip install -r requirements.txt

in your working directory. Next, use the following command to install GDAL in the previously created virtual environment:

(resdepth) $ pip install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==`gdal-config --version`

Quick Start

ResDepth follows a residual learning strategy, i.e., it is trained to refine an imperfect input DSM by regressing a per-pixel correction to the height, using both the DSM and ortho-rectified panchromatic (stereo) images as input.

Data Preparation

We assume that the initial surface reconstruction has been generated with existing multi-view stereo matching and/or depth map fusion techniques. Furthermore, we assume that the images have already been ortho-rectified with the help of the initial surface estimate. Please note that this repository does not provide any functionality to perform data pre-processing (initial surface reconstruction, ortho-rectification) nor image pair selection.

When preparing your data as input for ResDepth, make sure to meet the following requirements:

The initial reconstruction and the reference model are parametrized as digital surface models (DSMs, i.e., height fields in raster format).
The initial DSM and ground truth DSM are reprojected to the same horizontal coordinate system (for instance, the local UTM zone). The pixel values of the initial DSM and the ground truth DSM must refer to the same vertical coordinate system or otherwise be vertically aligned.
ResDepth accepts ground truth DSMs that exhibit NoData cells. However, NoData cells in the initial DSM are prohibited. Hence, you might have to interpolate the initial DSM to replace any NoData cells.
The initial DSM is used to ortho-rectify the panchromatic satellite views.
The initial DSM, the ground truth DSM, and the ortho-images are stored in GeoTIFF raster format. Furthermore, all rasters must be co-registered, cropped to the same spatial (rectangular) extent, and exhibit the same spatial resolution (i.e., pixel-aligned rasters).

Note: Throughout our experiments, we use DSMs with a grid spacing of 0.25 m. By construction, ResDepth is generic and can be trained to refine any DSM. However, if the spatial resolution deviates from our setting, it might be required to adapt the tile size tile_size of the DSM patches and/or the depth depth of the U-Net
(number of downsampling and upsampling layers).

For evaluation, ResDepth accepts the following additional rasters as input:

Ground truth mask: A raster in GeoTIFF format. A pixel value of 1 indicates a pixel with valid reference height, whereas a pixel value of 0 indicates a pixel with invalid reference height. This raster is used to mask out areas with evident temporal differences between the initial and ground truth DSM due to construction activities.
Building mask: A raster in GeoTIFF format. A pixel with a value of 1 indicates a building pixel, and a pixel with a value of 0 is a terrain pixel. If a building mask is provided, the quantitative errors are computed for buildings and terrain separately. Before calculating the object-specific metrics, the building mask is dilated by two pixels to avoid aliasing at vertical walls.
Water mask: A raster in GeoTIFF format. A pixel with a value of 1 indicates a water pixel, and a pixel with a value of 0 is a non-water pixel. Water bodies are excluded from the evaluation only if a building mask is provided.
Forest mask: A raster in GeoTIFF format. A pixel with a value of 1 indicates a forest pixel, and a pixel with a value of 0 is a non-forest pixel. Densely forested areas are excluded from the evaluation only if a building mask is provided.

Note: All rasters (DSMs, ortho-images, mask rasters) must be co-registered and cropped to the same spatial (rectangular) extent. Furthermore, the spatial resolution must be the same.

Data Structure

We do not expect a particular structure of how the data is stored. The path to every initial DSM, to the corresponding ground truth DSM, and possibly to raster masks must be listed in the configuration file (see below). The ortho-images and the definition of the image pairs must be provided as text files.

Image List

Prepare a text file imagelist.txt that lists the absolute paths to the pre-computed ortho-rectified satellite images (one file path per line):

path/to/ortho-image1.tif
path/to/ortho-image2.tif
path/to/ortho-image3.tif
path/to/ortho-image4.tif
path/to/ortho-image5.tif
path/to/ortho-image6.tif
path/to/ortho-image7.tif
...

It is possible to use the same image list for training, validation, and testing.

Note: The filename of every image listed in the image list has to be unique (irrespective of the absolute file path due to the definition of the image pair list, see below).

Image Pair List

Prepare a text file pairlist.txt that comprises a comma-separated list of filenames, where every line defines one image pair. If multiple image pairs are specified, each pair needs to be of equal length (i.e., the same number of
images per image pair).

Example image pair list for ResDepth-stereo: The following image pair list defines a single stereo pair composed of the images ortho-image1.tif and ortho-image2.tif. The absolute paths to the ortho-images will be derived by matching the image filenames listed in pairlist.txt and imagelist.txt.

ortho-image1.tif, ortho-image2.tif

Example image pair list for ResDepth-stereo, generalized across viewpoints: To train a ResDepth-stereo network that generalizes across variations in acquisition geometry and the images' radiometry, one simply has to provide multiple stereo pairs in the image pair list, for example:

ortho-image1.tif, ortho-image2.tif
ortho-image2.tif, ortho-image5.tif
ortho-image4.tif, ortho-image5.tif

In this example, ResDepth-stereo will be trained using the three image pairs (ortho-image1.tif, ortho-image2.tif), (ortho-image2.tif, ortho-image5.tif), and (ortho-image4.tif, ortho-image5.tif).

Example image pair list for ResDepth-mono: The following example shows an image pair list to train ResDepth-mono
using the single image ortho-image1.tif as guidance:

ortho-image1.tif,

Example image pair list for ResDepth-0: The network variant ResDepth-0 does not leverage any satellite views. Therefore, to train or evaluate ResDepth-0, neither the image list imagelist.txt nor the image pair list pairlist.txt have to be provided.

Training

To train ResDepth, run the script train.py with a JSON configuration file as the unique argument:

(resdepth) $ python train.py config.json

The configuration file config.json specifies the input data and the output directory (see below for details). Furthermore, it configures

Related Skills

node-connect

353.3k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

111.7k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

353.3k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

353.3k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。