XTConsistency
Robust Learning Through Cross-Task Consistency [Best Paper Award Nominee, CVPR2020]
<table> <tr><td><em>Above: A comparison of the results from consistency-based learning and learning each task individually. The yellow markers highlight the improvement in fine-grained details.</em></td></tr> </table> <br>

This repository contains tools for training and evaluating models using consistency:
- Pretrained models
- Demo code and an online live demo
- Uncertainty energy estimation code
- Training scripts
- Docker and installation instructions
for the following paper:
<!-- Amir Zamir, Alexander Sax, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas --> <div style="text-align:center"> <h4><a href=https://consistency.epfl.ch>Robust Learning Through Cross-Task Consistency</a> (CVPR 2020, Best Paper Award Nomination, Oral)</h4> </div> <br>For further details, a live demo, video visualizations, and an overview talk, refer to our project website.
PROJECT WEBSITE: https://consistency.epfl.ch
<div style="text-align:center">

| LIVE DEMO | VIDEO VISUALIZATION |
|:----:|:----:|
| Upload your own images and see the results of different consistency-based models vs. various baselines.<br><br><img src=./assets/screenshot-demo.png width="400"> | Visualize models with and without consistency, evaluated on a (non-cherry picked) YouTube video.<br><br><br><img src=./assets/output_video.gif width="400"> |

</div>

Table of Contents
- Introduction
- Installation
- Quickstart (demo code)
- Energy computation
- Download all pretrained models
- Train a consistency model
- Citing
Introduction
Visual perception entails solving a wide set of tasks (e.g., object detection, depth estimation). The predictions made for each task from a single observation are not independent, and therefore are expected to be consistent.
What is consistency? Suppose an object detector detects a ball in a particular region of an image, while a depth estimator returns a flat surface for the same region. This presents an issue -- at least one of them has to be wrong, because they are inconsistent.
Why is it important?
- Desired learning tasks are usually predictions of different aspects of a single underlying reality (the scene that underlies an image). Inconsistency among predictions implies contradiction.
- Consistency constraints are informative and can be used to better fit the data or lower the sample complexity. They may also reduce the tendency of neural networks to learn "surface statistics" (superficial cues) by enforcing constraints rooted in different physical or geometric rules. This is empirically supported by the improved generalization of models when trained with consistency constraints.
How do we enforce it? The underlying concept is path independence in a network of tasks: given an endpoint Y2, the path X->Y1->Y2 should give the same result as the direct path X->Y2. This generalizes to larger systems with paths of arbitrary length. Here, the nodes of the graph are our prediction domains (e.g. depth, surface normals) and the edges are neural networks mapping between these domains.
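As a concrete illustration, path independence can be turned into a training objective by penalizing the disagreement between the direct path X->Y2 and the indirect path X->Y1->Y2. The sketch below is a minimal stand-in: the function arguments represent neural networks, and the plain-Python L1 distance stands in for the actual image-space losses.

```python
# Minimal sketch of a cross-task consistency objective.
# f_x_y2, f_x_y1, f_y1_y2 are stand-ins for the trained networks
# X -> Y2, X -> Y1, and Y1 -> Y2 respectively.

def l1(a, b):
    """Mean absolute difference between two flat prediction vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def consistency_loss(x, y2_target, f_x_y2, f_x_y1, f_y1_y2):
    direct = f_x_y2(x)            # direct path: X -> Y2
    via_y1 = f_y1_y2(f_x_y1(x))   # indirect path: X -> Y1 -> Y2
    supervised = l1(direct, y2_target)   # standard supervised term
    consistency = l1(via_y1, direct)     # path-independence term
    return supervised + consistency
```

In the full method this is generalized to many indirect paths, each acting as a separate consistency constraint on the same target domain.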
This repository includes training code for enforcing cross-task consistency, demo code for visualizing the results of a consistency-trained model on a given image, and links to download these models. For further details, refer to our paper or website.
Consistency Domains
Consistency constraints can be used for virtually any set of domains. This repository considers transferring between image domains, and our networks were trained for transferring between the following domains from the Taskonomy dataset.
Curvature Edge-3D Reshading
Depth-ZBuffer Keypoint-2D RGB
Edge-2D Keypoint-3D Surface-Normal
The repo contains consistency-trained models for RGB -> Surface-Normals, RGB -> Depth-ZBuffer, and RGB -> Reshading. In each case, the remaining 7 domains are used as consistency constraints during training.
Descriptions for each domain can be found in the supplementary file of Taskonomy.
Network Architecture
All networks are based on the UNet architecture. They take an input of size 256x256; upsampling is done via bilinear interpolation instead of deconvolution, and the networks are trained with an L1 loss. See the table below for more information.
| Task Name | Output Dimension | Downsample Blocks |
|-------------------------|------------------|-------------------|
| RGB -> Depth-ZBuffer | 256x256x1 | 6 |
| RGB -> Reshading | 256x256x1 | 5 |
| RGB -> Surface-Normal | 256x256x3 | 6 |
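The table above can be summarized in code. The config dict and helper below are hypothetical (the actual hyperparameters live in transfers.py) and assume the standard UNet behavior that each downsample block halves the spatial resolution.

```python
# Hypothetical summary of the per-task UNet settings from the table above;
# the real hyperparameters are defined in transfers.py.
UNET_CONFIGS = {
    "depth_zbuffer": {"out_channels": 1, "downsample_blocks": 6},
    "reshading":     {"out_channels": 1, "downsample_blocks": 5},
    "normal":        {"out_channels": 3, "downsample_blocks": 6},
}

def bottleneck_resolution(input_size, downsample_blocks):
    # Assuming each downsample block halves the spatial resolution,
    # a 256x256 input with 6 blocks reaches a 4x4 bottleneck.
    return input_size // (2 ** downsample_blocks)
```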
Other networks (e.g. Curvature -> Surface-Normal) also use a UNet; their architecture hyperparameters are detailed in transfers.py.
More information on the models, including download links, can be found here and in the supplementary material.
<br> <br>Installation
There are two convenient ways to run the code. Either using Docker (recommended) or using a Python-specific tool such as pip, conda, or virtualenv.
Installation via Docker [Recommended]
We provide a Docker image that contains the code and all the necessary libraries. It's simple to install and run.
- Simply run:
docker run --runtime=nvidia -ti --rm epflvilab/xtconsistency:latest
The code is now available in the docker under your home directory (/app), and all the necessary libraries should already be installed in the docker.
Installation via Pip/Conda/Virtualenv
The code can also be run using a Python environment manager such as Conda. See requirements.txt for the complete list of packages. We recommend a clean installation of the requirements in a fresh environment:
- Clone the repo:
git clone git@github.com:EPFL-VILAB/XTConsistency.git
cd XTConsistency
- Create a new environment and install the libraries:
conda create -n testenv -y python=3.6
source activate testenv
pip install -r requirements.txt
<br>
<br>
Quickstart (Run Demo Locally)
Download the consistency trained networks
If you haven't already, download the pretrained models. The models used for the demo can be downloaded with the following command:
sh ./tools/download_models.sh
This downloads the baseline and consistency-trained models for the depth, normal, and reshading targets (1.3GB) to a folder called ./models/. Individual models can be downloaded here.
Run a model on your own image
To run the trained model of a task on a specific image:
python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT
The --task flag specifies the target task for the input image, which should be either normal, depth or reshading.
To run the script for a normal target on the example image:
python demo.py --task normal --img_path assets/test.png --output_path assets/
It returns the output prediction from the baseline (test_normal_baseline.png) and consistency models (test_normal_consistency.png).
Test image | Baseline | Consistency
:-------------------------:|:-------------------------:|:-------------------------:
<img src=./assets/test.png width="270"> | <img src=./assets/test_normal_baseline.png width="270"> | <img src=./assets/test_normal_consistency.png width="270">
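The demo's output naming convention (seen above for the normal target) can be captured with a small helper. This function is purely illustrative — its name and structure are not part of the repo — but it reproduces the `<image stem>_<task>_<baseline|consistency>.png` pattern the demo uses.

```python
import os

# Hypothetical helper reproducing the demo's output file naming:
# <image stem>_<task>_<baseline|consistency>.png inside the output folder.
def output_paths(img_path, task, output_dir):
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return {model: os.path.join(output_dir, f"{stem}_{task}_{model}.png")
            for model in ("baseline", "consistency")}
```

For example, `output_paths("assets/test.png", "normal", "assets")` maps "baseline" to `assets/test_normal_baseline.png`.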
Similarly, running for target tasks reshading and depth gives the following.
Baseline (reshading) | Consistency (reshading) | Baseline (depth) | Consistency (depth)
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:
<img src=./assets/test_reshading_baseline.png width="200"> | <img src=./assets/test_reshading_consistency.png width="200"> | <img src=./assets/test_depth_baseline.png width="200"> | <img src=./assets/test_depth_consistency.png width="200">
Energy Computation
Training with consistency involves several paths that each predict the target domain, but using different cues to do so. The disagreement between these predictions yields an unsupervised quantity, consistency energy, that our CVPR 2020 paper found correlates with prediction error. You can view the pixel-wise consistency energy using our [live demo](https://consistency.epfl.ch).
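As a rough illustration (not the paper's exact formulation), the per-pixel disagreement across paths can be measured, for example, as the variance of the different path predictions at each pixel:

```python
# Sketch of pixel-wise consistency energy as disagreement across paths.
# predictions: a list of flat per-pixel value lists, one per path,
# all predicting the same target domain. Uses variance as a simple
# stand-in for the paper's energy measure.
def consistency_energy(predictions):
    n = len(predictions)
    energies = []
    for pixel_vals in zip(*predictions):
        mean = sum(pixel_vals) / n
        energies.append(sum((v - mean) ** 2 for v in pixel_vals) / n)
    return energies
```

Pixels where all paths agree get near-zero energy; pixels where the paths diverge (often where the prediction is unreliable) get high energy.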