USRNet

Deep Unfolding Network for Image Super-Resolution (CVPR, 2020) (PyTorch)


Deep unfolding network for image super-resolution

Kai Zhang, Luc Van Gool, Radu Timofte
Computer Vision Lab, ETH Zurich, Switzerland

[Paper][Code]


[Training code --> KAIR]

git clone https://github.com/cszn/KAIR.git

  • Training with DataParallel - PSNR
python main_train_psnr.py --opt options/train_usrnet.json
  • Training with DistributedDataParallel - PSNR - 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 main_train_psnr.py --opt options/train_usrnet.json  --dist True

Classical SISR degradation model

For a scale factor of $\mathbf{s}$, the classical (traditional) degradation model of SISR assumes the low-resolution (LR) image $\mathbf{y}$ is a blurred, decimated, and noisy version of a high-resolution (HR) image $\mathbf{x}$. Mathematically, it can be expressed by

$$\mathbf{y}=\left(\mathbf{x}\otimes\mathbf{k}\right)\downarrow_{\mathrm{{s}}}+\mathbf{n}$$

where $\otimes$ denotes two-dimensional convolution of $\mathbf{x}$ with the blur kernel $\mathbf{k}$, $\downarrow_{\mathrm{{s}}}$ denotes the standard $\mathbf{s}$-fold downsampler, i.e., keeping the upper-left pixel of each distinct $\mathbf{s}\times \mathbf{s}$ patch and discarding the others, and $\mathbf{n}$ is usually assumed to be additive white Gaussian noise (AWGN) specified by its standard deviation (or noise level) $\sigma$. With a clear physical meaning, this model can approximate a variety of LR images by choosing proper blur kernels, scale factors and noise levels for the underlying HR images. In particular, it has been extensively studied in model-based methods, which solve a combination of a data term and a prior term under the MAP framework. Notably, the model reduces to a deblurring problem when $\mathbf{s} = 1$.
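The degradation model above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration of the equation, not code from the repository; the function name `degrade` is ours:

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(x, k, s, sigma, rng=None):
    """Classical SISR degradation: y = (x ⊗ k)↓_s + n."""
    rng = np.random.default_rng(0) if rng is None else rng
    blurred = fftconvolve(x, k, mode="same")         # x ⊗ k (2D convolution)
    y = blurred[::s, ::s]                            # keep upper-left pixel of each s×s patch
    return y + sigma * rng.standard_normal(y.shape)  # n ~ N(0, sigma²), i.e. AWGN
```

With `s=1` this reduces to plain deblurring, matching the note above.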

Motivation

<img src="figs/category.png" width="536px"/>

Learning-based single image super-resolution (SISR) methods are continuously showing superior effectiveness and efficiency over traditional model-based methods, largely due to the end-to-end training. However, different from model-based methods that can handle the SISR problem with different scale factors, blur kernels and noise levels under a unified MAP (maximum a posteriori) framework, learning-based methods (e.g., SRMD [3]) generally lack such flexibility.

[1] "Learning deep CNN denoiser prior for image restoration." CVPR, 2017.
[2] "Deep plug-and-play super-resolution for arbitrary blur kernels." CVPR, 2019.
[3] "Learning a single convolutional super-resolution network for multiple degradations." CVPR, 2018.
<img src="figs/fig1.png" width="440px"/>

While the classical degradation model can produce various LR images from a single HR image under different blur kernels, scale factors and noise levels, learning a single end-to-end trained deep model that inverts all such LR images back to the HR image is still lacking.

This work focuses on non-blind SISR, which assumes the LR image, scale factor, blur kernel and noise level are known beforehand. Non-blind SISR remains an active research direction for three reasons.

  • First, the blur kernel and noise level can be estimated, or are known based on other information (e.g., camera setting).
  • Second, users can control the preference of sharpness and smoothness by tuning the blur kernel and noise level.
  • Third, non-blind SISR can be an intermediate step towards solving blind SISR.

Unfolding algorithm

By unfolding the MAP inference via a half-quadratic splitting (HQS) algorithm, a fixed number of iterations, each alternately solving a data subproblem and a prior subproblem, can be obtained. Starting from the MAP objective

$$\min_{\mathbf{x}} \frac{1}{2\sigma^2}\left\|\mathbf{y}-(\mathbf{x}\otimes\mathbf{k})\downarrow_{\mathrm{s}}\right\|^2+\lambda\Phi(\mathbf{x}),$$

HQS introduces an auxiliary variable $\mathbf{z}$ and alternately solves

$$\mathbf{z}_k=\arg\min_{\mathbf{z}}\left\|\mathbf{y}-(\mathbf{z}\otimes\mathbf{k})\downarrow_{\mathrm{s}}\right\|^2+\mu\sigma^2\left\|\mathbf{z}-\mathbf{x}_{k-1}\right\|^2 \quad \text{(data subproblem)}$$

$$\mathbf{x}_k=\arg\min_{\mathbf{x}}\frac{\mu}{2}\left\|\mathbf{x}-\mathbf{z}_k\right\|^2+\lambda\Phi(\mathbf{x}) \quad \text{(prior subproblem)}$$

where $\mu$ is the penalty parameter. The data subproblem admits a closed-form solution, while the prior subproblem corresponds to denoising $\mathbf{z}_k$ with noise level $\sqrt{\lambda/\mu}$.

Deep unfolding SR network

We propose an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods. USRNet inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model, while maintaining the advantages of learning-based methods.

<img src="figs/architecture.png" width="900px"/>

The overall architecture of the proposed USRNet with 8 iterations. USRNet can flexibly handle the classical degradation via a single model as it takes the LR image, scale factor, blur kernel and noise level as input. Specifically, USRNet consists of three main modules, including the data module D that makes HR estimation clearer, the prior module P that makes HR estimation cleaner, and the hyper-parameter module H that controls the outputs of D and P.

  • Data module D: closed-form solution for the data term; contains no trainable parameters
  • Prior module P: ResUNet denoiser for the prior term
  • Hyper-parameter module H: MLP for the hyper-parameter; acts as a slide bar to control the outputs of D and P
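The alternation of D, P and H can be sketched as follows. This is a minimal illustrative sketch, not the repository's implementation: for simplicity it uses the $\mathbf{s}=1$ (deblurring) special case, where the data subproblem has a well-known closed-form FFT solution, and it stands in a plain Gaussian filter for the learned ResUNet prior. The names `data_step`/`prior_step` and the linear $\mu$ schedule (which H would instead predict) are hypothetical:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def data_step(z, y, K, mu_sigma2):
    # Closed-form data subproblem in the Fourier domain for s = 1:
    #   x = F^-1( (conj(K)·Y + a·Z) / (|K|² + a) ),  a = mu·sigma²
    Z, Y = np.fft.fft2(z), np.fft.fft2(y)
    return np.real(np.fft.ifft2((np.conj(K) * Y + mu_sigma2 * Z) /
                                (np.abs(K) ** 2 + mu_sigma2)))

def prior_step(x):
    # Stand-in for the learned ResUNet prior P: a plain Gaussian denoiser
    return gaussian_filter(x, sigma=0.5)

def unfolded_sr(y, kernel, sigma=0.05, iters=8):
    # Zero-pad and center the blur kernel, then precompute its transfer function
    K_pad = np.zeros_like(y)
    kh, kw = kernel.shape
    K_pad[:kh, :kw] = kernel
    K_pad = np.roll(K_pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    K = np.fft.fft2(K_pad)
    x = y.copy()
    for i in range(iters):
        mu = 0.1 * (i + 1)                       # hypothetical schedule (role of H)
        x = data_step(x, y, K, mu * sigma ** 2)  # module D: enforce data fidelity
        x = prior_step(x)                        # module P: denoise the estimate
    return x
```

The key design choice mirrors the text above: D contains no trainable parameters (it is a closed-form solve), while all learning capacity sits in P and H.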

Models

| Model | # iters | # params | ResUNet |
|---|:--:|:---:|:---:|
| USRNet | 8 | 17.02M | 64-128-256-512 |
| USRGAN | 8 | 17.02M | 64-128-256-512 |
| USRNet-tiny | 6 | 0.59M | 16-32-64-64 |
| USRGAN-tiny | 6 | 0.59M | 16-32-64-64 |

Codes

Blur kernels

|<img src="figs/isotropic_gaussian.gif" width="285px"/>|<img src="figs/anisotropic_gaussian.gif" width="285px"/>|<img src="figs/motion.gif" width="285px"/>|
|:---:|:---:|:---:|
|<i>(a) Isotropic Gaussian kernels</i>|<i>(b) Anisotropic Gaussian kernels</i>|<i>(c) Motion blur kernels</i>|

While it has been pointed out that anisotropic Gaussian kernels suffice for the SISR task, an SISR method that can handle more complex blur kernels would be preferred in real applications.

Approximated bicubic kernel under classical SR degradation model assumption

|<img src="figs/bicubic_kernelx2.png" width="285px"/>|<img src="figs/bicubic_kernelx3.png" width="285px"/>|<img src="figs/bicubic_kernelx4.png" width="285px"/>|
|:---:|:---:|:---:|
|<i>(a) Bicubic kernel (x2)</i>|<i>(b) Bicubic kernel (x3)</i>|<i>(c) Bicubic kernel (x4)</i>|

The bicubic degradation can be approximated by setting a proper blur kernel for the classical degradation. Note that the bicubic kernels contain negative values.
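To see why a bicubic kernel contains negative values, one can sample the Keys cubic convolution kernel (with the common choice $a=-0.5$) stretched by the scale factor. This is a sketch of the idea, not necessarily the exact kernel in the figures above, which depends on the resizer implementation:

```python
import numpy as np

def cubic(x, a=-0.5):
    # Keys cubic convolution kernel, the basis function of bicubic interpolation
    x = np.abs(x)
    return np.where(x <= 1, (a + 2) * x**3 - (a + 3) * x**2 + 1,
           np.where(x < 2, a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a, 0.0))

def bicubic_kernel_1d(s):
    # 1D anti-aliased bicubic downsampling kernel for integer scale s:
    # the cubic kernel stretched by s, sampled on the integer grid
    taps = np.arange(-2 * s, 2 * s + 1)
    return cubic(taps / s) / s

k2 = bicubic_kernel_1d(2)      # x2 kernel; has negative side lobes
K2 = np.outer(k2, k2)          # separable 2D bicubic kernel (x2)
```

The negative side lobes come from the $1<|x|<2$ branch of the cubic basis, which is what gives bicubic resizing its mild sharpening behavior.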

PSNR results

Run main_test_table1.py to produce the following results.

<img src="figs/psnr.png" width="900px"/>

The table shows the average PSNR (dB) results of different methods for different combinations of scale factors, blur kernels and noise levels.

Visual results of USRNet

<img align="left" src="figs/butterfly_x2_k10_LR.png" width="240px"/> <img align="center" src="figs/butterfly_x3_k2_LR.png" width="240px"/> <img align="right" src="figs/butterfly_x4_k7_LR.png" width="240px"/>

<p align="center"><i>(a) LR images with scale factors 2, 3 and 4</i></p>

<img align="left" src="figs/butterfly_x2_k10_usrnet.png" width="240px"/> <img align="center" src="figs/butterfly_x3_k2_usrnet.png" width="240px"/> <img align="right" src="figs/butterfly_x4_k7_usrnet.png" width="240px"/>

<p align="center"><i>(b) Results by the single USRNet model with s = 2, 3 and 4</i></p>

Visual results of USRGAN

<img align="left" src="figs/parrot_x4_k3_LR.png" width="240px"/> <img align="center" src="figs/parrot_x4_k6_LR.png" width="240px"/> <img align="right" src="figs/parrot_x4_k12_LR.png" width="240px"/>

<p align="center"><i>(a) LR images</i></p>

<img align="left" src="figs/parrot_x4_k3_usrgan.png" width="240px"/> <img align="center" src="figs/parrot_x4_k6_usrgan.png" width="240px"/> <img align="right" src="figs/parrot_x4_k12_usrgan.png" width="240px"/>

<p align="center"><i>(b) Results by USRGAN(x4)</i></p>

|<img align="center" src="figs/test_57_x4_k1_LR.png" width="448px"/>|<img align="center" src="figs/test_57_x4_k1_usrgan.png" width="448px"/>|
|:---:|:---:|
