SFTGAN
CVPR18 - Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
SFTGAN [Paper] [BasicSR]
:smiley: Training code is in the BasicSR repo.
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
By Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy.
This repo provides only simple testing code: the original Torch version used in the paper and a PyTorch version. For full training and testing code, please refer to BasicSR.
BibTeX
@InProceedings{wang2018sftgan,
author = {Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
title = {Recovering realistic texture in image super-resolution by deep spatial feature transform},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
<p align="center"> <img src="figures/qualitative_cmp.jpg"> </p>
Quick Test
This repo provides both Torch and PyTorch versions. The PyTorch version is recommended.
PyTorch Dependencies
- Python 3
- PyTorch >= 0.4.0
- Python packages:
pip install numpy opencv-python
[OR] Torch Dependencies
- Torch
- Other Torch dependencies, e.g.
nngraph, paths, image (install them by luarocks install xxx)
Test models
Note that the SFTGAN model is limited to some outdoor scenes. This is an unsatisfying limitation that we need to relax in the future.
- Clone this github repo.
git clone https://github.com/xinntao/SFTGAN
cd SFTGAN
- There are two sample images in the ./data/samples folder.
- Download pretrained models from Google Drive or Baidu Drive. Please see model list for more details.
- First run segmentation test.
[PyTorch]
cd pytorch_test
python test_segmentation.py
[Torch]
cd torch_test
th test_segmentation.lua
The segmentation results are then in ./data, with the suffixes _segprob, _colorimg, and _byteimg.
- Run sftgan test.
[PyTorch]
python test_sftgan.py
[Torch]
th test_sftgan.lua
The results are then in ./data with the _result suffix.
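The two test steps above have to run in order, from the same test folder. A minimal Python sketch of that sequence follows. The `STEPS` table and the `run_pipeline` helper are hypothetical names introduced here, not part of the repo, and the sketch assumes it is launched from the repository root with the pretrained models already downloaded. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Working directory and ordered commands for each backend, copied from the steps above.
STEPS = {
    "pytorch": ("pytorch_test", [["python", "test_segmentation.py"],
                                 ["python", "test_sftgan.py"]]),
    "torch": ("torch_test", [["th", "test_segmentation.lua"],
                             ["th", "test_sftgan.lua"]]),
}


def run_pipeline(backend="pytorch", dry_run=True):
    """Run segmentation first, then SFTGAN, inside the backend's test folder."""
    workdir, commands = STEPS[backend]
    for argv in commands:
        print(f"[{workdir}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the sequence if the segmentation step fails.
            subprocess.run(argv, cwd=workdir, check=True)
    return commands


run_pipeline("pytorch", dry_run=True)
```

Pass `dry_run=False` to actually execute the commands.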
Spatial Feature Modulation
SFT - Spatial Feature Transform (Modulation).
A Spatial Feature Transform (SFT) layer has been proposed to efficiently incorporate the categorical conditions into a CNN network.
There is a fantastic blog post explaining the widely used feature modulation operation: Distill - Feature-wise transformations.
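As a rough illustration of the idea, the sketch below applies a spatially varying affine transform, gamma * F + beta, where gamma and beta are predicted per pixel from condition maps. It is a NumPy sketch, not the code in this repo: the 1x1 convolutions, the names `conv1x1`, `sft_layer`, and `params`, and all shapes are assumptions chosen for illustration, and the layer in the released models may be deeper and differ in detail.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def sft_layer(features, condition, params):
    """Modulate features with a per-pixel scale (gamma) and shift (beta)."""
    gamma = conv1x1(condition, params["w_gamma"], params["b_gamma"])
    beta = conv1x1(condition, params["w_beta"], params["b_beta"])
    # Unlike a per-channel scalar modulation, gamma and beta vary over H and W.
    return gamma * features + beta


rng = np.random.default_rng(0)
n_feat, n_cond, h, w = 64, 8, 24, 24            # illustrative sizes only
features = rng.standard_normal((n_feat, h, w))  # intermediate SR feature maps
condition = rng.random((n_cond, h, w))          # e.g. segmentation probability maps
params = {
    "w_gamma": rng.standard_normal((n_feat, n_cond)) * 0.1,
    "b_gamma": np.ones(n_feat),
    "w_beta": rng.standard_normal((n_feat, n_cond)) * 0.1,
    "b_beta": np.zeros(n_feat),
}
out = sft_layer(features, condition, params)
print(out.shape)  # (64, 24, 24)
```

With zero weights, `b_gamma` of ones, and `b_beta` of zeros, the layer reduces to the identity, which is a convenient sanity check.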
<p align="center"> <img height="280" src="figures/network_structure.png"> </p>
Semantic Categorical Prior
We have explored the use of semantic segmentation maps as categorical prior for SR.
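To make the notion of a categorical prior concrete, here is a small NumPy sketch that turns per-class segmentation scores into per-pixel probability maps, a label map, and a color visualization. The class count of 8, the random scores, and the random palette are assumptions for illustration only. The names `seg_prob`, `label_map`, and `color_img` echo the output suffixes of the segmentation test, but the actual file formats are not reproduced here.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


n_classes, h, w = 8, 4, 6                        # assumed sizes
rng = np.random.default_rng(0)
logits = rng.standard_normal((n_classes, h, w))  # stand-in for segmentation network output

# Probability maps: one channel per category, summing to 1 at every pixel.
seg_prob = softmax(logits, axis=0)

# Hard label per pixel, and a color rendering through a (random) palette.
label_map = seg_prob.argmax(axis=0).astype(np.uint8)
palette = rng.integers(0, 256, size=(n_classes, 3), dtype=np.uint8)
color_img = palette[label_map]

print(seg_prob.shape, label_map.shape, color_img.shape)
```

Feeding the soft probability maps, rather than the hard labels, lets the condition carry uncertainty at region boundaries.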
<p align="center"> <img height="230" src="figures/semantic_category_prior.jpg"> </p>
<p align="center"> <img src="figures/different_priors.png"> </p>
OST dataset
- Outdoor Scene Train/Test
- OST300: 300 test images of outdoor scenes
Download the OST dataset from Google Drive or Baidu Drive.
:satisfied: Image Viewer - HandyViewer
You may try HandyViewer, an image viewer that lets you switch between images at a fixed zoom ratio, which makes it easy to compare image details.
