Esrgan
Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Pipeline for the image super-resolution task, based on the frequently cited paper ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks (Wang Xintao et al.), published in 2018.
In short, image super-resolution (SR) techniques reconstruct a higher-resolution (HR) image or sequence from observed lower-resolution (LR) images, e.g. upscaling a 720p image to 1080p.
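To make the LR-to-HR relationship concrete, here is a minimal sketch of the simplest possible upscaler, nearest-neighbour pixel repetition, written in plain Python. It is not part of this project; the name `upscale_nearest` and the list-of-lists image representation are assumptions made for illustration. It only enlarges the pixel grid and recovers no detail, which is the gap that learned SR models aim to close.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def upscale_nearest(image: Image, factor: int) -> Image:
    """Upscale by an integer factor by repeating each pixel (nearest neighbour)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upscaled: Image = []
    for row in image:
        # Repeat every pixel `factor` times horizontally...
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


# A hypothetical 2x2 LR image upscaled to 4x4.
lr = [[10, 200],
      [60, 120]]
sr = upscale_nearest(lr, factor=2)
for row in sr:
    print(row)
```

The output has `factor` times as many rows and columns as the input, but every new pixel is a copy of an existing one, so edges stay blocky.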
One common approach to this task is to use deep convolutional neural networks capable of recovering HR images from LR ones, and ESRGAN (Enhanced SRGAN) is one of them. Key points of ESRGAN:
- SRResNet-based architecture with residual-in-residual blocks;
- Mixture of context, perceptual, and adversarial losses. Context and perceptual losses are used for proper image upscaling, while the adversarial loss pushes the network toward the natural image manifold using a discriminator network that is trained to differentiate between super-resolved images and original photo-realistic images.
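The mixture of losses can be sketched as a weighted sum of the three terms. The snippet below is a plain-Python illustration, not this project's implementation: the function names, the dictionary keys, and the weights are all assumptions, and in a real pipeline the perceptual and adversarial terms would come from a feature-extractor network and the discriminator rather than from constants.

```python
from typing import Dict, Sequence

# Illustrative weights only (assumption); not taken from this repository's config.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "context": 1e-2,
    "perceptual": 1.0,
    "adversarial": 5e-3,
}


def l1_loss(prediction: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equally sized flat pixel sequences."""
    if len(prediction) != len(target) or not prediction:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(prediction, target)) / len(prediction)


def generator_loss(terms: Dict[str, float],
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine the individual loss terms into one scalar as a weighted sum."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical values: the context term is computed from pixels, the other
# two are stand-ins for the outputs of a feature network and a discriminator.
terms = {
    "context": l1_loss([0.2, 0.4], [0.0, 0.5]),
    "perceptual": 0.8,
    "adversarial": 0.7,
}
total = generator_loss(terms)
print(f"total generator loss: {total:.4f}")
```

Keeping the weights in one mapping makes it easy to see how strongly each term influences training and to rebalance them without touching the loss code.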

Technologies
- Catalyst as the pipeline runner for deep learning tasks. This new and rapidly developing library can significantly reduce the amount of boilerplate code. If you are familiar with the TensorFlow ecosystem, you can think of Catalyst as Keras for PyTorch. The framework is integrated with logging systems such as the well-known TensorBoard;
- PyTorch and torchvision as the main frameworks for deep learning;
- Albumentations and PIQ for data processing.
Quick Start
Setup environment
pip install git+https://github.com/leverxgroup/esrgan.git
Run an experiment
catalyst-dl run -C esrgan/config.yml --benchmark
where esrgan/config.yml is a path to the config file.
Results
Example outputs of the ESRGAN model trained on the DIV2K dataset:
| LR<br>(low resolution) | ESRGAN<br>(original) | ESRGAN<br>(ours) | HR<br>(high resolution) |
|:---:|:---:|:---:|:---:|
| <img src="docs/_static/0853lr.png" height="128" width="128"/> | <img src="docs/_static/0853sr.png" height="128" width="128"/> | <img src="docs/_static/0853.png" height="128" width="128"/> | <img src="docs/_static/0853hr.png" height="128" width="128"/> |
| <img src="docs/_static/0857lr.png" height="128" width="128"/> | <img src="docs/_static/0857sr.png" height="128" width="128"/> | <img src="docs/_static/0857.png" height="128" width="128"/> | <img src="docs/_static/0857hr.png" height="128" width="128"/> |
| <img src="docs/_static/0887lr.png" height="128" width="128"/> | <img src="docs/_static/0887sr.png" height="128" width="128"/> | <img src="docs/_static/0887.png" height="128" width="128"/> | <img src="docs/_static/0887hr.png" height="128" width="128"/> |
Documentation
Full documentation for the project is available at https://esrgan.readthedocs.io/
License
esrgan is released under a CC BY-NC-ND 4.0 license. See LICENSE for additional details.
