StableVSR
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models (ECCV 2024)
Claudio Rota, Marco Buzzelli, Joost van de Weijer
Abstract
In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice-versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR.
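To make the sampling strategy concrete, here is a minimal, hypothetical sketch of the frame-wise bidirectional idea described in the abstract: at each denoising step the frames are visited alternately from past to future and from future to past, and each frame's update is guided by an already-updated neighbor. The function names, the toy scalar "latents", and the averaging "denoiser" are illustrative assumptions, not the implementation in this repository.

```python
# Conceptual sketch of frame-wise bidirectional sampling (NOT the authors' code).
# `denoise_step`, the guidance handling, and the alternation pattern are
# simplified placeholders for illustration only.
from typing import Callable, List

def bidirectional_sampling(
    noisy_frames: List[float],                            # one latent per frame (toy scalars here)
    denoise_step: Callable[[float, float, int], float],   # (latent, neighbor_guidance, t) -> latent
    num_steps: int,
) -> List[float]:
    latents = list(noisy_frames)
    for t in range(num_steps, 0, -1):
        # Alternate the direction of information flow at each denoising step:
        # past -> future on even steps, future -> past on odd steps.
        order = range(len(latents)) if t % 2 == 0 else range(len(latents) - 1, -1, -1)
        updated = list(latents)
        prev_idx = None
        for i in order:
            # The first frame in the traversal has no updated neighbor yet.
            guidance = updated[prev_idx] if prev_idx is not None else latents[i]
            updated[i] = denoise_step(latents[i], guidance, t)
            prev_idx = i
        latents = updated
    return latents

# Toy usage: a fake "denoiser" that just averages a latent with its guidance.
if __name__ == "__main__":
    frames = [0.9, 0.5, 0.1]
    print(bidirectional_sampling(frames, lambda x, g, t: 0.5 * (x + g), num_steps=4))
```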
Method overview
<img width="640" alt="networkfull" src="https://github.com/user-attachments/assets/51390b6d-b069-49e1-a7ca-290099b2039f">
Usage
Environment
The code is based on Python 3.8.17, CUDA 11, and diffusers.
Conda setup
conda create -n stablevsr python=3.8.17 -y
git clone https://github.com/claudiom4sir/StableVSR.git
cd StableVSR
conda activate stablevsr
pip install -r requirements.txt
Datasets
Download the REDS dataset from here (sharp + low-resolution).
Data are expected to be in the format root/hr/sequences/frames and root/lr/sequences/frames.
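If it helps, the small helper below sanity-checks that a dataset root follows the expected root/{hr,lr}/sequence/frame layout. It is a hypothetical utility written for this README, not part of the repository, and the .png extension is an assumption.

```python
# Hypothetical helper: verify a dataset root follows root/{hr,lr}/<sequence>/<frames>.
from pathlib import Path

def check_layout(root: str) -> None:
    root_path = Path(root)
    for split in ("hr", "lr"):
        split_dir = root_path / split
        assert split_dir.is_dir(), f"missing directory: {split_dir}"
        for seq in sorted(split_dir.iterdir()):
            if not seq.is_dir():
                continue
            frames = sorted(seq.glob("*.png"))  # frame extension assumed
            print(f"{split}/{seq.name}: {len(frames)} frames")

# check_layout("/path/to/REDS")  # hypothetical path
```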
Pretrained models
Pretrained models are available here. If you run the train or test code, you don't need to download them explicitly as they are fetched with .from_pretrained('claudiom4sir/StableVSR').
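For reference, a minimal sketch of the Hub download mentioned above is shown below. The exact model/pipeline classes and subfolder names used by train.py and test.py are defined in this repository; the ControlNetModel call and the "controlnet" subfolder below are only illustrative assumptions.

```python
# Illustrative sketch only: loading pretrained weights from the Hugging Face Hub.
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "claudiom4sir/StableVSR",
    subfolder="controlnet",  # subfolder name is an assumption; check the Hub repo
)
```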
Train
Adjust the dataroot options in dataset/config_reds.yaml. Then, adjust the options in train.sh. Use the following command to start training:
bash ./train.sh
Test
python test.py --in_path YOUR_PATH_TO_LR_SEQS --out_path YOUR_OUTPUT_PATH --num_inference_steps 50 --controlnet_ckpt YOUR_PATH_TO_CONTROLNET_CKPT_FOLDER
Evaluation
python eval.py --gt_path YOUR_PATH_TO_GT_SEQS --out_path YOUR_OUTPUT_PATH
Memory requirements
Training with the provided configuration requires about 17 GB of GPU memory. Evaluation on REDS (320x180 -> 1280x720) requires about 14.5 GB.
Demo video
https://github.com/user-attachments/assets/60c5fc3b-819c-4242-bd73-e5e3b0f7beb3
https://github.com/user-attachments/assets/9fbc6fad-a088-41d9-be38-af53a8206916
https://github.com/user-attachments/assets/2f8a36f7-3b50-4eb1-baa8-e914a8931543
https://github.com/user-attachments/assets/7b379ad5-ecba-468a-811a-0a9cc4c8456d
Citations
@inproceedings{rota2024enhancing,
title={Enhancing perceptual quality in video super-resolution through temporally-consistent detail synthesis using diffusion models},
author={Rota, Claudio and Buzzelli, Marco and van de Weijer, Joost},
booktitle={European Conference on Computer Vision},
pages={36--53},
year={2024},
organization={Springer}
}
Contacts
If you have any questions, please contact me at claudio.rota@unimib.it
