RVRT

Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)


Recurrent Video Restoration Transformer with Guided Deformable Attention (RVRT, NeurIPS 2022)

arxiv | supplementary | pretrained models | visual results

This repository is the official PyTorch implementation of "Recurrent Video Restoration Transformer with Guided Deformable Attention" (arxiv, supp, pretrained models, visual results). RVRT achieves state-of-the-art performance with balanced model size, testing memory and runtime in:

video SR (REDS, Vimeo90K, Vid4, UDM10)
video deblurring (GoPro, DVD)
video denoising (DAVIS, Set8)


:rocket: :rocket: :rocket: News:

June 8, 2022: See more related works in vision restoration as follows:

| Topic | Title |
|:---:|:---:|
| transformer-based image/video restoration :fire: | SwinIR: Image Restoration Using Swin Transformer<br>VRT: A Video Restoration Transformer |
| real-world image SR/denoising | Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis<br>Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, ICCV2021 |
| blind image SR | Flow-based Kernel Prior with Application to Blind Super-Resolution, CVPR2021<br>Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution, ICCV2021 |
| generative image SR and image rescaling | Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling, ICCV2021 |

Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
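The clip-level recurrence described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the repository's model code: `split_into_clips`, `recurrent_over_clips` and `toy_update` are hypothetical names, each "feature" is a single number, and the update rule is an arbitrary stand-in for the real network. It shows only the control flow: frames inside a clip are handled together, and each clip receives the state inferred for the previous clip.

```python
from typing import Callable, List, Sequence


def split_into_clips(frames: Sequence, clip_len: int) -> List[list]:
    """Group consecutive frames into clips of at most `clip_len` frames."""
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def recurrent_over_clips(clips, update: Callable, init_state):
    """Run `update` clip by clip, feeding each clip the previous clip's output.

    `update(clip, prev_state)` stands in for the per-clip network: it sees all
    frames of one clip at once (parallel within the clip) plus the state
    inferred for the previous clip (recurrent across clips).
    """
    state = init_state
    outputs = []
    for clip in clips:
        state = update(clip, state)
        outputs.append(state)
    return outputs


def toy_update(clip, prev_state):
    """Toy stand-in: add the mean of the previous clip's state to every frame."""
    carry = sum(prev_state) / len(prev_state) if prev_state else 0.0
    return [frame + carry for frame in clip]


frames = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up per-frame "features"
clips = split_into_clips(frames, clip_len=2)
features = recurrent_over_clips(clips, toy_update, init_state=[])
print(clips)     # three clips: two full ones and a shorter final clip
print(features)  # each clip's output depends on the clip before it
```

In the actual method the per-clip update is a transformer block and the previous clip's features are aligned with guided deformable attention before being used; the sketch omits both.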

Requirements
Quick Testing
Training
Results
Citation
License and Acknowledgement

Requirements

Python 3.8, PyTorch >= 1.9.1

Requirements: see requirements.txt

Platforms: Ubuntu 18.04, CUDA 11.1
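A quick way to compare a local setup against the versions listed above is a small check script. The sketch below is not part of the repository; `parse_version`, `meets_minimum` and `check_environment` are hypothetical helpers, and treating Python 3.8 as a lower bound (rather than the exact tested version) is an assumption.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Turn a string such as '1.9.1+cu111' into (1, 9, 1)."""
    core = text.split("+", 1)[0]  # drop a local build suffix like '+cu111'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(min_torch: str = "1.9.1") -> list:
    """Return a list of problems found; an empty list means nothing to report."""
    problems = []
    if sys.version_info < (3, 8):  # assumption: 3.8 is read as a lower bound
        problems.append("Python %d.%d is older than 3.8" % sys.version_info[:2])
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, min_torch):
            problems.append("PyTorch %s is older than %s" % (torch.__version__, min_torch))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The script only reports version mismatches; it does not check the operating system or the CUDA toolkit.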

Quick Testing

The following commands download the pretrained models and test datasets automatically (except the Vimeo-90K testing set). If you run out of memory, try reducing --tile at the expense of slightly decreased performance.

You can also test it on Colab <a href="https://colab.research.google.com/gist/JingyunLiang/23502e2c65d82144219fa3e3322e4fc3/rvrt-demo-on-video-restoration.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>, but the results may differ slightly because of a different --tile setting.
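To see why a smaller --tile lowers memory use, it helps to picture how one axis of the video is cut into overlapping pieces. The sketch below is a generic tiling scheme, not the code in main_test_rvrt.py: `tile_starts` is a hypothetical helper, the (frames, height, width) ordering of the three values is an assumption, and the video shape is made up.

```python
def tile_starts(size: int, tile: int, overlap: int) -> list:
    """Start offsets of overlapping tiles covering one axis of length `size`.

    A `tile` of 0, or one at least as large as the axis, means "no tiling".
    """
    if tile <= 0 or tile >= size:
        return [0]
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # shift the last tile back so it ends at the border
    return starts


# Hypothetical input of 100 frames at 180x320, with the values of
# --tile 100 128 128 --tile_overlap 2 20 20 (axis order assumed).
video_shape = (100, 180, 320)
tile = (100, 128, 128)
overlap = (2, 20, 20)

plan = [tile_starts(s, t, o) for s, t, o in zip(video_shape, tile, overlap)]
print(plan)  # [[0], [0, 52], [0, 108, 192]]
```

Each tile is processed on its own, so peak memory scales with the tile size rather than the full video. Smaller tiles also see less surrounding context, which is one plausible reason for the slight performance drop mentioned above.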

# download code
git clone https://github.com/JingyunLiang/RVRT
cd RVRT
pip install -r requirements.txt

# 001, video sr trained on REDS, tested on REDS4
python main_test_rvrt.py --task 001_RVRT_videosr_bi_REDS_30frames --folder_lq testsets/REDS4/sharp_bicubic --folder_gt testsets/REDS4/GT --tile 100 128 128 --tile_overlap 2 20 20

# 002, video sr trained on Vimeo (bicubic), tested on Vid4 and Vimeo
python main_test_rvrt.py --task 002_RVRT_videosr_bi_Vimeo_14frames --folder_lq testsets/Vid4/BIx4 --folder_gt testsets/Vid4/GT --tile 0 0 0 --tile_overlap 2 20 20
python main_test_rvrt.py --task 002_RVRT_videosr_bi_Vimeo_14frames --folder_lq testsets/vimeo90k/vimeo_septuplet_matlabLRx4/sequences --folder_gt testsets/vimeo90k/vimeo_septuplet/sequences --tile 0 0 0 --tile_overlap 0 20 20

# 003, video sr trained on Vimeo (blur-downsampling), tested on Vid4, UDM10 and Vimeo
python main_test_rvrt.py --task 003_RVRT_videosr_bd_Vimeo_14frames --folder_lq testsets/Vid4/BDx4 --folder_gt testsets/Vid4/GT --tile 0 0 0 --tile_overlap 2 20 20
python main_test_rvrt.py --task 003_RVRT_videosr_bd_Vimeo_14frames --folder_lq testsets/UDM10/BDx4 --folder_gt testsets/UDM10/GT --tile 0 0 0 --tile_overlap 2 20 20
python main_test_rvrt.py --task 003_RVRT_videosr_bd_Vimeo_14frames --folder_lq testsets/vimeo90k/vimeo_septuplet_BDLRx4/sequences --folder_gt testsets/vimeo90k/vimeo_septuplet/sequences --tile 0 0 0 --tile_overlap 0 20 20

# 004, video deblurring trained and tested on DVD
python main_test_rvrt.py --task 004_RVRT_videodeblurring_DVD_16frames --folder_lq testsets/DVD10/test_GT_blurred --folder_gt testsets/DVD10/test_GT --tile 0 256 256 --tile_overlap 2 20 20

# 005, video deblurring trained and tested on GoPro
python main_test_rvrt.py --task 005_RVRT_videodeblurring_GoPro_16frames --folder_lq testsets/GoPro11/test_GT_blurred --folder_gt testsets/GoPro11/test_GT --tile 0 256 256 --tile_overlap 2 20 20

# 006, video denoising tra
