DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, and Yulun Zhang, "DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution", NeurIPS 2025
<div> <a href="https://github.com/zhengchen1999/DOVE/releases" target='_blank' style="text-decoration: none;"><img src="https://img.shields.io/github/downloads/zhengchen1999/DOVE/total?color=green&style=flat"></a> <a href="https://github.com/zhengchen1999/DOVE" target='_blank' style="text-decoration: none;"><img src="https://visitor-badge.laobi.icu/badge?page_id=zhengchen1999/DOVE"></a> <a href="https://github.com/zhengchen1999/DOVE/stargazers" target='_blank' style="text-decoration: none;"><img src="https://img.shields.io/github/stars/zhengchen1999/DOVE?style=social"></a> </div>[project] [arXiv] [supplementary material] [dataset] [pretrained models]
🔥🔥🔥 News
- 2025-12-11: Released the DOVE Stage-1 weight to facilitate training. 📦📦📦
- 2025-10-12: Training code and the HQ-VSR dataset have been released. 🚀🚀🚀
- 2025-10-11: The project page is online, containing more visual results. 🌈🌈🌈
- 2025-09-18: DOVE is accepted at NeurIPS 2025. 🎉🎉🎉
- 2025-06-09: Test datasets, inference scripts, and pretrained models are available. ⭐️⭐️⭐️
- 2025-5-22: This repo is released.
Abstract: Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampling acceleration techniques, particularly single-step sampling, offer a potential solution. Nonetheless, achieving one-step sampling in VSR remains challenging due to the high training overhead on video data and stringent fidelity demands. To tackle these issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. DOVE is obtained by fine-tuning a pretrained video diffusion model (i.e., CogVideoX). To train DOVE effectively, we introduce a latent–pixel training strategy, which employs a two-stage scheme to gradually adapt the model to the video super-resolution task. Meanwhile, we design a video processing pipeline to construct a high-quality dataset tailored for VSR, termed HQ-VSR. Fine-tuning on this dataset further enhances the restoration capability of DOVE. Extensive experiments show that DOVE matches or surpasses multi-step diffusion-based VSR methods while offering outstanding inference efficiency, achieving up to a 28× speed-up over existing methods such as MGLD-VSR.

<table border="0" style="width: 100%; text-align: center; margin-top: 20px;"> <tr> <td> <video src="https://github.com/user-attachments/assets/4ad0ca78-6cca-48c0-95a5-5d5554093f7d" controls autoplay loop></video> </td> <td> <video src="https://github.com/user-attachments/assets/e5b5d247-28af-43fd-b32c-1f1b5896d9e7" controls autoplay loop></video> </td> </tr> </table>
Training Strategy

Video Processing Pipeline

🔖 TODO
- [x] Release testing code.
- [x] Release pre-trained models.
- [x] Release training code.
- [ ] Release the video processing pipeline.
- [x] Release HQ-VSR dataset.
- [x] Release project page.
- [ ] Provide WebUI.
- [ ] Provide HuggingFace demo.
⚙️ Dependencies
- Python 3.11
- PyTorch>=2.5.0
- Diffusers
```shell
# Clone the GitHub repo and go to the default directory 'DOVE'.
git clone https://github.com/zhengchen1999/DOVE.git
cd DOVE
conda create -n DOVE python=3.11
conda activate DOVE
pip install -r requirements.txt
pip install diffusers["torch"] transformers
pip install pyiqa
```
🔗 Contents
<a name="datasets"></a>📁 Datasets
🗳️ Train Datasets
We use two datasets for model training: HQ-VSR and DIV2K-HR. All datasets should be placed in the directory datasets/train/.
| Dataset  | Type  | # Videos / Images | Download      |
| :------- | :---- | :---------------: | :-----------: |
| HQ-VSR   | Video | 2,055             | Google Drive  |
| DIV2K-HR | Image | 800               | Official Link |
All datasets should follow this structure:
```
datasets/
└── train/
    ├── HQ-VSR/
    └── DIV2K_train_HR/
```
💡 HQ-VSR description:
- Constructed using our four-stage video processing pipeline.
- Contains 2,055 videos extracted from OpenVid-1M, suitable for video super-resolution (VSR) training.
- Detailed configuration and statistics are provided in the paper.
🗳️ Test Datasets
We provide several real-world and synthetic test datasets for evaluation. All datasets follow a consistent directory structure:
| Dataset | Type       | # Num | Download     |
| :------ | :--------: | :---: | :----------: |
| UDM10   | Synthetic  | 10    | Google Drive |
| SPMCS   | Synthetic  | 30    | Google Drive |
| YouHQ40 | Synthetic  | 40    | Google Drive |
| RealVSR | Real-world | 50    | Google Drive |
| MVSR4x  | Real-world | 15    | Google Drive |
| VideoLQ | Real-world | 50    | Google Drive |
All datasets are hosted here. Make sure the path (datasets/test/) is correct before running inference.
The directory structure is as follows:
```
datasets/
└── test/
    └── [DatasetName]/
        ├── GT/        # Ground Truth: folder of high-quality frames (one per clip)
        ├── GT-Video/  # Ground Truth (video version): lossless MKV format
        ├── LQ/        # Low-Quality Input: folder of degraded frames (one per clip)
        └── LQ-Video/  # Low-Quality Input (video version): lossless MKV format
```
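Before running inference, it can help to verify that a downloaded test set follows this layout, i.e. every `LQ/` clip has a same-named `GT/` counterpart. This is a minimal sketch assuming the `datasets/test/[DatasetName]` convention above; the function and its name are illustrative, not part of the repo.

```python
# Sketch: report LQ clips that lack a matching GT folder in a test dataset.
# Assumes the GT/ and LQ/ subfolders shown in the structure above.
from pathlib import Path


def check_dataset(root: str) -> list[str]:
    """Return the (sorted) names of LQ clip folders missing a GT counterpart."""
    base = Path(root)
    gt = {p.name for p in (base / "GT").iterdir() if p.is_dir()}
    lq = {p.name for p in (base / "LQ").iterdir() if p.is_dir()}
    return sorted(lq - gt)


# Example usage (hypothetical path):
# missing = check_dataset("datasets/test/UDM10")
# if missing: print("LQ clips without GT:", missing)
```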
<a name="models"></a>📦 Models
We provide pretrained weights for DOVE and DOVE-2B.
| Model Name            | Description                                     | HuggingFace | Google Drive | Baidu Disk | Visual Results |
| :-------------------- | :---------------------------------------------- | :---------: | :----------: | :--------: | :------------: |
| DOVE (Stage-1)        | Base version, built on CogVideoX1.5-5B, Stage-1 | TODO        | Download     | TODO       | TODO           |
| DOVE (Stage-2, Final) | Base version, built on CogVideoX1.5-5B, Stage-2 | TODO        | Download     | Download   | Download       |
| DOVE-2B               | Smaller version, based on CogVideoX-2B          | TODO        | TODO         | TODO       | TODO           |
Place the downloaded model files into the `pretrained_models/` folder, e.g., `pretrained_models/DOVE`.
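A small check like the following can confirm the weights landed in the expected location before launching inference. The folder layout is the README's convention; the helper itself is illustrative and does not validate individual file names, since the release layout may vary.

```python
# Sketch: confirm a downloaded model folder exists and is non-empty,
# e.g. pretrained_models/DOVE. Contents are not validated further.
from pathlib import Path


def model_ready(name: str, root: str = "pretrained_models") -> bool:
    """True if root/name exists as a directory and contains at least one entry."""
    folder = Path(root) / name
    return folder.is_dir() and any(folder.iterdir())


# Example usage (run from the repo root):
# assert model_ready("DOVE"), "Download the weights into pretrained_models/DOVE first."
```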
<a name="training"></a>🔧 Training
Note: Training requires 4×A100 GPUs (80 GB each). You can optionally use fewer GPUs together with LoRA fine-tuning to lower the GPU memory requirements.
- Prepare Datasets and Pretrained Models. Download the following resources and place them in the specified directories:

  | Type | Dataset / Model | Path |
  | :--- | :-------------- | :--- |
