# QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution
<div align="center"> <a href="https://arxiv.org/abs/2503.12015"> <img src="https://img.shields.io/badge/arXiv-2503.12015-b31b1b.svg" alt="arXiv"> </a> <img src="https://visitor-badge.laobi.icu/badge?page_id=linYDTHU/QDM" alt="visitors"> <a href="https://github.com/linYDTHU/QDM"> <img src="https://img.shields.io/github/stars/linYDTHU/QDM?affiliations=OWNER&color=green&style=social" alt="GitHub Stars"> </a> </div>

If you've found QDM useful for your research or projects, please show your support by starring ⭐ this repo. Thanks!
Deep learning-based super-resolution (SR) methods often perform pixel-wise computations uniformly across entire images, even in homogeneous regions where high-resolution refinement is redundant. We propose the Quadtree Diffusion Model (QDM), a region-adaptive diffusion framework that leverages a quadtree structure to selectively enhance detail-rich regions while reducing computations in homogeneous areas. By guiding the diffusion with a quadtree derived from the low-quality input, QDM identifies key regions—represented by leaf nodes—where fine detail is essential and applies minimal refinement elsewhere. This mask-guided, two-stream architecture adaptively balances quality and efficiency, producing high-fidelity outputs with low computational redundancy. Experiments demonstrate QDM's effectiveness in high-resolution SR tasks across diverse image types, particularly in medical imaging (e.g., CT scans), where large homogeneous regions are prevalent. Furthermore, QDM outperforms or is comparable to state-of-the-art SR methods on standard benchmarks while significantly reducing computational costs, highlighting its efficiency and suitability for resource-limited environments.

<img src="./assets/Quadtree_Diagram.png" align="middle" width="1000">
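To build intuition for the quadtree guidance: starting from the whole image, a region is split into four quadrants while it still contains significant detail, so detail-rich areas end up as many small leaves while homogeneous areas remain a few large ones. The toy sketch below uses a simple variance criterion for splitting; the criterion, threshold, and function name are illustrative assumptions, not the exact rule QDM applies to the low-quality input.

```python
import numpy as np

def build_quadtree_mask(img, min_size=8, var_thresh=5e-3):
    """Illustrative quadtree decomposition of a square grayscale image.

    Splits a region into four quadrants while its pixel variance exceeds
    `var_thresh`, recording each leaf in a mask whose value encodes the
    leaf size. A toy stand-in for the quadtree QDM derives from the
    low-quality input; the real splitting rule may differ.
    """
    h, w = img.shape
    assert h == w and (h & (h - 1)) == 0, "toy version assumes square power-of-two images"
    mask = np.zeros((h, w), dtype=np.int32)

    def split(y, x, size):
        region = img[y:y + size, x:x + size]
        # Homogeneous (low-variance) or minimal regions become leaves.
        if size <= min_size or region.var() < var_thresh:
            mask[y:y + size, x:x + size] = size
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                split(y + dy, x + dx, half)

    split(0, 0, h)
    return mask

# Example: the noisy block yields small leaves (fine refinement),
# the flat background yields large leaves (cheap processing).
img = np.zeros((256, 256))
img[96:160, 96:160] = np.random.rand(64, 64)
print(np.unique(build_quadtree_mask(img)))  # mix of small and large leaf sizes
```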
## Update
- 2025.11.18: Released a new arXiv version with tumor-region reconstruction and real-world SR results; refer to the paper for details. Use `print_roi_metrics.py` to replicate the tumor reconstruction results. Access results for all methods here. Updated real-world SR with Gaussian-weighted patch-level aggregation (as per this reference) in `utils/util_image.py`; a sketch of the idea follows this list.
- 2025.03.18: Released code and pretrained checkpoints, and updated the README.
- 2025.03.14: Created this repo.
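For readers unfamiliar with the term, Gaussian-weighted patch-level aggregation blends overlapping patches with a 2D Gaussian weight map so that patch centers contribute more than patch borders, suppressing visible seams. Below is a minimal sketch of the idea; the function names and parameters (`gaussian_weight`, `sigma_frac`, `aggregate_patches`) are illustrative assumptions, not the actual implementation in `utils/util_image.py`.

```python
import numpy as np

def gaussian_weight(size, sigma_frac=0.25):
    """2D Gaussian weight map that peaks at the patch center."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2 * (sigma_frac * size) ** 2))
    return np.outer(g, g)

def aggregate_patches(patches, coords, out_shape, patch_size):
    """Blend overlapping patches into a single image using Gaussian weights."""
    acc = np.zeros(out_shape, dtype=np.float64)   # weighted pixel sums
    norm = np.zeros(out_shape, dtype=np.float64)  # accumulated weights
    w = gaussian_weight(patch_size)
    for patch, (y, x) in zip(patches, coords):
        acc[y:y + patch_size, x:x + patch_size] += w * patch
        norm[y:y + patch_size, x:x + patch_size] += w
    return acc / np.maximum(norm, 1e-8)
```

Compared with uniform averaging, this down-weights patch borders, where SR outputs are typically least reliable, so overlapping tiles blend smoothly.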
## Requirements
- For details, see `requirements.txt`.

A suitable conda environment named `quadtree_diffusion` can be created and activated with:
```bash
conda create -n quadtree_diffusion python=3.10
conda activate quadtree_diffusion
pip install -r requirements.txt
```
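Optionally, you can verify that the environment sees PyTorch and your GPU before running anything heavier (a quick sanity check, assuming a CUDA-capable machine):

```python
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```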
## Examples

### Real-World Image Super-Resolution
<img src="assets/realsr_1.png" height="330px"/> <img src="assets/realsr_2.png" height="330px"/> <img src="assets/realsr_5.png" height="330px"/>
<img src="assets/realsr_3.png" height="320px"/> <img src="assets/realsr_4.png" height="320px"/> <img src="assets/realsr_6.png" height="320px"/>
### Medical Image Super-Resolution
<img src="assets/medx8_sr_1.png" height="330px"/> <img src="assets/medx8_sr_2.png" height="330px"/> <img src="assets/medx8_sr_3.png" height="330px"/>
## Fast Testing Guide

### Download Pretrained Checkpoints

#### First-Stage Models (Autoencoders)
- Real-world SR Task: Download Link
- Medical SR Task: Download Link

Note: Place the downloaded models in the `weights` directory.

#### QDM-L Checkpoints
We provide pretrained checkpoints for the QDM-L model for the following tasks:

Note: Ensure all downloaded weights are placed in the `weights` directory.
### Inference

#### 🚀 Multi-GPU Acceleration
If you have multiple GPUs available, you can accelerate the inference process using the following command:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 inference.py \
    -i [Input Directory or Image] \
    -o [Output Dir] \
    --seed [Seed] \
    --chop_bs [Chopping Batch Size] \
    --chop_size [Chopping Size] \
    --cfg_path [Config Path] \
    --ckpt_path [Checkpoint Path] \
    --distributed
```
#### 💻 Single-GPU Execution

```bash
python inference.py \
    -i [Input Directory or Image] \
    -o [Output Dir] \
    --seed [Seed] \
    --chop_bs [Chopping Batch Size] \
    --chop_size [Chopping Size] \
    --cfg_path [Config Path] \
    --ckpt_path [Checkpoint Path]
```
#### 🔧 Configuration Tips
- When processing very large images, you can adjust `--chop_bs` to balance efficiency and memory usage (a sketch of the patch-chopping idea follows this list).
- We provide multiple configuration files for different tasks in the `configs/inference` directory. Make sure to select the appropriate configuration file for your specific task.
- You can add the `--process` flag to output the mask-guided diffusion process demonstrated in the paper.
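To make the chopping options concrete, the sketch below shows the general idea behind patch-based ("chopped") inference: a large image is tiled into fixed-size patches that are processed `chop_bs` at a time. This is a hypothetical illustration of the mechanism, not the repository's actual implementation.

```python
import numpy as np

def chopped_inference(img, model, chop_size=256, chop_bs=4, stride=None):
    """Run `model` on fixed-size tiles of a large image, `chop_bs` at a time.

    Toy illustration of chopping: tile, batch, process, paste back.
    Real pipelines (like QDM's) also overlap tiles and blend the seams,
    e.g. with the Gaussian weighting sketched earlier.
    """
    stride = stride or chop_size
    h, w = img.shape[:2]
    coords = [(y, x) for y in range(0, h - chop_size + 1, stride)
                     for x in range(0, w - chop_size + 1, stride)]
    out = np.zeros_like(img)
    for i in range(0, len(coords), chop_bs):
        batch_coords = coords[i:i + chop_bs]
        batch = np.stack([img[y:y + chop_size, x:x + chop_size]
                          for y, x in batch_coords])
        results = model(batch)  # one forward pass per batch of tiles
        for r, (y, x) in zip(results, batch_coords):
            out[y:y + chop_size, x:x + chop_size] = r
    return out

# Usage with an identity "model" standing in for the diffusion sampler:
big = np.random.rand(512, 512)
restored = chopped_inference(big, model=lambda b: b, chop_size=256, chop_bs=2)
assert np.allclose(restored, big)
```

A larger `chop_bs` means fewer, bigger forward passes (faster but more memory); a smaller one trades speed for a lower peak memory footprint.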
## Training

### Preparing Stage
This repository supports two super-resolution (SR) tasks: Real-World SR and Medical CT SR. Follow the steps below to prepare the necessary training and testing datasets.

#### Real-World SR Task
We integrate training data from six established benchmarks:
- LSDIR – Access Dataset
- DIV2K – Access Dataset
- DIV8K – Access Dataset
- OutdoorSceneTraining – Access Dataset
- Flickr2K – Access Dataset
- FFHQ Subset – A curated selection of 10,000 facial images from the FFHQ dataset
##### Preprocessing Steps
- Filtering OutdoorSceneTraining: Filter out images whose spatial dimensions are smaller than 512 pixels (a sketch of such a size filter follows this list). Update the directory path inside the script as needed, then run `python scripts/filter_images.py`.
- Synthetic LSDIR_TEST: Download the pre-synthesized LSDIR_TEST dataset from this link, or generate your own by running `python scripts/prepare_lsdir_test.py`.
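As a rough picture of what such a size filter does, here is a hypothetical stand-in for `scripts/filter_images.py`; the path, file extension, and deletion behavior are assumptions, so check the actual script before running anything destructive.

```python
from pathlib import Path
from PIL import Image

SRC_DIR = Path("data/OutdoorSceneTraining")  # adjust to your dataset location
MIN_SIZE = 512

kept, dropped = 0, 0
for img_path in sorted(SRC_DIR.glob("*.png")):  # adjust extension as needed
    with Image.open(img_path) as im:
        w, h = im.size
    if min(w, h) < MIN_SIZE:
        img_path.unlink()  # remove images under 512 px on either side
        dropped += 1
    else:
        kept += 1
print(f"kept {kept}, removed {dropped} images")
```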
#### Medical CT SR Task
For the medical CT super-resolution task, we utilize clinical CT scans from two well-established segmentation challenges: HaN-Seg and SegRap2023. Download the datasets using the following links:
### Training Scripts
You can start training by running:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path [Config Path] --save_dir [Logging Folder]
```

We provide multiple configuration files for different tasks in the `configs/train` directory.
## Citations
Please consider citing our paper in your publications if it helps. Here is the BibTeX:

```bibtex
@misc{yang2025qdmquadtreebasedregionadaptivesparse,
      title={QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution},
      author={Donglin Yang and Paul Vicol and Xiaojuan Qi and Renjie Liao and Xiaofan Zhang},
      year={2025},
      eprint={2503.12015},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12015},
}
```
## License
This project is licensed under the <a rel="license" href="./LICENSE">MIT License</a>. Redistribution and use should follow this license.
## Acknowledgement
This project is primarily based on ResShift and LDM. We also adopt Real-ESRGAN to synthesize the LR/HR pairs, and the QDM architecture is designed mainly based on DiT. Thanks for their awesome work!
## Contact
If you have any questions, please feel free to contact me via ydlin718@gmail.com.
