DreamClear
[NeurIPS 2024] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
⭐ If DreamClear is helpful to your projects, please help star this repo. Thanks! 🤗
🔥 News
- 2024.11.30: Release more convenient inference code for your own images.
- 2024.10.25: Release segmentation & detection code and pre-trained models.
- 2024.10.25: Release the RealLQ250 benchmark, which contains 250 real-world LQ images.
- 2024.10.25: Release training & inference code and pre-trained models of DreamClear.
- 2024.10.24: This repo is created.
📸 Real-World IR Results
<img src="assets/wukong.png" height="400px"/> <img src="assets/cat.png" height="400px"/> <img src="assets/person.png" height="400px"/> <img src="assets/sheep.png" height="400px"/> <img src="assets/tree.png" height="400px"/> <img src="assets/flower.png" height="400px"/>
🔧 Dependencies and Installation
- Clone this repo and navigate to the DreamClear folder

```shell
git clone https://github.com/shallowdream204/DreamClear.git
cd DreamClear
```

- Create a Conda environment and install dependencies

```shell
conda create -n dreamclear python=3.9 -y
conda activate dreamclear
pip3 install -r requirements.txt
```
- Download pre-trained models (all models except LLaVA can be downloaded from Huggingface for convenience.)

Base models:
- PixArt-α-1024: PixArt-XL-2-1024-MS.pth
- VAE: sd-vae-ft-ema
- T5 text encoder: t5-v1_1-xxl
- LLaVA: llava-v1.6-vicuna-13b
- SwinIR: general_swinir_v1.ckpt

Models we provide:
- DreamClear: DreamClear-1024.pth
- RMT for segmentation: rmt_uper_s_2x.pth
- RMT for detection: rmt_maskrcnn_s_1x.pth
🎰 Train
I - Prepare training data
Similar to SeeSR, we pre-generate HQ-LQ image pairs for training the IR model. Run the following command to create the paired training data:
```shell
python3 tools/make_paired_data.py \
    --gt_path gt_path1 gt_path2 ... \
    --save_dir /path/to/save/folder/ \
    --epoch 1 # number of epochs to generate paired data
```
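The LQ images saved alongside each HQ crop are degraded copies that are bicubically upsampled back to the HQ resolution (the `sr_bicubic` folder below). As a rough illustration of that last step only, here is a hypothetical sketch using Pillow; the actual `make_paired_data.py` applies a much richer degradation pipeline before the upsample:

```python
from PIL import Image

def make_bicubic_pair(hq_path, lq_save_path, scale=4):
    """Simplified sketch: degrade an HQ image by bicubic down/up-sampling.

    This only illustrates the 'sr_bicubic' output format (LQ + bicubic
    upsample back to HQ resolution); the real script uses a full
    degradation pipeline, not plain bicubic downsampling.
    """
    hq = Image.open(hq_path).convert("RGB")
    w, h = hq.size
    lq = hq.resize((w // scale, h // scale), Image.BICUBIC)   # degrade
    lq_up = lq.resize((w, h), Image.BICUBIC)                  # upsample back
    lq_up.save(lq_save_path)
    return lq_up.size
```

The function name and single-step degradation are illustrative assumptions, not the repo's actual API.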
After generating the paired data, you can use an MLLM (e.g., LLaVA) to generate detailed text prompts for the HQ images. Then use T5 to extract the text features in advance, which saves training time. Run:
```shell
python3 tools/extract_t5_features.py \
    --t5_ckpt /path/to/t5-v1_1-xxl \
    --caption_folder /path/to/caption/folder \
    --save_npz_folder /path/to/save/npz/folder
```
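Each resulting `.npz` file holds the pre-computed T5 features for one caption. A minimal sketch of the save/load round trip with NumPy (the key names `caption_feature` and `attention_mask` are assumptions here; check `extract_t5_features.py` for the exact keys it writes):

```python
import numpy as np

def save_caption_features(npz_path, features, attention_mask):
    # Key names are illustrative, not guaranteed to match the repo's script.
    np.savez(npz_path, caption_feature=features, attention_mask=attention_mask)

def load_caption_features(npz_path):
    data = np.load(npz_path)
    return data["caption_feature"], data["attention_mask"]
```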
Finally, the directory structure for the training datasets should look like:
```
training_datasets_folder/
└── gt
    └── 0000001.png # GT, (1024, 1024, 3)
    └── ...
└── sr_bicubic
    └── 0000001.png # LQ + bicubic upsample, (1024, 1024, 3)
    └── ...
└── caption
    └── 0000001.txt # Caption files (not used in training)
    └── ...
└── npz
    └── 0000001.npz # T5 features
    └── ...
```
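Before launching training, it is worth checking that the four sub-folders are aligned (every basename present in all of `gt`, `sr_bicubic`, `caption`, and `npz`). A small sanity-check sketch; the folder names follow the layout above, but the helper itself is ours, not part of the repo:

```python
import os

def check_dataset(root):
    """Return, per sub-folder, the basenames missing from at least one other sub-folder."""
    subdirs = {"gt": ".png", "sr_bicubic": ".png", "caption": ".txt", "npz": ".npz"}
    names = {}
    for sub, ext in subdirs.items():
        folder = os.path.join(root, sub)
        names[sub] = {os.path.splitext(f)[0]
                      for f in os.listdir(folder) if f.endswith(ext)}
    complete = set.intersection(*names.values())  # basenames present everywhere
    return {sub: sorted(stems - complete) for sub, stems in names.items()}
```

An empty list for every sub-folder means the dataset is consistent.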
II - Training for DreamClear
Run the following command to train DreamClear with default settings:
```shell
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... \
    train_dreamclear.py configs/DreamClear/DreamClear_Train.py \
    --load_from /path/to/PixArt-XL-2-1024-MS.pth \
    --vae_pretrained /path/to/sd-vae-ft-ema \
    --swinir_pretrained /path/to/general_swinir_v1.ckpt \
    --val_image /path/to/RealLQ250/lq/val_image.png \
    --val_npz /path/to/RealLQ250/npz/val_image.npz \
    --work_dir experiments/train_dreamclear
```
Please modify the paths of the training datasets in configs/DreamClear/DreamClear_Train.py. You can also adjust the training hyper-parameters (e.g., lr, train_batch_size, gradient_accumulation_steps) in this file according to your own GPU setup.
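PixArt-style configs are plain Python files whose module-level variables are read at launch, so editing them is just assignment. A hypothetical excerpt-style sketch of the fields mentioned above (variable names follow the text; the values and the `data_root` name are placeholders, not the file's actual contents):

```python
# Excerpt-style sketch of configs/DreamClear/DreamClear_Train.py (placeholders only).
data_root = "/path/to/training_datasets_folder"  # point at the folder layout shown above
lr = 2e-5                                        # lower for small batch sizes
train_batch_size = 2                             # per-GPU batch size
gradient_accumulation_steps = 1                  # raise to simulate a larger batch
```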
⚡ Inference
We provide the RealLQ250 benchmark, which can be downloaded from Google Drive.
Testing DreamClear for Image Restoration
Run the following command to restore LQ images (set `--nproc_per_node` to the number of GPUs you want to use; the command below uses a single GPU):
```shell
python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
    test.py configs/DreamClear/DreamClear_Test.py \
    --dreamclear_ckpt /path/to/DreamClear-1024.pth \
    --swinir_ckpt /path/to/general_swinir_v1.ckpt \
    --vae_ckpt /path/to/sd-vae-ft-ema \
    --t5_ckpt /path/to/t5-v1_1-xxl \
    --llava_ckpt /path/to/llava-v1.6-vicuna-13b \
    --lre --cfg_scale 4.5 --color_align wavelet \
    --image_path /path/to/input/images \
    --save_dir validation \
    --mixed_precision fp16 \
    --upscale 4
```
Evaluation on high-level benchmarks
Testing instructions for segmentation and detection can be found in their respective folders.
🪪 License
The provided code and pre-trained weights are licensed under the Apache 2.0 license.
🤗 Acknowledgement
This code is based on PixArt-α, BasicSR and RMT. Some code is borrowed from SeeSR, StableSR, DiffBIR and LLaVA. We thank the authors for their awesome work.
📧 Contact
If you have any questions, please feel free to reach out to me at shallowdream555@gmail.com.
📖 Citation
If you find our work useful for your research, please consider citing our paper:
```bibtex
@article{ai2024dreamclear,
  title={DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation},
  author={Ai, Yuang and Zhou, Xiaoqiang and Huang, Huaibo and Han, Xiaotian and Chen, Zhengyu and You, Quanzeng and Yang, Hongxia},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={55443--55469},
  year={2024}
}
```