DreamClear
[NeurIPS 2024] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
⭐ If DreamClear is helpful to your projects, please help star this repo. Thanks! 🤗
🔥 News
- 2024.11.30: Release more convenient inference code for your own images.
- 2024.10.25: Release segmentation & detection code and pre-trained models.
- 2024.10.25: Release the RealLQ250 benchmark, which contains 250 real-world LQ images.
- 2024.10.25: Release training & inference code and pre-trained models of DreamClear.
- 2024.10.24: This repo is created.
📸 Real-World IR Results
<img src="assets/wukong.png" height="400px"/> <img src="assets/cat.png" height="400px"/> <img src="assets/person.png" height="400px"/> <img src="assets/sheep.png" height="400px"/> <img src="assets/tree.png" height="400px"/> <img src="assets/flower.png" height="400px"/>
🔧 Dependencies and Installation
- Clone this repo and navigate to the DreamClear folder

```shell
git clone https://github.com/shallowdream204/DreamClear.git
cd DreamClear
```

- Create a Conda environment and install dependencies

```shell
conda create -n dreamclear python=3.9 -y
conda activate dreamclear
pip3 install -r requirements.txt
```
- Download pre-trained models (all models except LLaVA can be downloaded from Huggingface for convenience.)

Base models:
- PixArt-α-1024: PixArt-XL-2-1024-MS.pth
- VAE: sd-vae-ft-ema
- T5 text encoder: t5-v1_1-xxl
- LLaVA: llava-v1.6-vicuna-13b
- SwinIR: general_swinir_v1.ckpt

Models we provide:
- DreamClear: DreamClear-1024.pth
- RMT for segmentation: rmt_uper_s_2x.pth
- RMT for detection: rmt_maskrcnn_s_1x.pth
🎰 Train
I - Prepare training data
Similar to SeeSR, we pre-generate HQ-LQ image pairs for training the IR model. Run the following command to create the paired training data:
```shell
python3 tools/make_paired_data.py \
    --gt_path gt_path1 gt_path2 ... \
    --save_dir /path/to/save/folder/ \
    --epoch 1 # number of epochs to generate paired data
```
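The LQ images saved alongside each HQ crop are degraded copies that are bicubically upsampled back to the HQ resolution (the `sr_bicubic` folder below). As a rough illustration of that last step only, here is a hypothetical sketch using Pillow; the actual `make_paired_data.py` applies a much richer degradation pipeline before the upsample:

```python
from PIL import Image

def make_bicubic_pair(hq_path, lq_save_path, scale=4):
    """Simplified sketch: degrade an HQ image by bicubic down/up-sampling.

    This only illustrates the 'sr_bicubic' output format (LQ + bicubic
    upsample back to HQ resolution); the real script uses a full
    degradation pipeline, not plain bicubic downsampling.
    """
    hq = Image.open(hq_path).convert("RGB")
    w, h = hq.size
    lq = hq.resize((w // scale, h // scale), Image.BICUBIC)   # degrade
    lq_up = lq.resize((w, h), Image.BICUBIC)                  # upsample back
    lq_up.save(lq_save_path)
    return lq_up.size
```

The function name and single-step degradation are illustrative assumptions, not the repo's actual API.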
After generating the paired data, you can use an MLLM (e.g., LLaVA) to generate detailed text prompts for the HQ images. Then use T5 to extract the text features in advance, which saves training time. Run:
```shell
python3 tools/extract_t5_features.py \
    --t5_ckpt /path/to/t5-v1_1-xxl \
    --caption_folder /path/to/caption/folder \
    --save_npz_folder /path/to/save/npz/folder
```
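Each resulting `.npz` file holds the pre-computed T5 features for one caption. A minimal sketch of the save/load round trip with NumPy (the key names `caption_feature` and `attention_mask` are assumptions here; check `extract_t5_features.py` for the exact keys it writes):

```python
import numpy as np

def save_caption_features(npz_path, features, attention_mask):
    # Key names are illustrative, not guaranteed to match the repo's script.
    np.savez(npz_path, caption_feature=features, attention_mask=attention_mask)

def load_caption_features(npz_path):
    data = np.load(npz_path)
    return data["caption_feature"], data["attention_mask"]
```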
Finally, the directory structure for the training datasets should look like:
```
training_datasets_folder/
└── gt
    └── 0000001.png # GT, (1024, 1024, 3)
    └── ...
└── sr_bicubic
    └── 0000001.png # LQ + bicubic upsample, (1024, 1024, 3)
    └── ...
└── caption
    └── 0000001.txt # Caption files (not used in training)
    └── ...
└── npz
    └── 0000001.npz # T5 features
    └── ...
```
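Before launching training, it is worth checking that the four sub-folders are aligned (every basename present in all of `gt`, `sr_bicubic`, `caption`, and `npz`). A small sanity-check sketch; the folder names follow the layout above, but the helper itself is ours, not part of the repo:

```python
import os

def check_dataset(root):
    """Return, per sub-folder, the basenames missing from at least one other sub-folder."""
    subdirs = {"gt": ".png", "sr_bicubic": ".png", "caption": ".txt", "npz": ".npz"}
    names = {}
    for sub, ext in subdirs.items():
        folder = os.path.join(root, sub)
        names[sub] = {os.path.splitext(f)[0]
                      for f in os.listdir(folder) if f.endswith(ext)}
    complete = set.intersection(*names.values())  # basenames present everywhere
    return {sub: sorted(stems - complete) for sub, stems in names.items()}
```

An empty list for every sub-folder means the dataset is consistent.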
II - Training for DreamClear
Run the following command to train DreamClear with default settings:
```shell
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... \
    train_dreamclear.py configs/DreamClear/DreamClear_Train.py \
    --load_from /path/to/PixArt-XL-2-1024-MS.pth \
    --vae_pretrained /path/to/sd-vae-ft-ema \
    --swinir_pretrained /path/to/general_swinir_v1.ckpt \
    --val_image /path/to/RealLQ250/lq/val_image.png \
    --val_npz /path/to/RealLQ250/npz/val_image.npz \
    --work_dir experiments/train_dreamclear
```
Please modify the paths of the training datasets in configs/DreamClear/DreamClear_Train.py. You can also adjust the training hyper-parameters (e.g., lr, train_batch_size, gradient_accumulation_steps) in this file according to your own GPU setup.
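PixArt-style configs are plain Python files whose module-level variables are read at launch, so editing them is just assignment. A hypothetical excerpt-style sketch of the fields mentioned above (variable names follow the text; the values and the `data_root` name are placeholders, not the file's actual contents):

```python
# Excerpt-style sketch of configs/DreamClear/DreamClear_Train.py (placeholders only).
data_root = "/path/to/training_datasets_folder"  # point at the folder layout shown above
lr = 2e-5                                        # lower for small batch sizes
train_batch_size = 2                             # per-GPU batch size
gradient_accumulation_steps = 1                  # raise to simulate a larger batch
```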
⚡ Inference
We provide the RealLQ250 benchmark, which can be downloaded from Google Drive.
Testing DreamClear for Image Restoration
Run the following command to restore LQ images (set `--nproc_per_node` to the number of GPUs you want to use; the command below uses a single GPU):
```shell
python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
    test.py configs/DreamClear/DreamClear_Test.py \
    --dreamclear_ckpt /path/to/DreamClear-1024.pth \
    --swinir_ckpt /path/to/general_swinir_v1.ckpt \
    --vae_ckpt /path/to/sd-vae-ft-ema \
    --t5_ckpt /path/to/t5-v1_1-xxl \
    --llava_ckpt /path/to/llava-v1.6-vicuna-13b \
    --lre --cfg_scale 4.5 --color_align wavelet \
    --image_path /path/to/input/images \
    --save_dir validation \
    --mixed_precision fp16 \
    --upscale 4
```
Evaluation on high-level benchmarks
Testing instructions for segmentation and detection can be found in their respective folders.
🪪 License
The provided code and pre-trained weights are licensed under the Apache 2.0 license.
🤗 Acknowledgement
This code is based on PixArt-α, BasicSR and RMT. Some code is borrowed from SeeSR, StableSR, DiffBIR and LLaVA. We thank the authors for their awesome work.
📧 Contact
If you have any questions, please feel free to reach out to me at shallowdream555@gmail.com.
📖 Citation
If you find our work useful for your research, please consider citing our paper:
```bibtex
@article{ai2024dreamclear,
  title={DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation},
  author={Ai, Yuang and Zhou, Xiaoqiang and Huang, Huaibo and Han, Xiaotian and Chen, Zhengyu and You, Quanzeng and Yang, Hongxia},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={55443--55469},
  year={2024}
}
```