# PixelHacker: Image Inpainting with Structural and Semantic Consistency
SOTA performance on Places2, CelebA-HQ, and FFHQ & Superior structural and semantic consistency
<div align="center">

Ziyang Xu<sup>1</sup>, Kangsheng Duan<sup>1</sup>, Xiaolei Shen<sup>2</sup>, Zhifeng Ding<sup>2</sup>, Wenyu Liu<sup>1</sup>, Xiaohu Ruan<sup>2</sup>,
Xiaoxin Chen<sup>2</sup>, Xinggang Wang<sup>1 :email:</sup>

(<sup>:email:</sup>) Corresponding Author.

<sup>1</sup> Huazhong University of Science and Technology. <sup>2</sup> VIVO AI Lab.

</div>

<img src="./assets/Pipeline.png"></img>
## 🌟Highlights
- Latent Categories Guidance (LCG): a simple yet effective inpainting paradigm with superior structural and semantic consistency. Let's advance inpainting research toward more complex scenarios!
- PixelHacker: a diffusion-based inpainting model trained with LCG, achieving state-of-the-art performance across multiple natural-scene (Places2) and human-face (CelebA-HQ, FFHQ) benchmarks!
- Comprehensive SOTA Performance:
- Places2 (Natural Scene)
- Evaluated at 512 resolution using 10k test set images with 40-50% masked regions, PixelHacker achieved the best performance with FID 8.59 and LPIPS 0.2026.
    - Evaluated at 512 resolution using 36.5k validation set images with large and small mask settings, PixelHacker achieved the best performance on FID (large: 2.05, small: 0.82) and U-IDS (large: 36.07, small: 42.21), and the second-best performance on LPIPS (large: 0.169, small: 0.088).
    - Evaluated at 256 and 512 resolutions using validation set images with a highly randomized masking strategy, PixelHacker achieved the best performance at 512 resolution with FID 5.75 and LPIPS 0.305, and the second-best performance at 256 resolution with FID 9.25 and LPIPS 0.367.
- CelebA-HQ (Human-Face Scene)
- Evaluated at 512 resolution, PixelHacker achieved the best performance with FID 4.75 and LPIPS 0.115.
- FFHQ (Human-Face Scene)
- Evaluated at 256 resolution, PixelHacker achieved the best performance with FID 6.35 and LPIPS 0.229.
## 🔥Updates
- **May 20, 2025:** 🔥 We have released the code and weights. The weights include the pretrained version and all fine-tuned versions, each with only 0.8B parameters. Feel free to play!
- **May 1, 2025:** 🔥 We have released the project page with 63+ demos on natural and human-face scenes. Have fun! 🤗
- **April 30, 2025:** 🔥 We have released the arXiv paper for PixelHacker. The code and project page will be released soon.
## 🏕️Performance on Natural Scene
<div align="center"> <img src="./assets/Demo1.gif" width="360px"></img> </div><img src="./assets/Cover.png"></img>
<img src="./assets/Natural-Scene.png"></img>
## 🤗Performance on Human-Face Scene
<div align="center"> <img src="./assets/Demo2.gif" width="360px"></img> </div><img src="./assets/Human-Face.png"></img>
## 📦Environment Setups
- torch 2.3.0
- transformers 4.40.0
- diffusers 0.30.2
- See `requirements.txt` for the full list of required Python libraries
```bash
conda create -n pixelhacker python=3.10
conda activate pixelhacker
# cd /xx/xx/PixelHacker
pip install -r requirements.txt
```
## 🗃️Model Checkpoints
- Download the checkpoint of the VAE and put it into `../PixelHacker/vae`.
- Download the checkpoints of the pretrained version and the fine-tuned versions (Places2, CelebA-HQ, FFHQ), and put them into `../PixelHacker/weight`.
- The resulting directory layout is as follows:
```
├── PixelHacker
│   ├── weight
│   │   ├── pretrained
│   │   │   ├── diffusion_pytorch_model.bin
│   │   ├── ft_places2
│   │   │   ├── diffusion_pytorch_model.bin
│   │   ├── ft_celebahq
│   │   │   ├── diffusion_pytorch_model.bin
│   │   ├── ft_ffhq
│   │   │   ├── diffusion_pytorch_model.bin
│   ├── vae
│   │   ├── config.json
│   │   ├── diffusion_pytorch_model.bin
│   │   ├── ...
```
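Before running inference, it can help to verify that all checkpoints are in place. The helper below is an illustrative sketch (not part of the PixelHacker codebase) that checks the layout above, assuming it is run from the `PixelHacker` root:

```python
from pathlib import Path

# Expected checkpoint files, relative to the PixelHacker root
# (mirrors the directory layout shown above).
EXPECTED = [
    "weight/pretrained/diffusion_pytorch_model.bin",
    "weight/ft_places2/diffusion_pytorch_model.bin",
    "weight/ft_celebahq/diffusion_pytorch_model.bin",
    "weight/ft_ffhq/diffusion_pytorch_model.bin",
    "vae/config.json",
    "vae/diffusion_pytorch_model.bin",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that are missing under root."""
    return [p for p in EXPECTED if not (Path(root) / p).is_file()]
```

An empty return value means every expected file is present; otherwise the list names exactly what still needs to be downloaded.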
## 🔮Inference
You can run the following command directly to get the inpainting result for the example image-mask pair; the result will be saved in `../PixelHacker/outputs`. To run inference on custom data, simply place each image and its mask (with the same filename) in `../PixelHacker/imgs` and `../PixelHacker/masks`, respectively, then run the same command.
```bash
python infer_pixelhacker.py \
    --config config/PixelHacker_sdvae_f8d4.yaml \
    --weight weight/ft_places2/diffusion_pytorch_model.bin
```
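The same-filename convention for custom data can be sketched as follows. This is an illustrative helper (not part of the PixelHacker codebase) that pairs each image with the mask sharing its filename stem, assuming the `imgs/` and `masks/` layout described above:

```python
from pathlib import Path

def pair_images_and_masks(img_dir="imgs", mask_dir="masks"):
    """Match each image to the mask that shares its filename stem.

    Images without a matching mask are skipped, so only complete
    (image, mask) pairs are returned.
    """
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs = []
    for img in sorted(Path(img_dir).iterdir()):
        if img.is_file() and img.stem in masks:
            pairs.append((img, masks[img.stem]))
    return pairs
```

A pair such as `imgs/room.png` and `masks/room.png` would be matched; an image with no identically named mask is simply ignored.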
## 🎓Citation
```bibtex
@misc{xu2025pixelhacker,
      title={PixelHacker: Image Inpainting with Structural and Semantic Consistency},
      author={Ziyang Xu and Kangsheng Duan and Xiaolei Shen and Zhifeng Ding and Wenyu Liu and Xiaohu Ruan and Xiaoxin Chen and Xinggang Wang},
      year={2025},
      eprint={2504.20438},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.20438},
}
```