SlotDiffusion
Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
SlotDiffusion: Unsupervised Object-Centric Learning with Diffusion Models<br/> Ziyi Wu, Jingyu Hu*, Wuyue Lu*, Igor Gilitschenski, Animesh Garg<br/> NeurIPS'23 | GitHub | arXiv | Project page
Unsupervised Video Object Segmentation
<table style="width: 50%;"> <tr> <td style="width: 33%; text-align: center;"> Video </td> <td style="width: 33%; text-align: center;"> GT </td> <td style="width: 33%; text-align: center;"> Ours </td> </tr> <tr> <td colspan="3"> <img src="./assets/MOVi-D-seg-578.gif" alt="MOVi-D seg" width="100%"> </td> </tr> <tr> <td colspan="3"> <img src="./assets/MOVi-E-seg-31.gif" alt="MOVi-E seg" width="100%"> </td> </tr> </table>
Slot-based Image Editing
<img src="./assets/CLEVRTex-edit.png" width="70%">
Introduction
This is the official PyTorch implementation of the paper SlotDiffusion: Unsupervised Object-Centric Learning with Diffusion Models. The code contains:
- SOTA unsupervised object-centric models: Slot Attention, SAVi, SLATE, STEVE, and SlotDiffusion
- Unsupervised object segmentation, image/video reconstruction, compositional generation on 6 datasets
- Video prediction and VQA on Physion dataset
- Scale up to real-world datasets: PASCAL VOC and COCO
Update
- 2023.9.21: The paper is accepted by NeurIPS 2023 as a Spotlight presentation!
- 2023.5.24: Initial code release!
Installation
Please refer to install.md for step-by-step guidance on how to install the packages.
Experiments
This codebase is tailored to Slurm GPU clusters with a preemption mechanism. For the configs, we mainly use A40 GPUs with 40GB memory (though many experiments don't require that much memory). Please modify the code accordingly if you are using a different hardware setup:
- Please go through scripts/train.py and change the fields marked by TODO:
- Please read the config file for the model you want to train. We use DDP with multiple GPUs to accelerate training. You can use fewer GPUs to achieve a better memory-speed trade-off
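When changing the number of GPUs, it may help to check how a fixed global batch size splits across devices. The following is a minimal sketch in plain Python, not code from this repository: the function name `per_gpu_batch_size`, the `grad_accum_steps` parameter, and the global batch size of 64 are all hypothetical, and the actual configs may define batch size differently (for example per GPU rather than globally).

```python
def per_gpu_batch_size(global_batch_size: int, num_gpus: int,
                       grad_accum_steps: int = 1) -> int:
    """Split a fixed global batch evenly across GPUs and accumulation steps.

    Hypothetical helper for illustration only; it is not part of this codebase.
    """
    if num_gpus < 1 or grad_accum_steps < 1:
        raise ValueError("num_gpus and grad_accum_steps must be >= 1")
    denom = num_gpus * grad_accum_steps
    if global_batch_size % denom != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"num_gpus * grad_accum_steps = {denom}"
        )
    return global_batch_size // denom


if __name__ == "__main__":
    # Assumed global batch size of 64, chosen only for illustration.
    for n in (4, 2, 1):
        print(f"{n} GPU(s): {per_gpu_batch_size(64, n)} samples per GPU per step")
```

With fewer GPUs, each device must hold a larger share of the batch unless the batch is split over several accumulation steps, which trades memory for training time.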
Dataset Preparation
Please refer to data.md for dataset downloading and pre-processing.
Reproduce Results
Please see benchmark.md for detailed instructions on how to reproduce our results in the paper.
Possible Issues
See the troubleshooting section of nerv for potential issues.
Please open an issue if you encounter any errors running the code.
Citation
Please cite our paper if you find it useful in your research:
@article{wu2023slotdiffusion,
title={SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models},
author={Wu, Ziyi and Hu, Jingyu and Lu, Wuyue and Gilitschenski, Igor and Garg, Animesh},
journal={NeurIPS},
year={2023}
}
Acknowledgement
We thank the authors of Slot Attention, slot_attention.pytorch, SAVi, SLATE, STEVE, Latent Diffusion Models, DPM-Solver, DINOSAUR, MaskContrast and SlotFormer for open-sourcing their wonderful work.
License
SlotDiffusion is released under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions about the code, please contact Ziyi Wu at dazitu616@gmail.com
