SlotDiffusion

Code release for the NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models

SlotDiffusion: Unsupervised Object-Centric Learning with Diffusion Models<br/> Ziyi Wu, Jingyu Hu*, Wuyue Lu*, Igor Gilitschenski, Animesh Garg<br/> NeurIPS'23 | GitHub | arXiv | Project page

Unsupervised Video Object Segmentation

<table style="width: 50%;"> <tr> <td style="width: 33%; text-align: center;"> Video </td> <td style="width: 33%; text-align: center;"> GT     </td> <td style="width: 33%; text-align: center;"> Ours   </td> </tr> <tr> <td colspan="3"> <img src="./assets/MOVi-D-seg-578.gif" alt="MOVi-D seg" width="100%"> </td> </tr> <tr> <td colspan="3"> <img src="./assets/MOVi-E-seg-31.gif" alt="MOVi-E seg" width="100%"> </td> </tr> </table>

Slot-based Image Editing

Introduction

This is the official PyTorch implementation of the paper SlotDiffusion: Unsupervised Object-Centric Learning with Diffusion Models. The code contains:

SOTA unsupervised object-centric models, including Slot Attention, SAVi, SLATE, STEVE, and SlotDiffusion
Unsupervised object segmentation, image/video reconstruction, compositional generation on 6 datasets
Video prediction and VQA on Physion dataset
Scale up to real-world datasets: PASCAL VOC and COCO

Update

2023.9.21: The paper was accepted to NeurIPS 2023 as a Spotlight presentation!
2023.5.24: Initial code release!

Installation

Please refer to install.md for step-by-step guidance on how to install the packages.

Experiments

This codebase is tailored to Slurm GPU clusters with a preemption mechanism. For the configs, we mainly use A40 GPUs with 40GB of memory (though many experiments don't require that much memory). Please modify the code accordingly if you are using a different hardware setup:

Please go through scripts/train.py and change the fields marked by TODO.
Please read the config file for the model you want to train. We use DDP with multiple GPUs to accelerate training. You can use fewer GPUs to achieve a better memory-speed trade-off.
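
When moving a config to fewer GPUs, one common way to keep training comparable is to hold the effective (global) batch size fixed and make up the difference with gradient accumulation. The following is a minimal sketch of that arithmetic only. The function name `plan_batch` and its parameters are hypothetical and do not come from this codebase, and whether the training script supports gradient accumulation is an assumption you should check against the config you are using.

```python
from typing import Tuple


def plan_batch(global_batch_size: int, num_gpus: int, max_per_gpu: int) -> Tuple[int, int]:
    """Hypothetical helper (not part of this repo).

    Returns (per_gpu_batch_size, grad_accum_steps) such that
    per_gpu_batch_size * grad_accum_steps * num_gpus == global_batch_size
    and per_gpu_batch_size <= max_per_gpu (the largest batch that fits in memory).
    """
    if num_gpus < 1 or max_per_gpu < 1:
        raise ValueError("num_gpus and max_per_gpu must be positive")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")

    # Samples each GPU must process per optimizer step.
    per_gpu_total = global_batch_size // num_gpus

    # Pick the smallest accumulation count that divides the work evenly
    # and keeps each forward/backward pass within the memory budget.
    for accum_steps in range(1, per_gpu_total + 1):
        if per_gpu_total % accum_steps == 0 and per_gpu_total // accum_steps <= max_per_gpu:
            return per_gpu_total // accum_steps, accum_steps

    # Unreachable for max_per_gpu >= 1, since accum_steps == per_gpu_total gives batch size 1.
    raise ValueError("no valid split found")


if __name__ == "__main__":
    # Example numbers are illustrative, not taken from the paper's configs.
    print(plan_batch(global_batch_size=64, num_gpus=4, max_per_gpu=16))
    print(plan_batch(global_batch_size=64, num_gpus=2, max_per_gpu=16))
```

Halving the GPU count in this sketch doubles the number of accumulation steps, so each optimizer step takes roughly twice as long while per-GPU memory stays the same.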

Dataset Preparation

Please refer to data.md for dataset downloading and pre-processing.

Reproduce Results

Please see benchmark.md for detailed instructions on how to reproduce our results in the paper.

Possible Issues

See the troubleshooting section of nerv for potential issues.

Please open an issue if you encounter any errors running the code.

Citation

Please cite our paper if you find it useful in your research:

@article{wu2023slotdiffusion,
  title={SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models},
  author={Wu, Ziyi and Hu, Jingyu and Lu, Wuyue and Gilitschenski, Igor and Garg, Animesh},
  journal={NeurIPS},
  year={2023}
}

Acknowledgement

We thank the authors of Slot Attention, slot_attention.pytorch, SAVi, SLATE, STEVE, Latent Diffusion Models, DPM-Solver, DINOSAUR, MaskContrast and SlotFormer for open-sourcing their wonderful work.

License

SlotDiffusion is released under the MIT License. See the LICENSE file for more details.

Contact

If you have any questions about the code, please contact Ziyi Wu (dazitu616@gmail.com).
