SlotFormer
Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
Ziyi Wu, Nikita Dvornik, Klaus Greff, Thomas Kipf, Animesh Garg
ICLR'23 | GitHub | arXiv | Project page
[Figure: two side-by-side video comparisons, each showing Ground-Truth and Our Prediction]
Introduction
This is the official PyTorch implementation of the paper SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models, which was accepted to ICLR 2023. The code covers:
- Training base object-centric slot models
- Video prediction task on OBJ3D and CLEVRER
- VQA task on CLEVRER
- VQA task on Physion
- Planning task on PHYRE
Update
- 2023.9.20: BC-breaking change! We fixed an error in the mIoU calculation code. This does not change the ranking of the benchmarked methods, but it does change their absolute values. See this PR for more details. Please re-run the evaluation code on your trained models to get the correct results. The updated mIoU of SlotFormer on CLEVRER is 49.42 (using the provided pre-trained weights).
- 2023.1.20: The paper was accepted to ICLR 2023!
- 2022.10.26: Support Physion VQA task and PHYRE planning task
- 2022.10.16: Initial code release!
  - Support base object-centric model training
  - Support SlotFormer training
  - Support evaluation on the video prediction task
  - Support evaluation on the CLEVRER VQA task
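
For readers unfamiliar with the metric named in the 2023.9.20 update, the snippet below is a minimal sketch of the generic mean IoU definition over binary object masks. It is not the repository's evaluation code. The function names, the nested-list mask format, the assumption that predicted and ground-truth masks are already paired one-to-one, and the convention for two empty masks are all illustrative choices made here.

```python
from typing import Sequence

# A 2D binary mask stored as nested sequences: 1 = object pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks with the same shape."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


def mean_iou(pred_masks: Sequence[Mask], gt_masks: Sequence[Mask]) -> float:
    """Average IoU over object masks that are already paired by index."""
    if len(pred_masks) != len(gt_masks):
        raise ValueError("pred_masks and gt_masks must have the same length")
    if not pred_masks:
        raise ValueError("at least one mask pair is required")
    ious = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(ious) / len(ious)
```

Because the metric is an average of per-object ratios, a change in how the pairs are formed or averaged shifts the absolute numbers for every method, which is why re-running the evaluation is needed after such a fix.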
Installation
Please refer to install.md for step-by-step guidance on how to install the packages.
Experiments
This codebase is tailored to Slurm GPU clusters with a preemption mechanism. For the configs, we mainly use RTX6000 GPUs with 24GB of memory (though many experiments don't require that much memory). Please modify the code accordingly if you are using other hardware settings:
- Please go through scripts/train.py and change the fields marked by TODO:
- Please read the config file for the model you want to train. We use DDP with multiple GPUs to accelerate training. You can use fewer GPUs to achieve a better memory-speed trade-off
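
The two hardware adjustments above can be sketched as follows. This is only an illustration under stated assumptions, not code from this repository: the checkpoint naming scheme `model_<step>.pth`, the helper names `find_latest_checkpoint` and `per_gpu_batch_size`, and the idea of keeping the global batch size fixed when changing the GPU count are all hypothetical. Please check scripts/train.py and the config files for the actual conventions.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme "model_<step>.pth"; the real one may differ.
CKPT_PATTERN = re.compile(r"^model_(\d+)\.pth$")


def find_latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None.

    On a cluster with preemption, a requeued job can call this at startup
    to resume training instead of starting from scratch.
    """
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return None
    best_step = -1
    best_path = None
    for path in directory.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_path = path
    return best_path


def per_gpu_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Split a fixed global batch size evenly across DDP processes."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global_batch_size must be divisible by num_gpus")
    return global_batch_size // num_gpus
```

For example, `per_gpu_batch_size(64, 4)` gives 16 samples per GPU, while `per_gpu_batch_size(64, 2)` gives 32, so halving the GPU count doubles the per-GPU memory footprint if the global batch size is kept unchanged.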
Dataset Preparation
Please refer to data.md for steps to download and pre-process each dataset.
Reproduce Results
Please see benchmark.md for detailed instructions on how to reproduce our results in the paper.
Citation
Please cite our paper if you find it useful in your research:
@article{wu2022slotformer,
title={SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models},
author={Wu, Ziyi and Dvornik, Nikita and Greff, Klaus and Kipf, Thomas and Garg, Animesh},
journal={arXiv preprint arXiv:2210.05861},
year={2022}
}
Acknowledgement
We thank the authors of Slot-Attention, slot_attention.pytorch, SAVi, RPIN and Aloe for open-sourcing their wonderful work.
License
SlotFormer is released under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions about the code, please contact Ziyi Wu (dazitu616@gmail.com).
