PlaySlot
Official implementation of: "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Villar-Corrales & Behnke. ICML 2025
PlaySlot: Controllable Object-Centric Video Prediction
Official implementation of: PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning by Angel Villar-Corrales and Sven Behnke. ICML 2025.
[Paper]
[Project Page]
[BibTeX]
[Live Demo]
Installation and Dataset Preparation
- Clone the repository and install all required packages in our conda environment, as well as other external dependencies, such as the multi-object-fetch environment or MetaWorld.
git clone git@github.com:angelvillar96/PlaySlot.git
cd PlaySlot
./create_conda_env.sh
source ~/.bashrc
conda activate PlaySlot
- Download and extract the pretrained models, including checkpoints for the SAVi decomposition, predictor modules and behavior modules:
chmod +x download_pretrained.sh
./download_pretrained.sh
- Download the datasets:
- ButtonPress & BlockPush: You can automatically download and place the ButtonPress and BlockPush datasets by running the following commands:
chmod +x download_datasets.sh
./download_datasets.sh
- Sketchy: To download the Sketchy robot dataset, we refer to the original source.
Training
We refer to docs/TRAIN.md for detailed instructions on training your own PlaySlot. It includes instructions for all training stages: training SAVi, jointly training cOCVP and InvDyn, and learning behaviors from unlabelled expert demonstrations.
Evaluation and Figure Generation
We provide bash scripts for evaluating models and generating figures using our pretrained checkpoints. <br> Run any of them with:
./scripts/SCRIPT_NAME
Example:
./scripts/05_eval_PlaySlot_BlockPush.sh
./scripts/06_generate_figs_pred_BlockPush.sh
./scripts/06_generate_action_figs_BlockPush.sh
Below we describe the different evaluation and figure generation scripts in more detail.
Evaluate SAVi for Image Decomposition
You can quantitatively and qualitatively evaluate a SAVi video decomposition model using the src/03_evaluate_savi.py and src/06_generate_figs_savi.py scripts, respectively.
These scripts evaluate the model on the test set and generate figures of the results.
Example:
python src/03_evaluate_savi.py \
-d experiments/BlockPush/ \
--savi_ckpt SAVi_BlockPush.pth \
--results_name quant_eval_savi
python src/06_generate_figs_savi.py \
-d experiments/BlockPush/ \
--savi_ckpt SAVi_BlockPush.pth \
--num_seqs 10 \
--num_frames 8
<details>
<summary><i>Show SAVi Figures</i></summary>
Generating figures with SAVi should produce figures as follows:
<img src="assets/savi_imgs/savi_slots_00.png" width="49%" align="center">
<img src="assets/savi_imgs/savi_slots_01.png" width="49%" align="center">
</details>
Evaluate PlaySlot for Video Prediction
You can evaluate PlaySlot for video prediction using the src/05_evaluate_PlaySlot.py script.
This script takes pretrained SAVi and PlaySlot checkpoints and evaluates the visual quality of the predicted frames.
Example:
python src/05_evaluate_PlaySlot.py \
-d experiments/BlockPush/ \
--name_pred_exp PlaySlot \
--savi_ckpt SAVi_BlockPush.pth \
--pred_ckpt PlaySlot_BlockPush.pth \
--results_name quant_eval_playslot \
--post_only \
--num_seed 6 \
--num_preds 15 \
--set_expert_policy
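As a rough illustration of what "visual quality of the predicted frames" can mean, the sketch below computes PSNR between predicted and ground-truth frames. This is an assumption for illustration only: the metrics actually reported by the evaluation script are not listed here, and the names `psnr` and `mean_psnr`, the flat-list frame representation, and the alignment of predictions with the frames after `num_seed` are all hypothetical.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two frames given as flat lists of pixel values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr(preds, video, num_seed):
    """Average PSNR over the predicted frames only.

    Assumption: preds[i] is compared against video[num_seed + i],
    i.e. the seed frames are excluded from the score.
    """
    targets = video[num_seed:num_seed + len(preds)]
    if len(targets) != len(preds) or not preds:
        raise ValueError("need one ground-truth frame per predicted frame")
    scores = [psnr(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

With the example command above, such a score would be averaged over the 15 predicted frames that follow the 6 seed frames.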
Generate Figures and Animations
We provide two scripts to generate video prediction, object prediction, and segmentation figures and animations.
src/06_generate_figs_pred.py generates images and animations of frames, objects and slot masks predicted by PlaySlot conditioned on latent actions inferred by the Inverse Dynamics model.
Example:
python src/06_generate_figs_pred.py \
-d experiments/BlockPush/ \
--name_pred_exp PlaySlot \
--savi_ckpt SAVi_BlockPush.pth \
--pred_ckpt PlaySlot_BlockPush.pth \
--num_seqs 10 \
--num_seed 1 \
--num_preds 15 \
--set_expert_policy
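The idea of conditioning the prediction on latent actions inferred from the video can be illustrated with a toy sketch. This is not the repository's implementation: states are plain integers instead of object slots, and `inverse_dynamics`, `predict_next` and `rollout_with_inferred_actions` are hypothetical stand-ins chosen so that the example is easy to follow.

```python
def inverse_dynamics(state, next_state):
    """Toy stand-in: infer the 'action' that explains an observed transition."""
    return next_state - state


def predict_next(state, action):
    """Toy stand-in for an action-conditioned predictor."""
    return state + action


def rollout_with_inferred_actions(observed, num_seed):
    """Predict the states after the seed states.

    At every step the action is inferred from the observed transition,
    while the predictor is applied autoregressively to its own output.
    """
    state = observed[num_seed - 1]  # last seed state
    preds = []
    for t in range(num_seed - 1, len(observed) - 1):
        action = inverse_dynamics(observed[t], observed[t + 1])
        state = predict_next(state, action)
        preds.append(state)
    return preds
```

In this toy setting the inferred actions fully explain the observed transitions, so the rollout reproduces the observed sequence; with learned models the match is only approximate.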
<details>
<summary><i>Show Example Outputs of `src/06_generate_figs_pred.py`</i></summary>
Generating figures with PlaySlot should produce animations as follows:
<br>
<table>
<tbody>
<tr>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/gt_GIF_frames.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/pred_GIF_frames.gif" width="11%"/>
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/masks_GIF_masks.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/obj_1.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/obj_2.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/obj_3.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/obj_5.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif1/obj_7.gif" width="11%" />
</td>
</tr>
<br>
<tr>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/gt_GIF_frames.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/pred_GIF_frames.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/masks_GIF_masks.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/obj_1.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/obj_2.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/obj_3.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/obj_6.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Pred_GIFs/gif2/obj_7.gif" width="11%" />
</td>
</tr>
</tbody>
</table>
</details>
src/06_generate_action_figs.py generates images and animations of frames generated by PlaySlot by repeatedly conditioning the prediction process on a single learned action prototype.
Example:
python src/06_generate_action_figs.py \
-d experiments/BlockPush/ \
--name_pred_exp PlaySlot \
--savi_ckpt SAVi_BlockPush.pth \
--pred_ckpt PlaySlot_BlockPush.pth \
--num_seqs 10 \
--num_seed 1 \
--num_preds 15 \
--set_expert_policy
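Repeatedly conditioning on a single action prototype amounts to an autoregressive rollout in which the same action is applied at every step. The following is a minimal toy sketch under stated assumptions: states are 2-D integer positions rather than object slots, and `predict_next`, `rollout_with_prototype` and the example prototype values are hypothetical and not taken from the repository.

```python
def predict_next(state, action):
    """Toy stand-in for an action-conditioned predictor on 2-D positions."""
    return (state[0] + action[0], state[1] + action[1])


def rollout_with_prototype(seed_state, prototype, num_preds):
    """Apply the same action prototype for num_preds autoregressive steps."""
    state = seed_state
    preds = []
    for _ in range(num_preds):
        state = predict_next(state, prototype)  # feed the prediction back in
        preds.append(state)
    return preds


# Hypothetical prototypes: each one yields a distinct, consistent motion.
prototypes = {"right": (1, 0), "up": (0, 1)}
rollouts = {
    name: rollout_with_prototype((0, 0), proto, num_preds=3)
    for name, proto in prototypes.items()
}
```

Running one rollout per prototype from the same seed state mirrors how the script produces one animation per learned action prototype.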
<details>
<summary><i>Show Example Outputs of `src/06_generate_action_figs.py`</i></summary>
Generating figures with this script should produce animations as follows:
<br>
<table>
<tr>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/inferred_dynamics.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_1.gif" width="11%"/>
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_2.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_3.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_4.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_5.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_6.gif" width="11%" />
</td>
<td align="center">
<img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_7.gif" width="11%" />
</td>
</tr>
</table>
</details>
