PlaySlot

Official implementation of: "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Villar-Corrales & Behnke. ICML 2025


PlaySlot: Controllable Object-Centric Video Prediction

Official implementation of: "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Angel Villar-Corrales and Sven Behnke. ICML 2025.

[Paper] [Project Page] [BibTeX] [Live Demo]

<table> <tr> <td rowspan="2" align="center"> <b>Main Figure</b> <img src="assets/teaser.png" width="200%"><br> </td> <td align="center"> <b>Target</b> <img src="assets/top_readme_examples/gif1/gt_GIF_frames.gif" width="100%" /> </td> <td align="center"> <b>Preds.</b> <img src="assets/top_readme_examples/gif1/pred_GIF_frames.gif" width="100%" /> </td> <td align="center"> <b>Segm.</b> <img src="assets/top_readme_examples/gif1/masks_GIF_masks.gif" width="100%" /> </td> <td align="center"> <b>Obj.1</b> <img src="assets/top_readme_examples/gif1/gt_obj_8.gif" width="100%" /> </td> <td align="center"> <b>Obj.2</b> <img src="assets/top_readme_examples/gif1/gt_obj_7.gif" width="100%" /> </td> </tr> <tr> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_GIF_frames.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/pred_GIF_frames.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/masks_GIF_masks.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_obj_5.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_obj_6.gif" width="100%" /> </td> </tr> </table>

Installation and Dataset Preparation

Clone the repository and install all required packages included in our conda environment, as well as external dependencies such as the multi-object-fetch environment or MetaWorld.

git clone git@github.com:angelvillar96/PlaySlot.git
cd PlaySlot
./create_conda_env.sh
source ~/.bashrc
conda activate PlaySlot

Download and extract the pretrained models, including checkpoints for the SAVi decomposition, predictor modules and behavior modules:

chmod +x download_pretrained.sh
./download_pretrained.sh
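
After the download finishes, it can help to confirm that the checkpoints used in the examples of this README are present. The following is a minimal sketch and not part of the repository: `find_checkpoints` and `missing_checkpoints` are hypothetical helpers, and searching recursively under `experiments/` is an assumption about where `download_pretrained.sh` extracts the files.

```python
from pathlib import Path


def find_checkpoints(root, names):
    """Map each checkpoint file name to the paths found under root (recursive search)."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to search: report every name as not found.
        return {name: [] for name in names}
    return {name: sorted(root.rglob(name)) for name in names}


def missing_checkpoints(root, names):
    """Return the names for which no file was found under root."""
    found = find_checkpoints(root, names)
    return [name for name, paths in found.items() if not paths]


# Checkpoint names taken from the example commands in this README.
# The search root "experiments" is an assumption about the extraction location.
wanted = ["SAVi_BlockPush.pth", "PlaySlot_BlockPush.pth"]
print("Missing checkpoints:", missing_checkpoints("experiments", wanted))
```

An empty list means that every listed checkpoint was found somewhere below the search root.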

Download the datasets:

ButtonPress & BlockPush: You can automatically download and place the ButtonPress and BlockPush datasets by running the following commands:

chmod +x download_datasets.sh
./download_datasets.sh

Sketchy: To download the Sketchy robot dataset, we refer to the original source.

Training

We refer to docs/TRAIN.md for detailed instructions on training your own PlaySlot. We include instructions for all training stages, including training SAVi, jointly training cOCVP and InvDyn, and learning behaviors from unlabelled expert demonstrations.

Evaluation and Figure Generation

We provide bash scripts for evaluating and generating figures using our pretrained checkpoints. <br> Run a script with:

./scripts/SCRIPT_NAME

Example:

./scripts/05_eval_PlaySlot_BlockPush.sh
./scripts/06_generate_figs_pred_BlockPush.sh
./scripts/06_generate_action_figs_BlockPush.sh
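
If you want to run several of these scripts back to back, a small driver can help. The sketch below is not part of the repository: `build_commands` and `run_all` are hypothetical helpers, and the script names are the ones from the example above. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Script names copied from the example above; adjust the list as needed.
SCRIPTS = [
    "05_eval_PlaySlot_BlockPush.sh",
    "06_generate_figs_pred_BlockPush.sh",
    "06_generate_action_figs_BlockPush.sh",
]


def build_commands(scripts, scripts_dir="scripts"):
    """Return one argv list per script, e.g. ['bash', 'scripts/NAME.sh']."""
    return [["bash", (Path(scripts_dir) / name).as_posix()] for name in scripts]


def run_all(scripts, scripts_dir="scripts", dry_run=True):
    """Run the scripts in order; with dry_run=True only print the commands."""
    for cmd in build_commands(scripts, scripts_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing script.
            subprocess.run(cmd, check=True)


run_all(SCRIPTS)  # dry run: prints the three commands without executing them
```

Calling `run_all(SCRIPTS, dry_run=False)` from the repository root would execute the scripts one after another.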

Below we discuss the different evaluation and figure generation scripts and processes in more detail.

Evaluate SAVi for Image Decomposition

You can quantitatively and qualitatively evaluate a SAVi video decomposition model using the src/03_evaluate_savi.py and src/06_generate_figs_savi.py scripts, respectively.

These scripts evaluate the model on the test set and generate figures of the results.

Example:

python src/03_evaluate_savi.py \
  -d experiments/BlockPush/ \
  --savi_ckpt SAVi_BlockPush.pth \
  --results_name quant_eval_savi

python src/06_generate_figs_savi.py \
  -d experiments/BlockPush/ \
  --savi_ckpt SAVi_BlockPush.pth \
  --num_seqs 10 \
  --num_frames 8

<details> <summary><i>Show SAVi Figures</i></summary> Generating figures with SAVi should produce figures as follows: <img src="assets/savi_imgs/savi_slots_00.png" width="49%" align="center"> <img src="assets/savi_imgs/savi_slots_01.png" width="49%" align="center"> </details>

Evaluate PlaySlot for Video Prediction

You can evaluate PlaySlot for video prediction using the src/05_evaluate_PlaySlot.py script. This script takes pretrained SAVi and PlaySlot checkpoints and evaluates the visual quality of the predicted frames.

Example:

python src/05_evaluate_PlaySlot.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --results_name quant_eval_playslot \
  --post_only \
  --num_seed 6 \
  --num_preds 15 \
  --set_expert_policy
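
To evaluate several datasets with the same settings, the command above can be assembled programmatically. This is only a sketch: `build_eval_cmd` is a hypothetical helper, and both the `experiments/<dataset>/` directory and the checkpoint naming pattern `SAVi_<dataset>.pth` / `PlaySlot_<dataset>.pth` are assumptions generalized from the BlockPush example. The flags and their values are copied from the command above.

```python
import shlex


def build_eval_cmd(dataset, num_seed=6, num_preds=15):
    """Assemble the argv for src/05_evaluate_PlaySlot.py for one dataset."""
    return [
        "python", "src/05_evaluate_PlaySlot.py",
        "-d", f"experiments/{dataset}/",           # assumed directory layout
        "--name_pred_exp", "PlaySlot",
        "--savi_ckpt", f"SAVi_{dataset}.pth",      # assumed naming pattern
        "--pred_ckpt", f"PlaySlot_{dataset}.pth",  # assumed naming pattern
        "--results_name", "quant_eval_playslot",
        "--post_only",
        "--num_seed", str(num_seed),
        "--num_preds", str(num_preds),
        "--set_expert_policy",
    ]


for dataset in ["BlockPush", "ButtonPress"]:
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(build_eval_cmd(dataset)))
```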

Generate Figures and Animations

We provide two scripts to generate video prediction, object prediction, and segmentation figures and animations.

src/06_generate_figs_pred.py generates images and animations of frames, objects and slot masks predicted by PlaySlot conditioned on latent actions inferred by the Inverse Dynamics model.

Example:

python src/06_generate_figs_pred.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --num_seqs 10 \
  --num_seed 1 \
  --num_preds 15 \
  --set_expert_policy
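
Inverse dynamics generally means inferring the action that explains the transition between two consecutive observations. The toy sketch below illustrates only that data flow, inferring actions from an observed sequence and then conditioning the prediction on them. It is not the PlaySlot model: the difference-based `infer_actions` and the additive `predict` are hypothetical stand-ins for the learned modules, and the states are plain numbers rather than object slots.

```python
def infer_actions(states):
    """Toy inverse dynamics: the action between two steps is their difference."""
    return [nxt - cur for cur, nxt in zip(states, states[1:])]


def predict(seed_state, actions):
    """Toy predictor: apply the actions one by one, starting from the seed state."""
    preds, state = [], seed_state
    for action in actions:
        state = state + action
        preds.append(state)
    return preds


observed = [0.0, 1.0, 3.0, 6.0]        # toy 1-D "video" states
actions = infer_actions(observed)      # one inferred action per transition
preds = predict(observed[0], actions)  # replaying them reproduces the observed sequence
print(actions, preds)
```

In this toy setting, replaying the inferred actions from the first state recovers the remaining observed states exactly, which is the sense in which the prediction is conditioned on inferred actions.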

<details> <summary><i>Show Example Outputs of `src/06_generate_figs_pred.py`</i></summary> Generating figures with PlaySlot should produce animations as follows: <br> <table> <tbody> <tr> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/gt_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/pred_GIF_frames.gif" width="11%"/> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/masks_GIF_masks.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_1.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_5.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_7.gif" width="11%" /> </td> </tr> <tr> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/gt_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/pred_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/masks_GIF_masks.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_1.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_6.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_7.gif" width="11%" /> </td> </tr> </tbody> </table> </details>

src/06_generate_action_figs.py generates images and animations of frames generated by PlaySlot by repeatedly conditioning the prediction process on a single learned action prototype.

Example:

python src/06_generate_action_figs.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --num_seqs 10 \
  --num_seed 1 \
  --num_preds 15 \
  --set_expert_policy
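
The idea of repeatedly conditioning on a single action prototype can be illustrated with a toy autoregressive rollout. This is a sketch under strong simplifications and not the repository's code: `rollout` and `toy_step` are hypothetical names, the "slots" are 2-D tuples, and the learned predictor is replaced by a function that shifts every slot by the action.

```python
def rollout(slots, prototype, step_fn, num_preds):
    """Predict num_preds steps autoregressively, reusing one action at every step."""
    preds = []
    for _ in range(num_preds):
        # Each prediction becomes the input of the next step.
        slots = step_fn(slots, prototype)
        preds.append(slots)
    return preds


def toy_step(slots, action):
    # Hypothetical stand-in for the learned predictor: shift each slot by the action.
    return [tuple(s + a for s, a in zip(slot, action)) for slot in slots]


seed_slots = [(0.0, 0.0), (2.0, 1.0)]  # two toy 2-D "slots"
prototype = (1.0, 0.0)                 # one toy action prototype
preds = rollout(seed_slots, prototype, toy_step, num_preds=15)
print(len(preds), preds[-1])
```

Swapping in a different `prototype` while keeping the same seed slots yields a different trajectory, which mirrors how the script produces one animation per action prototype.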

<details> <summary><i>Show Example Outputs of `src/06_generate_action_figs.py`</i></summary> Generating figures with this script should produce animations as follows: <br> <table> <tr> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/inferred_dynamics.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_1.gif" width="11%"/> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_4.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_5.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_6.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_7.gif" width="11%" /> </td> </tr> </table> </details>
