PlaySlot

Official implementation of: "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Villar-Corrales & Behnke. ICML 2025

PlaySlot: Controllable Object-Centric Video Prediction

Official implementation of "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Angel Villar-Corrales and Sven Behnke. ICML 2025.

[Paper]    [Project Page]    [BibTeX]    [Live Demo]

<table> <tr> <td rowspan="2" align="center"> <b>Main Figure</b> <img src="assets/teaser.png" width="200%"><br> </td> <td align="center"> <b>Target</b> <img src="assets/top_readme_examples/gif1/gt_GIF_frames.gif" width="100%" /> </td> <td align="center"> <b>Preds.</b> <img src="assets/top_readme_examples/gif1/pred_GIF_frames.gif" width="100%" /> </td> <td align="center"> <b>Segm.</b> <img src="assets/top_readme_examples/gif1/masks_GIF_masks.gif" width="100%" /> </td> <td align="center"> <b>Obj.1</b> <img src="assets/top_readme_examples/gif1/gt_obj_8.gif" width="100%" /> </td> <td align="center"> <b>Obj.2</b> <img src="assets/top_readme_examples/gif1/gt_obj_7.gif" width="100%" /> </td> </tr> <tr> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_GIF_frames.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/pred_GIF_frames.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/masks_GIF_masks.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_obj_5.gif" width="100%" /> </td> <td align="center"> <img src="assets/top_readme_examples/gif2/gt_obj_6.gif" width="100%" /> </td> </tr> </table>

Installation and Dataset Preparation

  1. Clone the repository and install all required packages into our conda environment, along with external dependencies such as the multi-object-fetch environment or MetaWorld.
git clone git@github.com:angelvillar96/PlaySlot.git
cd PlaySlot
./create_conda_env.sh
source ~/.bashrc
conda activate PlaySlot
  2. Download and extract the pretrained models, including checkpoints for the SAVi decomposition, predictor modules and behavior modules:
chmod +x download_pretrained.sh
./download_pretrained.sh
  3. Download the datasets:
  • ButtonPress & BlockPush: You can automatically download and place the ButtonPress and BlockPush datasets by running the following commands:
chmod +x download_datasets.sh
./download_datasets.sh
  • Sketchy: To download the Sketchy robot dataset, please refer to the original source.
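After the download steps above, it can be useful to confirm that the pretrained checkpoints actually landed on disk. The following is a minimal sketch, not part of the repository: the checkpoint file names are taken from the evaluation examples in this README, while the search root `experiments` and the helper names `find_checkpoints` and `report` are assumptions made for illustration. It searches recursively, so it does not depend on the exact subdirectory layout.

```python
from pathlib import Path

# Checkpoint file names as they appear in the evaluation examples of this README.
EXPECTED = ["SAVi_BlockPush.pth", "PlaySlot_BlockPush.pth"]


def find_checkpoints(root, names=EXPECTED):
    """Map each checkpoint name to the first matching path under root, or None."""
    root = Path(root)
    found = {}
    for name in names:
        # Recursive search: the exact subdirectory is not assumed.
        matches = sorted(root.rglob(name))
        found[name] = matches[0] if matches else None
    return found


def report(root="experiments"):
    """Print one line per expected checkpoint (hypothetical helper)."""
    for name, path in find_checkpoints(root).items():
        status = path if path is not None else "MISSING"
        print(f"{name}: {status}")
```

Calling `report()` from the repository root prints the location of each checkpoint, or `MISSING` if it was not found under the assumed root.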

Training

We refer to docs/TRAIN.md for detailed instructions on training your own PlaySlot. We include instructions for all training stages, including training SAVi, jointly training cOCVP and InvDyn, and learning behaviors from unlabelled expert demonstrations.

Evaluation and Figure Generation

We provide bash scripts for evaluation and figure generation using our pretrained checkpoints. <br> Run a script as follows:

./scripts/SCRIPT_NAME

Example:

./scripts/05_eval_PlaySlot_BlockPush.sh
./scripts/06_generate_figs_pred_BlockPush.sh
./scripts/06_generate_action_figs_BlockPush.sh
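To run the three example scripts back to back, a small driver can stop at the first failure instead of continuing with stale results. This is a sketch under assumptions: the script names come from the example above, but `run_pipeline`, the `runner` parameter, and the choice to invoke the scripts through `bash` from the repository root are illustrative and not part of the repository.

```python
import subprocess
from pathlib import Path

# Script names are taken from the example above; SCRIPTS_DIR assumes that
# this code is run from the repository root.
SCRIPTS_DIR = Path("scripts")
PIPELINE = [
    "05_eval_PlaySlot_BlockPush.sh",
    "06_generate_figs_pred_BlockPush.sh",
    "06_generate_action_figs_BlockPush.sh",
]


def run_pipeline(script_names, scripts_dir=SCRIPTS_DIR, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the list of script names that finished successfully.
    """
    finished = []
    for name in script_names:
        result = runner(["bash", str(scripts_dir / name)])
        if result.returncode != 0:
            print(f"{name} failed with exit code {result.returncode}")
            break
        finished.append(name)
    return finished
```

Calling `run_pipeline(PIPELINE)` returns the scripts that completed, so comparing its length against `len(PIPELINE)` tells you whether the whole sequence succeeded. The `runner` argument exists only so the function can be exercised without launching real processes.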

Below, we describe the different evaluation and figure generation scripts in more detail.

Evaluate SAVi for Image Decomposition

You can quantitatively and qualitatively evaluate a SAVi video decomposition model using the src/03_evaluate_savi.py and src/06_generate_figs_savi.py scripts, respectively.

These scripts evaluate the model on the test set and generate figures of the results.

Example:

python src/03_evaluate_savi.py \
  -d experiments/BlockPush/ \
  --savi_ckpt SAVi_BlockPush.pth \
  --results_name quant_eval_savi

python src/06_generate_figs_savi.py \
  -d experiments/BlockPush/ \
  --savi_ckpt SAVi_BlockPush.pth \
  --num_seqs 10 \
  --num_frames 8
<details> <summary><i>Show SAVi Figures</i></summary> Generating figures with SAVi should produce figures as follows: <img src="assets/savi_imgs/savi_slots_00.png" width="49%" align="center"> <img src="assets/savi_imgs/savi_slots_01.png" width="49%" align="center"> </details>
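Because the experiment directory and the checkpoint name both encode the dataset, it is easy to mix them up when switching datasets. The sketch below builds the evaluation command from a single dataset name. It assumes the naming pattern visible in the examples (`experiments/<dataset>/` and `SAVi_<dataset>.pth`) holds for every dataset, which this README does not state explicitly; `savi_eval_command` is a hypothetical helper.

```python
import shlex


def savi_eval_command(dataset, results_name="quant_eval_savi"):
    """Build the src/03_evaluate_savi.py command for one dataset.

    Assumes the naming pattern seen in the examples:
    experiments/<dataset>/ and SAVi_<dataset>.pth.
    """
    return [
        "python", "src/03_evaluate_savi.py",
        "-d", f"experiments/{dataset}/",
        "--savi_ckpt", f"SAVi_{dataset}.pth",
        "--results_name", results_name,
    ]


# Print the command as a single shell-ready line.
print(shlex.join(savi_eval_command("BlockPush")))
# python src/03_evaluate_savi.py -d experiments/BlockPush/ --savi_ckpt SAVi_BlockPush.pth --results_name quant_eval_savi
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.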

Evaluate PlaySlot for Video Prediction

You can evaluate PlaySlot for video prediction using the src/05_evaluate_PlaySlot.py script. This script takes pretrained SAVi and PlaySlot checkpoints and evaluates the visual quality of the predicted frames.

Example:

python src/05_evaluate_PlaySlot.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --results_name quant_eval_playslot \
  --post_only \
  --num_seed 6 \
  --num_preds 15 \
  --set_expert_policy
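This README does not list which metrics the evaluation script reports. As an illustration of what "visual quality of the predicted frames" can mean in practice, the sketch below computes PSNR, a widely used frame-quality metric, on flattened pixel values. It is a generic example written for this explanation, not code from the repository, and `psnr` is a hypothetical helper.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened frames.

    pred and target are equally sized sequences of pixel values
    in the range [0, max_value]. Higher values mean closer frames.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally sized")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For video prediction, such a per-frame score is typically averaged over all predicted frames and test sequences.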

Generate Figures and Animations

We provide two scripts to generate video prediction, object prediction, and segmentation figures and animations.

  1. src/06_generate_figs_pred.py generates images and animations of frames, objects and slot masks predicted by PlaySlot conditioned on latent actions inferred by the Inverse Dynamics model.

Example:

python src/06_generate_figs_pred.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --num_seqs 10 \
  --num_seed 1 \
  --num_preds 15 \
  --set_expert_policy
<details> <summary><i>Show Example Outputs of `src/06_generate_figs_pred.py`</i></summary> Generating figures with PlaySlot should produce animations as follows: <br> <table> <tbody> <tr> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/gt_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/pred_GIF_frames.gif" width="11%"/> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/masks_GIF_masks.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_1.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_5.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif1/obj_7.gif" width="11%" /> </td> </tr> <br> <tr> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/gt_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/pred_GIF_frames.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/masks_GIF_masks.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_1.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_6.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Pred_GIFs/gif2/obj_7.gif" width="11%" /> </td> </tr> </tbody> </table> </details>
  2. src/06_generate_action_figs.py generates images and animations of frames generated by PlaySlot by repeatedly conditioning the prediction process on a single learned action prototype.

Example:

python src/06_generate_action_figs.py \
  -d experiments/BlockPush/ \
  --name_pred_exp PlaySlot \
  --savi_ckpt SAVi_BlockPush.pth \
  --pred_ckpt PlaySlot_BlockPush.pth \
  --num_seqs 10 \
  --num_seed 1 \
  --num_preds 15 \
  --set_expert_policy
<details> <summary><i>Show Example Outputs of `src/06_generate_action_figs.py`</i></summary> Generating figures with this script should produce animations as follows: <br> <table> <tr> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/inferred_dynamics.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_1.gif" width="11%"/> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_2.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_3.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_4.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_5.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_6.gif" width="11%" /> </td> <td align="center"> <img src="assets/PlaySlot_Action_GIFs/gif1/action_proto_7.gif" width="11%" /> </td> </tr> </table> </details>
