# Entity-Centric Reinforcement Learning (ECRL)
Official PyTorch code release of the paper "Entity-Centric Reinforcement Learning for Object Manipulation from Pixels" by Dan Haramati, Tal Daniel and Aviv Tamar.
### Offline RL Variant Release (Feb 2026)
An offline variant of ECRL implemented with IQL can be found in our follow-up work on Hierarchical ECRL (HECRL).
<h1 align="center"> <br> Entity-Centric Reinforcement Learning for Object Manipulation from Pixels <br> </h1> <h3 align="center"> <a href="https://danhrmti.github.io/">Dan Haramati</a> • <a href="https://taldatech.github.io/">Tal Daniel</a> • <a href="https://avivt.github.io/avivt/">Aviv Tamar</a> </h3> <h3 align="center">ICLR 2024 - Spotlight (top 5%)</h3> <h4 align="center">Goal-Conditioned Reinforcement Learning Workshop, NeurIPS 2023 - Spotlight</h4> <h4 align="center"> <a href="https://sites.google.com/view/entity-centric-rl/">Project Website</a> • <a href="https://arxiv.org/abs/2404.01220">arXiv</a> • <a href="https://openreview.net/forum?id=uDxeSZ1wdI">OpenReview</a> </h4> <h6 align="center">Zero-Shot Generalization from 3 to 12 Objects</h6> <p align="center"> <img src="media/sort_push_video.gif" height="140"> <img src="media/sort_push_train_goal.png" height="140"> <img src="media/sort_push_train_video.gif" height="140"> </p>
## Abstract
Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics. In principle, Reinforcement Learning (RL) offers a general approach to learn object manipulation. In practice, however, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality, especially when learning from raw image observations. In this work we propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction, and use it to learn goal-conditioned manipulation of several objects. Key to our method is the ability to handle goals with dependencies between the objects (e.g., moving objects in a certain order). We further relate our architecture to the generalization capability of the trained agent, based on a theoretical result for compositional generalization, and demonstrate agents that learn with 3 objects but generalize to similar tasks with over 10 objects.
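The core structural idea — treating each object as an interchangeable entity so the learned policy does not depend on object ordering — can be illustrated with a toy Deep-Sets-style sketch. This is purely illustrative; the models in this repository are more elaborate (latent-particle representations and attention-based policies):

```python
import math

def entity_encoder(entities):
    """Toy permutation-invariant scene encoder:
    apply the same per-entity transform, then sum-pool over entities."""
    phi = lambda e: [math.tanh(x) for x in e]   # shared per-entity map
    encoded = [phi(e) for e in entities]
    return [sum(col) for col in zip(*encoded)]  # order-independent pooling

scene_a = [[0.1, 0.2], [0.3, 0.4]]   # 2 entities, 2 features each
scene_b = [scene_a[1], scene_a[0]]   # same entities, shuffled
# entity_encoder(scene_a) == entity_encoder(scene_b)  -> True
```

Because the pooling is symmetric, adding more entities does not change the interface of the encoder, which is the structural property behind the generalization from 3 to 10+ objects.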
## Citation
Haramati, Dan, Tal Daniel, and Aviv Tamar. "Entity-Centric Reinforcement Learning for Object Manipulation from Pixels." Proceedings of the Twelfth International Conference on Learning Representations (ICLR). 2024.
```bibtex
@inproceedings{haramati2024entitycentric,
  title={Entity-Centric Reinforcement Learning for Object Manipulation from Pixels},
  author={Dan Haramati and Tal Daniel and Aviv Tamar},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=uDxeSZ1wdI}
}
```
<h6 align="center">In the Eyes of the Agent</h6>
<p align="center">
<img src="media/3c_dlp_vis_goal_front.png" height="140">
<img src="media/3c_dlp_vis_video_front.gif" height="170">
<img src="media/3c_dlp_vis_video_side.gif" height="170">
<img src="media/3c_dlp_vis_goal_side.png" height="140">
</p>
## Content

- Entity-Centric Reinforcement Learning (ECRL)
  1. Prerequisites
  2. Environments
  3. Training
  4. Evaluation
  5. Repository Content Details
## 1. Prerequisites
The following are the main libraries required to run this code:
| Library | Version |
|---------------------|-----------------|
| Python | 3.8 |
| `torch` | 2.1.2 |
| `stable-baselines3` | 1.5.0 |
| `isaacgym` | Preview Release |
For the full list of requirements, see the `requirements.txt` file.
For the simulation environment, download the Isaac Gym Preview release from the NVIDIA website, then follow the installation instructions in its documentation.
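Before launching training, it can help to sanity-check that the core dependencies are importable at the expected versions. The helper below is a hypothetical convenience snippet, not part of the repository:

```python
from importlib import metadata

def check_requirements(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions

# e.g. check_requirements(["torch", "stable-baselines3", "isaacgym"])
```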
## 2. Environments

<h6 align="center">Environments</h6> <p align="center"> <img src="media/environments.png" height="140"> </p>

The figure above shows the suite of training environments used in the paper.
- **N-Cubes**: Push N different-colored cubes to their goal locations.
- **Adjacent-Goals**: A 3-Cubes setting where goals are sampled randomly on the table such that all cubes are adjacent.
This task requires accounting for interactions between objects.
- **Ordered-Push**: A 2-Cubes setting where a narrow corridor, wide enough to fit only a single cube, is placed on top of the table.
We consider two possible goal configurations: red cube at the rear of the corridor and green cube at the front, or vice versa.
This task requires fulfilling the goals in a specific order; otherwise the agent fails, since a block cannot be pulled back out of the corridor.
- **Small-Table**: A 3-Cubes setting where the table is substantially smaller.
This task requires accurately accounting for all objects in the scene at all times, to avoid pushing blocks off the table.
- **Push-2T**: Push 2 T-shaped blocks to a single goal orientation.
A configuration file for each environment, `IsaacPandaPushConfig.yaml`, is provided in the corresponding directory in `config`.
## 3. Training

### Deep Latent Particles (DLP) Pretraining
We provide pretrained model checkpoints:
| Model | Dataset | Download |
|----------------|--------------------------|-----------------------------------------------------------------------------------|
| DLP | 5-Cubes | Google Drive |
| DLP | 6-Cubes | Google Drive |
| DLP | Push-T | Google Drive |
| Slot-Attention | 5-Cubes | Google Drive |
| VAE | Mixture of 1/2/3-Cubes | Google Drive |
Download and place in the relevant directory in `latent_rep_chkpts`
(e.g., a checkpoint of DLP trained on data from the 5-Cubes environment should be placed in `latent_rep_chkpts/dlp_push_5C`).
To retrain the model:

1. Collect image data using a random policy by running `main.py -c <configuration_dir>` with the desired environment (e.g., `main.py -c config/n_cubes`), setting `collectData: True` and `collectDataNumTimesteps` in the relevant `Config.yaml`. This will save a `.npy` file in the `results` directory.
2. Process the data into a dataset by running `dlp2/datasets/process_dlp_data.py` (fill in the relevant paths at the beginning of the script).
3. Configure `config/TrainDLPConfig.yaml` and run `train_dlp.py`.
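Taken together, the retraining flow looks roughly like this when run from the repository root (a command sketch only; the first two steps require editing the config files and script paths as described above):

```shell
# 1. collect random-policy image data (collectData: True in Config.yaml)
python main.py -c config/n_cubes
# 2. build the DLP dataset from the saved .npy file
python dlp2/datasets/process_dlp_data.py
# 3. train DLP with the settings in config/TrainDLPConfig.yaml
python train_dlp.py
```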
### RL Training
Run `main.py -c <configuration_dir>` with the desired configuration (e.g., `main.py -c config/n_cubes`).
`Config.yaml` contains agent and training parameters, and `IsaacPandaPushConfig.yaml` contains environment parameters.
To reproduce the experiments in the paper, use the corresponding configuration directory; the provided configurations already match those used in the paper.
The parameters that need to be set for the different experiment instances (e.g., 'State' or 'Image'):

- In `Config.yaml`: the `Model` parameters.
- In `IsaacPandaPushConfig.yaml`: the `numObjects` parameter (for `n_cubes` and `push_t`).
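For example, a 3-cube run might be configured like this (the `Model` and `numObjects` keys are from the repo's configs; the `obsMode` key and values shown are illustrative placeholders, so check the provided config files for the exact schema):

```yaml
# config/n_cubes/Config.yaml (excerpt, illustrative)
Model:
  obsMode: Image      # hypothetical key; 'State' or 'Image' per the experiment

# config/n_cubes/IsaacPandaPushConfig.yaml (excerpt)
numObjects: 3
```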
To log training statistics and images/videos with Weights & Biases, set `WANDB: log: True` in `Config.yaml`
and fill in your username in the `wandb.init(entity="")` line of the `main.py` script.
Agent model checkpoints and intermediate results are saved in the `model_chkpts` and `results` directories, respectively.
## 4. Evaluation
To evaluate an agent on a given environment, run `policy_eval.py`.
Set the agent `model_path` and the desired configuration directory manually at the beginning of the script.
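Conceptually, evaluation computes a goal-conditioned success rate over rollouts, along the lines of the sketch below (a simplification with a hypothetical env/policy interface, not the repository's actual `policy_eval.py`):

```python
def evaluate_policy(policy, env, num_episodes=10, max_steps=100):
    """Fraction of episodes in which the goal configuration is reached."""
    successes = 0
    for _ in range(num_episodes):
        obs, goal = env.reset()      # goal-conditioned reset
        success = False
        for _ in range(max_steps):
            action = policy(obs, goal)
            obs, success, done = env.step(action)
            if done:
                break
        successes += int(success)
    return successes / num_episodes
```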
### Evaluation on Zero-Shot Generalization
- **Cube Sorting**: train on `config/n_cubes` with `numObjects: 3` and evaluate on `config/generalization_sort_push`.
- **Different Number of Cubes than in Training**: train on `config/generalization_num_cubes` with `numObjects: 3` and evaluate with the same config and a varying number of objects.
## 5. Repository Content Details
| Filename | Description |
|------------------|-----------------------------------------|
| `main.py` | main script for training the RL agent |
| `policy_eval.py` | script for evaluating a trained agent |
