Latco

Model-Based Reinforcement Learning via Latent-Space Collocation.

Generate Convert Improve

Install / Use

/learn @zchuning/Latco

About this skill

Quality Score

0/100

README

Model-Based Reinforcement Learning via Latent-Space Collocation

[Project Website] [Sparse MetaWorld Code] [Talk (5min)] [Paper]

Oleh Rybkin*1, Chuning Zhu*1, Anusha Nagabandi2, Kostas Daniilidis1, Igor Mordatch3, Sergey Levine4 (* equal contribution)

1University of Pennsylvania 2Covariant 3Google Brain 4UC Berkeley

This is a TF2 implementation for our latent-space collocation (LatCo) agent for model-based reinforcement learning. LatCo is a visual model-based reinforcement learning method that can solve long-horizon tasks by optimizing sequences of latent states, instead of optimizing actions directly as done in shooting methods such as visual foresight and planet. Optimizing latent states allows LatCo to quickly discover the high-reward region and construct effective plans even for complex multi-stage tasks that shooting methods do not solve.

Instructions

Setting up repo

git clone https://github.com/zchuning/latco.git
cd latco
git submodule update --init --recursive

Dependencies

Install Mujoco
Install dependencies

pip install --user numpy==1.19.2 cloudpickle==1.2.2 tensorflow-gpu==2.2.0 tensorflow_probability==0.10.0
pip install --user gym imageio pandas pyyaml matplotlib
pip install --user mujoco-py (optional - to run Sparse MetaWorld or Pointmass)
pip install --user dm-control (optional - to run DM control)
pip install --user -e metaworldv2

The code is tested on Ubuntu 20.04, with CUDNN 7.6.5, CUDA 10.1, and Python 3.8

Commands

Train LatCo agent on the Reaching task:

python train.py --collect_sparse_reward True --use_sparse_reward True --task mw_SawyerReachEnvV2 --planning_horizon 30 --mpc_steps 30 --agent latco --logdir logdir/mw_reach/latco/0

Evaluate LatCo agent:

python eval.py --collect_sparse_reward True --use_sparse_reward True --task mw_SawyerReachEnvV2 --planning_horizon 30 --mpc_steps 30 --agent latco --logdir logdir/mw_reach/latco/0 --logdir_eval logdir_eval/mw_reach/latco/0 --n_eval_episodes 10

To run the offline+fine-tune experiments, download the released offline data for the Hammer and the Thermos tasks. Train LatCo agent on the Thermos task with offline data (data path specified by --offline_dir):

python train.py --prefill 0 --action_repeat 2 --collect_sparse_reward True --use_sparse_reward True --task mw_SawyerStickPushEnvV2 --planning_horizon 50 --mpc_steps 25 --agent latco --logdir logdir/mw_thermos/latco/0 --offline_dir logdir/mw_thermos/offline/episodes

For convenience, we include a script for automatically generating training and evaluation commands with hyperparameters from the paper. To generate training command, run:

python gencmd.py --task mw_reach --method latco

To generate evaluation command, run:

python gencmd.py --task mw_reach --method latco --eval True

--task can be one of mw_reach, mw_button, mw_window, mw_drawer, mw_push, mw_hammer, mw_thermos, dmc_reacher, dmc_cheetah, dmc_quadruped. --method can be one of latco, planet, mppi, shooting_gd, shooting_gn, platco. To replicate ablation experiments, set --method to be one of latco_no_constraint, latco_no_relax, latco_first_order, image_colloc. To run dense reward metaworld tasks, add --dense_mw True.

Generate plots:

python plot.py --indir ./logdir --outdir ./plots --xaxis step --yaxis train/success --bins 3e4

Tensorboard:

tensorboard --logdir ./logdir

Troubleshooting

By default, the mujoco rendering for Sparse MetaWorld will use glfw. With egl rendering, gpu 0 will be used by default. You can change these with the following environment variables

export MUJOCO_RENDERER='egl'
export GL_DEVICE_ID=1

If you get ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject, chances are, your mujoco-py was not installed properly. You can fix it with the following

pip uninstall mujoco-py
pip install mujoco-py --no-cache-dir --no-binary :all: --no-build-isolation

Code structure

latco.py contains the LatCo agent. It inherits planning_agent.
planners/probabilistic_latco.py contains the Gaussian LatCo agent. It inherits planning_agent.
planners/gn_solver.py is a Gauss-Newton optimizer which leverages the block-tridiagonal structure of the Jacobian to speed up computation.
base_agent.py is a barebone agent with an RSSM model, modified from the Dreamer agent. planning_agent.py inherits base_agent and is inherited by all methods in planners.
planners contains planning agents such as shooting cem, shooting gd, and a few other variants.
envs/sparse_metaworld.py is the wrapper for the Sparse MetaWorld benchmark. The benchmark itself is a submodule metaworldv2.
envs/pointmass/pointmass_prob_env.py is the pointmass lottery task.

Using Sparse MetaWorld environments

WARNING! The Sparse MetaWorld environments by default output dense reward. The sparse reward is in info['success']. A simple-to-use standalone environment code that outputs sparse reward is here, please see usage instructions by that link.

Adding new environments

See wrappers.py as well as envs/sparse_metaworld.py, envs/pointmass/pointmass_prob_env.py for examples on how to add new environments. We follow a simple gym-like interface from the Dreamer repo.

Note you can also train our agent entirely offline on an existing dataset of episodes. To do this, use the --offline_dir argument to point to the dataset and set --pretrain to the desired number of training steps.

Bibtex

If you find this code useful, please cite:

@inproceedings{rybkin2021latco,
  title={Model-Based Reinforcement Learning via Latent-Space Collocation},
  author={Rybkin, Oleh and Zhu, Chuning and Nagabandi, Anusha and Daniilidis, Kostas and Mordatch, Igor and Levine, Sergey},
  journal={Proceedings of the 38th International Conference on Machine Learning},
  year={2021}
}

Acknowledgements

This codebase was built on top of Dreamer.

Related Skills

YC-Killer

2.7k

A library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.

best-practices-researcher

The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app

research_rules

Research & Verification Rules Quote Verification Protocol Primary Task "Make sure that the quote is relevant to the chapter and so you we want to make sure that we want to have it identifie

groundhog

398

Groundhog's primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).

zchuning

View profile

View on GitHub

GitHub Stars34

CategoryEducation

Updated5mo ago

Forks1

zchuning/latco

Languages

Python

Security Score

77/100

Audited on Oct 11, 2025

No findings