Official implementation of Implicit Behavioral Cloning, as described in our CoRL 2021 paper, see more at https://implicitbc.github.io/
Implicit Behavioral Cloning
This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper:
Implicit Behavioral Cloning (website: https://implicitbc.github.io/) (arXiv link) <br/> Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson <br/> Conference on Robot Learning (CoRL) 2021
Abstract
We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
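The mode-averaging problem the abstract alludes to can be shown with a tiny standalone example (toy data, not from this codebase): when demonstrations contain two valid actions for the same observation, an explicit MSE regressor predicts their mean, while an implicit policy that takes an argmin over an energy function can still commit to a real mode.

```python
# Toy illustration (not the paper's code): multi-valued demonstrations.
# For one observation, the data contains two valid actions, -1 and +1.
demo_actions = [-1.0, 1.0]

# Explicit MSE policy: the loss-minimizing single prediction is the mean
# of the targets -- an action that never appears in the data.
mse_prediction = sum(demo_actions) / len(demo_actions)

# Implicit policy: define an energy that is low near any demonstrated
# action, then act by argmin over candidate actions.
def energy(a):
    return min((a - d) ** 2 for d in demo_actions)

candidates = [i / 100 - 2.0 for i in range(401)]  # grid on [-2, 2]
implicit_action = min(candidates, key=energy)

print(mse_prediction)   # 0.0 (mode-averaged, never demonstrated)
print(implicit_action)  # -1.0 (an actual demonstrated mode)
```

The grid argmin here stands in for the sampling- or gradient-based optimizers the paper uses; the point is only that the implicit formulation has no pressure to average across modes.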
Prerequisites
The code for this project uses Python 3.7+ and the following pip packages:
python3 -m pip install --upgrade pip
pip install \
absl-py==0.12.0 \
gin-config==0.4.0 \
matplotlib==3.4.3 \
mediapy==1.0.3 \
opencv-python==4.5.3.56 \
pybullet==3.1.6 \
scipy==1.7.1 \
tensorflow==2.6.0 \
keras==2.6.0 \
tf-agents==0.11.0rc0 \
tqdm==4.62.2
(Optional): For MuJoCo support, see docs/mujoco_setup.md. We recommend skipping this unless you specifically want to run the Adroit and Kitchen environments.
Quickstart: from 0 to a trained IBC policy in 10 minutes.
Step 1: Install listed Python packages above in Prerequisites.
Step 2: Run unit tests (should take less than a minute), and do this from the directory just above the top-level ibc directory:
./ibc/run_tests.sh
Step 3: Check that Tensorflow has GPU access:
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
If the above prints False, see the following requirements, notably CUDA 11.2 and cuDNN 8.1.0: https://www.tensorflow.org/install/gpu#software_requirements.
Step 4: Let's do an example Block Pushing task. First, download the oracle data (or see Tasks for how to generate it):
cd ibc/data
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip block_push_states_location.zip && rm block_push_states_location.zip
cd ../..
Step 5: Set PYTHONPATH to include the directory just above the top-level ibc directory; if you've been following the commands above, that is:
export PYTHONPATH=$PYTHONPATH:${PWD}
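Why this matters: Python searches the entries of sys.path (which PYTHONPATH feeds into) in order, so putting the directory *above* the top-level ibc folder on the path is what makes `import ibc...` resolve. A minimal sketch of that mechanism, using a temporary directory as a stand-in for `${PWD}`:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()            # stands in for ${PWD}
pkg = os.path.join(root, "ibc")      # stands in for the ibc repo checkout
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()

sys.path.append(root)                # same effect as the PYTHONPATH export
import ibc                           # now importable as a package

print(ibc.__file__.startswith(root))  # True
```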
Step 6: On that example Block Pushing task, we'll next do a training + evaluation with Implicit BC:
./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh
Some notes:
- On an example single-GPU machine (RTX 2080 Ti), the above trains at about 18 steps/sec, and should reach high success rates in 5,000 to 10,000 steps (roughly 5-10 minutes of training).
- The `mlp_ebm.gin` is just one config, which is meant to be reasonably fast to train, with only 20 evals at each interval, and is not suitable for all tasks. See Tasks for more configs.
- Due to the `--video` flag above, you can watch a video of the learned policy in action at `/tmp/ibc_logs/mlp_ebm/ibc_dfo/...` -- navigate to the `videos/ttl=7d` subfolder, and by default there should be one example `.mp4` video saved every time you do an evaluation interval.
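If you want to script pulling up the latest evaluation video rather than browsing the log tree by hand, a small helper like the following works; this is a hypothetical convenience function, not part of the repo:

```python
import glob
import os

def newest_video(log_root="/tmp/ibc_logs"):
    """Return the path of the most recently written .mp4 under log_root,
    or None if no evaluation videos exist yet."""
    mp4s = glob.glob(os.path.join(log_root, "**", "*.mp4"), recursive=True)
    return max(mp4s, key=os.path.getmtime) if mp4s else None
```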
(Optional) Step 7: For the pybullet-based tasks, we also have real-time interactive visualization set up through a visualization server, so in one terminal:
cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 -m pybullet_utils.runServer
And in a different terminal run the oracle a few times with the --shared_memory flag:
cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 ibc/data/policy_eval.py -- \
--alsologtostderr \
--shared_memory \
--num_episodes=3 \
--policy=oracle_push \
--task=PUSH
You're done with Quickstart! See below for more Tasks, and also see docs/codebase_overview.md and docs/workflow.md for additional info.
Tasks
Task: Particle
In this task, the goal is for the agent (black dot) to first go to the green dot, then the blue dot.
(Side-by-side rollout animations here compare an example IBC policy with an example MSE policy.)
Get Data
We can either generate data from scratch, for example for 2D (takes 15 seconds):
./ibc/ibc/configs/particle/collect_data.sh
Or just download all the data for all different dimensions: <a name="particle-data"></a>
cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/particle.zip
unzip particle.zip && rm particle.zip
cd ../..
Train and Evaluate
Let's start with some small networks, on just the 2D version since it's easiest to visualize, and compare MSE and IBC. Here's a small-network (256x2) IBC-with-Langevin config; the trailing `2` in the command is the environment dimensionality:
./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 2
And here's an identically sized network (256x2) but with MSE config:
<!-- partial verified: 5% success, 10k steps, 20 episodes evaluated, 21.7 steps/sec -->
./ibc/ibc/configs/particle/run_mlp_mse.sh 2
For the above configurations, we suggest comparing the rollout videos, which you can find at /tmp/ibc_logs/...corresponding_directory../videos/. At the top of this section is shown a comparison at 10,000 training steps for the two different above configs.
And here are the best configs respectively for IBC (with Langevin) and MSE, in this case run on the 16-dimensional environment: <a name="particle-train"></a>
./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 16
./ibc/ibc/configs/particle/run_mlp_mse_best.sh 16
Note: the _best config is kind of slow for Langevin to train, but even just ./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 16 (smaller network) seems to solve the 16-D environment pretty well, and is much faster to train.
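The Langevin variant used in these configs performs inference by noisy gradient descent on the learned energy, rather than derivative-free sampling. A minimal 1-D sketch of that idea, with a toy energy, a numerical gradient, and illustrative (not the repo's) step sizes -- the real implementation uses autodiff on the EBM and anneals its schedule:

```python
import math
import random

random.seed(0)

def energy(a):
    # Toy energy with its minimum at a = 0.3 (stand-in for a learned EBM).
    return (a - 0.3) ** 2

def grad(a, eps=1e-4):
    # Central-difference numerical gradient of the energy.
    return (energy(a + eps) - energy(a - eps)) / (2 * eps)

def langevin_argmin(steps=200, step_size=0.05, noise_scale=0.03):
    a = random.uniform(-1.0, 1.0)
    for _ in range(steps):
        # Gradient descent step plus injected Gaussian noise.
        a = a - step_size * grad(a) + noise_scale * random.gauss(0.0, 1.0)
        a = min(1.0, max(-1.0, a))  # clip to the action bounds
    return a

a_star = langevin_argmin()  # ends up near the energy minimum at 0.3
```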
Task: Block Pushing (from state observations)
Get Data
We can either generate data from scratch (~2 minutes for 2,000 episodes: 200 each across 10 replicas):
./ibc/ibc/configs/pushing_states/collect_data.sh
Or we can download data from the web:<a name="pushing-states-data"></a>
cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip 'block_push_states_location.zip' && rm block_push_states_location.zip
cd ../..
Train and Evaluate
Here's a reasonably fast-to-train config for IBC with DFO:
<!-- partial verified: 100% in 10k steps, 18 steps/sec -->
./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh
Or here's a config for IBC with Langevin:
<!-- partial verified: 95% in 5k steps, 6.5 steps/sec -->
./ibc/ibc/configs/pushing_states/run_mlp_ebm_langevin.sh
Or here's a comparable, reasonably fast-to-train config for MSE:
<!-- partial verified: 85% in 10k steps, 18 steps/sec -->
./ibc/ibc/configs/pushing_states/run_mlp_mse.sh
Or to run the best configs respectively for IBC, MSE, and MDN (some of these might be slower to train than the above): <a name="pushing-states-train"></a>
<!-- partial verified: 100% at 15k steps, 18 steps/sec -->
<!-- partial verified: 87% at 15k steps, 18 steps/sec -->
<!-- partial verified: 75% at 5k steps, 18 steps/sec -->
./ibc/ibc/configs/pushing_states/run_mlp_ebm_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mse_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mdn_best.sh
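The `mlp_ebm` configs above perform inference with derivative-free optimization (DFO): sample candidate actions, score them with the energy model, then resample around low-energy candidates with shrinking noise. A self-contained toy sketch of that loop follows; the energy, sample counts, and shrink schedule here are illustrative, not the repo's exact values:

```python
import math
import random

random.seed(0)

def energy(a):
    # Toy 1-D energy with its minimum at a = 0.7 (stand-in for a learned EBM).
    return (a - 0.7) ** 2

def dfo_argmin(energy, low=-1.0, high=1.0, samples=64, iters=3, shrink=0.5):
    actions = [random.uniform(low, high) for _ in range(samples)]
    sigma = (high - low) / 4
    for _ in range(iters):
        # Softmax over negative energies gives resampling weights.
        scores = [math.exp(-energy(a)) for a in actions]
        total = sum(scores)
        probs = [s / total for s in scores]
        # Resample with replacement, then perturb with shrinking noise.
        actions = random.choices(actions, weights=probs, k=samples)
        actions = [min(high, max(low, a + random.gauss(0.0, sigma)))
                   for a in actions]
        sigma *= shrink
    return min(actions, key=energy)

a_star = dfo_argmin(energy)  # ends up near the energy minimum at 0.7
```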
Task: Block Pushing (from image observations)
Get Data
Download data from the web: <a name="pushing-pixels-data"></a>
cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_visual_location.zip
unzip 'block_push_visual_location.zip' && rm block_push_visual_location.zip
cd ../..
Train and Evaluate
Here is an IBC with Langevin configuration which should actually converge faster than the IBC-with-DFO that we reported in the paper:
<!-- partial verified: 100% at 10k steps, 6.5 steps/sec, at 90x120 w/ 128 batch -->
<!-- partial verified: 100% at 5k steps, 4.1 steps/sec, at 180x240 w/ 128 batch -->
./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_langevin.sh
And here are the best configs respectively for IBC (with DFO), MSE, and MDN: <a name="pushing-pixels-train"></a>
<!-- partial verified: 94% at 10k steps, 8.0 steps/sec, 180x240 w/ 128 batch -->
<!-- partial verified: 68% at 10k steps, 9.0 steps/sec, 180x240 w/ 128 batch -->
