# EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data
[Project Website] [Paper]
Jesse Zhang<sup>1</sup>, Minho Heo<sup>2</sup>, Zuxin Liu<sup>3</sup>, Erdem Bıyık<sup>1</sup>, Joseph J. Lim<sup>2</sup>, Yao Liu<sup>4</sup>, Rasool Fakoor<sup>4</sup>
<sup>1</sup>University of Southern California <sup>2</sup>KAIST <sup>3</sup>Carnegie Mellon University <sup>4</sup>Amazon Web Services
<a href="https://clvrai.github.io/extract/">
  <p align="center">
    <img src="resources/extract_teaser.png" width="800">
  </p>
</a>

This is the official PyTorch implementation of the paper "EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data" (CoRL 2024).
## Requirements
- python 3.10+
- mujoco 2.0.2.5 (for RL experiments)
- Ubuntu
## Installation Instructions
Use conda to install all required packages. Make sure you already have MuJoCo installed; follow the instructions here or run:

```bash
# MuJoCo installation for Linux
mkdir ~/.mujoco
cd ~/.mujoco
wget https://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
wget https://www.roboti.us/file/mjkey.txt
```
Now, to finalize the MuJoCo install, add the following lines to your `~/.bashrc`:

```bash
export MJLIB_PATH=$HOME/.mujoco/mujoco200/bin/libmujoco200.so
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin
export MUJOCO_GL=egl  # may not be necessary for you
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
```
Now install everything else with conda:

```bash
conda env create -f environment.yml
conda activate extract
pip install -e .
pip install -r requirements.txt
```
Set the environment variables that specify the root experiment and data directories. For example:

```bash
mkdir ./experiments
mkdir ./data
export EXP_DIR=./experiments
export DATA_DIR=./data
```
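In scripts built on top of the repo, these variables can be read back the same way; a minimal Python sketch (the fallback defaults below are our own illustration, not something the codebase defines):

```python
import os

# Read the experiment/data roots from the environment; the "./experiments"
# and "./data" fallbacks are illustrative defaults, not repo behavior.
exp_dir = os.environ.get("EXP_DIR", "./experiments")
data_dir = os.environ.get("DATA_DIR", "./data")

# Create the directories if they do not exist yet (mirrors the mkdir above).
os.makedirs(exp_dir, exist_ok=True)
os.makedirs(data_dir, exist_ok=True)
```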
Finally, install our D4RL fork:

```bash
conda activate extract
pip install -e d4rl
```
## Running our Clustering Pipeline
### Franka Kitchen
First, generate the kitchen dataset:

```bash
python data_generation_scripts/generate_kitchen_data.py
```

This replays the actions in the D4RL Franka Kitchen environment to collect RGB observations and generates a corresponding dataset from them.
Now, run the clustering algorithm with default parameters (K-means with K=8, median filtering with a window size of 7):

```bash
python vlm_cluster_dataset.py kitchen
```
### LIBERO Dataset
First, download the LIBERO datasets; see the full instructions here or follow the steps below:

```bash
git submodule update --init --recursive
pip install -e LIBERO
cd LIBERO
python benchmark_scripts/download_libero_datasets.py
```

Then, move the downloaded LIBERO datasets to `./datasets/`.
Now, generate our version of the LIBERO dataset:

```bash
python data_generation_scripts/generate_libero_data_for_clustering.py  # for the clustering algorithm
python data_generation_scripts/generate_libero_data_lmdb_lowres.py    # for actual policy training
```
Now, run the clustering algorithm with default parameters (K-means with K=8, median filtering with a window size of 7):

```bash
python vlm_cluster_dataset.py libero
```
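Conceptually, the default parameters mean each frame is assigned to one of K=8 K-means clusters, and the per-frame label sequence is then median-filtered with a window of 7 to remove spurious single-frame skill switches. A small NumPy sketch of the smoothing step only (our own illustration, not the repo's implementation):

```python
import numpy as np

def median_filter_labels(labels: np.ndarray, window: int = 7) -> np.ndarray:
    """Smooth a sequence of integer cluster labels with a sliding median."""
    half = window // 2
    padded = np.pad(labels, half, mode="edge")  # repeat edge values at the ends
    return np.array(
        [int(np.median(padded[i:i + window])) for i in range(len(labels))]
    )

# A noisy per-frame label sequence with one spurious flip in each segment.
labels = np.array([0, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 2])
smoothed = median_filter_labels(labels)
```

The lone `1` flips are voted away by their neighbors, leaving two clean skill segments.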
## Running commands
Training EXTRACT consists of either two or three stages:

1. Training a skill prior model and low-level decoder with a VAE on the offline skill dataset.
2. (Optional; used for LIBERO) Fine-tuning the whole VAE on a target-task-specific dataset.
3. Running online RL to learn new tasks.
### (1) Training skill models
All results will be written to WandB. Before running any of the commands below, create an account, then change the WandB entity and project name at the top of `train.py` and `rl/train.py` to match your account.

For all commands below, if you want to run with multiple seeds, add a `--seed {SEED}` argument along with `--prefix {PREFIX}` to distinguish the saved model checkpoint folders from one another (e.g., set the prefix to the seed).
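For example, a seed sweep could be scripted as follows. This is a dry-run sketch that only builds the command strings (the run-name scheme is our own example); you would pass each string to `subprocess.run(shlex.split(cmd))` to actually launch training:

```python
# Hypothetical sweep over three seeds; --prefix keeps each seed's
# checkpoints in its own folder.
BASE = (
    "python extract/train.py "
    "--path=extract/configs/skill_prior_learning/kitchen/hierarchical_cluster/extract "
    "--val_data_size=160"
)
commands = [
    f"{BASE} --seed {seed} --prefix seed_{seed} --run_name extract_kitchen_seed_{seed}"
    for seed in range(3)
]
for cmd in commands:
    print(cmd)  # dry run: inspect the commands before launching
```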
#### Kitchen
To pre-train the skill-learning VAE for EXTRACT on kitchen:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/kitchen/hierarchical_cluster/extract --val_data_size=160 --run_name {wandb_run_name}
```

To pre-train the skill-learning VAE for SPiRL on kitchen:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/kitchen/hierarchical/ --val_data_size=160 --run_name {wandb_run_name}
```

To pre-train a flat BC policy on kitchen:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/kitchen/flat/ --val_data_size=160 --run_name {wandb_run_name}
```
#### LIBERO
All pre-training methods will be trained on the LIBERO-90 dataset.
To pre-train the skill-learning VAE for EXTRACT on libero:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/libero_lang/hierarchical_cluster/extract --val_data_size=160 --run_name {wandb_run_name}
```

To pre-train the skill-learning VAE for SPiRL on libero:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/libero_lang/hierarchical/ --val_data_size=160 --run_name {wandb_run_name}
```

To pre-train a flat BC policy on libero:

```bash
python extract/train.py --path=extract/configs/skill_prior_learning/libero_lang/flat/ --val_data_size=160 --run_name {wandb_run_name}
```
NOTE: you can override any config parameter by passing `--config_override OVERRIDE_1,OVERRIDE_2,...`, e.g., `--config_override model.kl-div_weight=1e-4,model.nz_vae=5`.
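For reference, the override string is a comma-separated list of `dotted.key=value` pairs. A hypothetical parser (not the repo's actual implementation) that shows the expected shape:

```python
def parse_overrides(override_str: str) -> dict:
    """Split 'a.b=1,c.d=2' into {'a.b': '1', 'c.d': '2'} (values stay strings)."""
    overrides = {}
    for pair in override_str.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, value = pair.split("=", 1)
        overrides[key] = value
    return overrides

parsed = parse_overrides("model.kl-div_weight=1e-4,model.nz_vae=5")
```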
### (2) Fine-tuning
This is an optional step that we use for LIBERO. If you are building on EXTRACT and have downstream task demonstrations, write a config to run this step for your custom task/dataset; otherwise, skip it.
#### LIBERO
Double-check that the experiment config file in the `--path` argument points to the correct pre-trained skill model checkpoint location. By default it should, but if you moved things around or used a `--prefix`, it may need to change. If so, you can manually override it by adding `model.ckpt_path={LOCATION}` to the `--config_override` argument.
To fine-tune the VAE on the LIBERO downstream datasets, run the following command with a different override for each task suite (10 tasks each, 40 tasks total): `spatial`, `goal`, `10`, `object`.

```bash
python extract/train.py --path=extract/configs/skill_prior_finetuning/libero_lang/hierarchical_cluster/extract --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```
where the config overrides corresponding to each task suite are:

```bash
data.finetune_dataset=spatial,env.task_suite=libero_spatial  # LIBERO spatial
data.finetune_dataset=goal,env.task_suite=libero_goal        # LIBERO goal
data.finetune_dataset=10,env.task_suite=libero_10            # LIBERO 10
data.finetune_dataset=object,env.task_suite=libero_object    # LIBERO object
```
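Sketched as a loop, a fine-tuning sweep over all four suites could look like this (a dry run that only builds the command strings; the run-name scheme is our own example, and the suffix-to-suite mapping mirrors the overrides above):

```python
SUITES = ["spatial", "goal", "10", "object"]
BASE = (
    "python extract/train.py "
    "--path=extract/configs/skill_prior_finetuning/libero_lang/hierarchical_cluster/extract"
)
# One command per task suite; each override pairs the fine-tuning dataset
# with its matching libero_* task suite.
commands = [
    f"{BASE} --run_name extract_ft_{s} "
    f"--config_override data.finetune_dataset={s},env.task_suite=libero_{s}"
    for s in SUITES
]
for cmd in commands:
    print(cmd)
```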
You can do the same with SPiRL and BC:

SPiRL:

```bash
python extract/train.py --path=extract/configs/skill_prior_finetuning/libero_lang/hierarchical/ --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```

BC:

```bash
python extract/train.py --path=extract/configs/skill_prior_finetuning/libero_lang/flat/ --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```
### (3) Training RL
Double-check that the experiment config file in the `--path` argument points to the correct pre-trained skill model location. By default it should, but if you moved things around or used a `--prefix`, it may need to change. If so, you can manually override it by adding `model.ckpt_path={LOCATION}` to the `--config_override` argument.

We use the SPiRL codebase's original parallel-env implementation with `mpirun` to run the RL experiments with parallel environments. Make sure an MPI implementation is installed on your system (either through the Linux package manager or through conda), or drop the `mpirun` prefix if you don't want parallel environments.
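A launcher script could check for MPI availability before choosing how to start a run; a generic sketch (our own convenience check, not part of the repo):

```python
import shutil

# Prepend mpirun only when an MPI launcher is actually on the PATH;
# otherwise fall back to a single-process run.
launcher = "mpirun -n 2 " if shutil.which("mpirun") else ""
cmd = launcher + (
    "python extract/rl/train.py "
    "--path=extract/configs/hrl/kitchen/cluster/extract"
)
print(cmd)
```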
#### Kitchen
To train the RL agent with EXTRACT on kitchen:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/hrl/kitchen/cluster/extract --run_name {wandb_run_name}
```

To train the RL agent with SPiRL on kitchen:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/hrl/kitchen/spirl --run_name {wandb_run_name}
```

To train the RL agent with BC on kitchen:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/rl/kitchen/prior_initialized/bc_finetune --run_name {wandb_run_name}
```

To train an SAC policy from scratch on kitchen:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/rl/kitchen/SAC --run_name {wandb_run_name}
```
#### LIBERO
To train the RL agent with EXTRACT on libero:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/hrl/libero_lang/cluster/extract --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```

where the config overrides corresponding to each task suite are:

```bash
env.task_suite=libero_spatial  # LIBERO spatial
env.task_suite=libero_goal     # LIBERO goal
env.task_suite=libero_10       # LIBERO 10
env.task_suite=libero_object   # LIBERO object
```
To train the RL agent with SPiRL on libero:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/hrl/libero_lang/spirl --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```

To train the RL agent with BC on libero:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/rl/libero_lang/prior_initialized/bc_finetune --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```
To train an SAC policy from scratch on libero:

```bash
mpirun -n 2 python extract/rl/train.py --path=extract/configs/rl/libero_lang/SAC --run_name {wandb_run_name} --config_override {CONFIG_OVERRIDE}
```