CUPID: Curating Data your Robot Loves with Influence Functions 💘
Christopher Agia<sup>1</sup>, Rohan Sinha<sup>1</sup>, Jingyun Yang<sup>1</sup>, Rika Antonova<sup>2</sup>, Marco Pavone<sup>1,3</sup>, Haruki Nishimura<sup>4</sup>, Masha Itkina<sup>4</sup>, Jeannette Bohg<sup>1</sup>
<sup>1</sup>Stanford University, <sup>2</sup>University of Cambridge, <sup>3</sup>NVIDIA Research, <sup>4</sup>Toyota Research Institute
<a href='https://cupid-curation.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2506.19121'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
The official code repository for "CUPID: Curating Data your Robot Loves with Influence Functions," accepted to CoRL 2025. For a brief overview of our work, please refer to our project page. Further details can be found in our paper available on arXiv.
<img src="readme/media/cupid-preview.png" alt="CUPID Preview"/>

Overview 🗺️
This repository implements a four-stage, end-to-end data curation pipeline for robot imitation learning policies, built atop the official diffusion policy codebase:
- 🏗️ Train initial policies on uncurated data
- 🤖 Evaluate initial policies and store rollout trajectories
- 🎯 Run data curation with CUPID and analyze curated data quality
- 📈 Re-train policies on curated data and analyze performance
What's included? 🗂️
- The official implementation of CUPID and core influence function routines for diffusion policies.
- Data curation support (both data filtering and data selection) for PushT and RoboMimic environments.
- A suite of data curation methods spanning offline and online (i.e., requiring policy evaluation) approaches.
Quick Start 🏁
This repository is tested on Ubuntu 20.04 with Python 3.9.15. Follow the installation procedure below to get set up.
Installation 🛠️
Conda: Python packages are managed through Conda. We recommend using Miniconda with Mamba for faster and more robust package management (as in diffusion policy), and we provide Mamba setup instructions here.
MuJoCo: Next, please follow these instructions to install the original version of mujoco210 for Linux. If you run into trouble, you can try our provided instructions here.
Virtualenv: Finally, create the Python virtual environment using mamba (note: plain conda may be very slow or hang):
```bash
# Clone repository.
git clone https://github.com/agiachris/cupid.git --recursive
cd cupid

# Create and activate virtualenv.
mamba env create -f conda_environment.yaml
conda activate cupid

# Replace free-mujoco-py with mujoco-py.
pip uninstall free-mujoco-py
pip install mujoco-py==2.1.2.14

# Login to wandb.
wandb login
```
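As a quick sanity check after installation, you can confirm that the core packages resolve from the activated environment. This is a minimal sketch; the package names are assumed from the install steps above.

```python
import importlib.util

def missing_packages(packages):
    """Return the packages that cannot be imported in this environment."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Package names assumed from the installation steps above.
print("missing:", missing_packages(["torch", "mujoco_py", "wandb"]))
```

An empty list means the environment is ready; otherwise, revisit the corresponding install step.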
Datasets 📂
Instructions for downloading the official diffusion policy training datasets can be found here. See an example for downloading Robomimic low-dimensional "state" datasets below:
```bash
mkdir data && cd data
wget https://diffusion-policy.cs.columbia.edu/data/training/robomimic_lowdim.zip
unzip robomimic_lowdim.zip && rm -f robomimic_lowdim.zip && cd ..
```
The training data will be accessible under `data/robomimic/datasets`. The corresponding training configs can be found at `configs/low_dim`.
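To confirm the download succeeded, a small sketch like the following (assuming the directory layout described above) lists the extracted `.hdf5` dataset files:

```python
from pathlib import Path

def find_datasets(root="data/robomimic/datasets", suffix=".hdf5"):
    """List dataset files under the RoboMimic data root (layout assumed)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(f"*{suffix}"))

for path in find_datasets():
    print(path)
```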
Curating Performance-Influencing Demonstrations (CUPID)
Running Experiments: Key Details 🔔
Experiments are launched through shell (.sh) scripts. These scripts make it easy to parallelize experiments on a SLURM-managed cluster. Before launching an experiment, you’ll need to update a few key variables in the script:
- `DEBUG=1` – set to `0` to run the experiment, or `1` to print the Python command without executing it.
- `SLURM_HOSTNAME="<enter_hostname>"` – specify the hostname of your SLURM cluster's submit node.
- `SLURM_SBATCH_FILE="<enter_sbatch_file>"` – specify the path to your SLURM batch submission script.
- Additional variables required for specific experiments are documented in the sections below (see 🔎).
💡 Note: If SLURM is not available, the script will default to running jobs sequentially on the local machine.
As an example, all provided shell scripts are pre-configured to run the RoboMimic Lift (MH) task with the CNN-based diffusion policy, and all experiments are repeated over three random seeds per task. You can use these templates and modify them for other tasks or datasets as needed.
Stage 1: Train Policies on Uncurated Data 🏗️
Run the following to train a policy on a random subset of uncurated data. Training is repeated over three random seeds.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "train" date. Used to name output training directories.
- Option: configure the initial training dataset for demo filtering (Task 1) or demo selection (Task 2) experiments. Please refer to Section 4 of the paper for formal definitions of these two curation settings.
  - Set `train_filter=1` and `train_select=0` to configure the training dataset for demo filtering.
  - Set `train_filter=0` and `train_select=1` to configure the training dataset for demo selection.

💡 Note: All subsequent experiment instructions assume one of the two settings above. To prevent overwriting policy checkpoints, use a different `date` for demo filtering and demo selection experiments.
</details>
```bash
bash scripts/train/train_policies.sh
```
Training checkpoints will be saved to `data/outputs/train`.
Stage 2: Evaluate Policies to Collect Rollouts 🤖
Run the following to evaluate the policy and save rollout trajectories. Evaluation is repeated over three random seeds.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "eval" date. Used to name output evaluation directories.
- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
</details>
```bash
bash scripts/eval/eval_save_episodes.sh
```
Evaluation results will be saved to `data/outputs/eval_save_episodes`.
Stage 3: Curate Data with CUPID 🎯
3.1 – Estimate Action Influences
First, compute the influence of each training state-action pair on all test state-action pairs observed in rollouts.
<details>
<summary>See key variables 🔎</summary>

- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
- Set `eval_date="<enter_policy_eval_date>"` to the "eval" date set in Stage 2.
</details>
```bash
bash scripts/train/train_trak.sh
```
The resulting action influence scores will be saved to the corresponding policy's evaluation directory.
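Conceptually, this step scores how strongly each training state-action pair's gradient aligns with the gradients of test state-action pairs from rollouts. The numpy sketch below is illustrative only, not the repository's TRAK implementation (which works with projected gradients); the function name and damped preconditioner are assumptions.

```python
import numpy as np

def action_influences(train_grads, test_grads, damping=1e-3):
    """Hypothetical gradient-based influence sketch (not the repo's TRAK code).

    train_grads: (N, d) per-train-step policy loss gradients
    test_grads:  (M, d) per-test-step gradients from rollouts
    Returns an (N, M) matrix: influence of train pair i on test pair j.
    """
    d = train_grads.shape[1]
    # Damped Gram matrix in gradient space, a stand-in for the
    # preconditioner that influence-function methods estimate.
    K = train_grads.T @ train_grads + damping * np.eye(d)
    return train_grads @ np.linalg.solve(K, test_grads.T)
```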
3.2 – Compute Performance Influences
Next, compute the performance influence of each training demo by aggregating influences of state-action pairs.
<details>
<summary>See key variables 🔎</summary>

- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
- Set `eval_date="<enter_policy_eval_date>"` to the "eval" date set in Stage 2.
- Set `eval_online_trak_influence=1` to enable demonstration scoring based on influence scores.
</details>
```bash
bash scripts/eval/eval_demonstration_scores.sh
```
The resulting performance influence scores will be saved to the corresponding policy's evaluation directory.
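The aggregation in this step can be sketched as follows: weight each training step's influence on test steps by the corresponding rollout's outcome, then sum the resulting scores over the steps belonging to each demo. This is a hypothetical sketch; the paper's exact estimator may weight and normalize differently, and all names here are illustrative.

```python
import numpy as np

def demo_performance_influence(step_influences, demo_ids, rollout_returns):
    """Hypothetical per-demo aggregation of state-action influences.

    step_influences: (N, M) influence of train step i on test step j
    demo_ids:        length-N demo index for each training step
    rollout_returns: length-M outcome (e.g., success) of each test step's rollout
    Returns a dict: demo id -> scalar performance influence.
    """
    # Score each training step by its outcome-weighted influence on rollouts.
    step_scores = step_influences @ np.asarray(rollout_returns)
    scores = {}
    for demo, s in zip(demo_ids, step_scores):
        scores[demo] = scores.get(demo, 0.0) + float(s)
    return scores
```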
3.3 – Generate Re-training Configs
Before re-training the policy on curated data, we need to generate a config file that rank-orders training demos based on their computed scores in Stage 3.2. The notebook `notebooks/data_curation.ipynb` implements the logic for doing so. Run the cells in Sec 1 of the notebook to get started.
- To visualize data quality trends for demo filtering (resp., selection), run cell `Sec 2.1` (resp., `Sec 2.2`).
- To generate re-training configs for demo filtering (resp., selection), run cell `Sec 3.1` (resp., `Sec 3.2`).
Configs for re-training the policy on curated data will be saved to configs/curation.
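The rank-ordering logic behind these configs can be sketched as below. The function name and interface are hypothetical; the notebook's actual implementation may differ.

```python
def curate_demos(scores, filter_ratio):
    """Hypothetical sketch: rank demos by performance influence and
    drop the lowest-scoring fraction.

    scores: dict mapping demo id -> performance influence (Stage 3.2)
    filter_ratio: fraction of demos to remove (e.g., 0.5)
    Returns the retained demo ids, highest-scoring first.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = len(ranked) - int(len(ranked) * filter_ratio)
    return ranked[:n_keep]

# e.g., keep the top half of four demos
print(curate_demos({0: 0.9, 1: -0.2, 2: 0.4, 3: 0.1}, 0.5))  # → [0, 2]
```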
Stage 4: Re-train Policies on Curated Data 📈
Run the following to re-train the policy on curated data, using the re-training config generated in Stage 3.3.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "retrain" date. Used to name output re-training directories.
- The script is pre-configured to filter 10%-90% of the training data and select 0% of the holdout data. You can adjust `curation_filter_ratios` and
</details>
