CUPID: Curating Data your Robot Loves with Influence Functions 💘
Christopher Agia<sup>1</sup>, Rohan Sinha<sup>1</sup>, Jingyun Yang<sup>1</sup>, Rika Antonova<sup>2</sup>, Marco Pavone<sup>1,3</sup>, Haruki Nishimura<sup>4</sup>, Masha Itkina<sup>4</sup>, Jeannette Bohg<sup>1</sup>
<sup>1</sup>Stanford University, <sup>2</sup>University of Cambridge, <sup>3</sup>NVIDIA Research, <sup>4</sup>Toyota Research Institute
<a href='https://cupid-curation.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2506.19121'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
The official code repository for "CUPID: Curating Data your Robot Loves with Influence Functions," accepted to CoRL 2025. For a brief overview of our work, please refer to our project page. Further details can be found in our paper available on arXiv.
<img src="readme/media/cupid-preview.png" alt="CUPID Preview"/>

Overview 🗺️
This repository implements a four-stage, end-to-end data curation pipeline for robot imitation learning policies, built atop the official diffusion policy codebase:
- 🏗️ Train initial policies on uncurated data
- 🤖 Evaluate initial policies and store rollout trajectories
- 🎯 Run data curation with CUPID and analyze curated data quality
- 📈 Re-train policies on curated data and analyze performance
What's included? 🗂️
- The official implementation of CUPID and core influence function routines for diffusion policies.
- Data curation support (both data filtering and data selection) for PushT and RoboMimic environments.
- A suite of data curation methods spanning offline and online (i.e., requiring policy evaluation) approaches.
Quick Start 🏁
This repository is tested on Ubuntu 20.04 with Python 3.9.15. Follow the installation procedure below to get set up.
Installation 🛠️
Conda: Python packages are managed through Conda. We recommend using Miniconda with Mamba for faster and more robust package management (as in diffusion policy), and we provide Mamba setup instructions here.
MuJoCo: Next, please follow these instructions to install the original version of mujoco210 for Linux. If you run into trouble, you can try our provided instructions here.
Virtualenv: Finally, create the Python virtual environment using mamba (note: plain conda may be very slow or hang):
```bash
# Clone repository.
git clone https://github.com/agiachris/cupid.git --recursive
cd cupid

# Create and activate virtualenv.
mamba env create -f conda_environment.yaml
conda activate cupid

# Replace free-mujoco-py with mujoco-py.
pip uninstall free-mujoco-py
pip install mujoco-py==2.1.2.14

# Login to wandb.
wandb login
```
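As a quick sanity check after installation, you can confirm that the core packages resolve from the activated environment. This is a minimal sketch; the package names are assumed from the install steps above.

```python
import importlib.util

def missing_packages(packages):
    """Return the packages that cannot be imported in this environment."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Package names assumed from the installation steps above.
print("missing:", missing_packages(["torch", "mujoco_py", "wandb"]))
```

An empty list means the environment is ready; otherwise, revisit the corresponding install step.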
Datasets 📂
Instructions for downloading the official diffusion policy training datasets can be found here. See an example for downloading Robomimic low-dimensional "state" datasets below:
```bash
mkdir data && cd data
wget https://diffusion-policy.cs.columbia.edu/data/training/robomimic_lowdim.zip
unzip robomimic_lowdim.zip && rm -f robomimic_lowdim.zip && cd ..
```
The training data will be accessible under `data/robomimic/datasets`. The corresponding training configs can be found at `configs/low_dim`.
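To confirm the download succeeded, a small sketch like the following (assuming the directory layout described above) lists the extracted `.hdf5` dataset files:

```python
from pathlib import Path

def find_datasets(root="data/robomimic/datasets", suffix=".hdf5"):
    """List dataset files under the RoboMimic data root (layout assumed)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(f"*{suffix}"))

for path in find_datasets():
    print(path)
```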
Curating Performance-Influencing Demonstrations (CUPID)
Running Experiments: Key Details 🔔
Experiments are launched through shell (.sh) scripts. These scripts make it easy to parallelize experiments on a SLURM-managed cluster. Before launching an experiment, you’ll need to update a few key variables in the script:
- `DEBUG=1` – set to `0` to run the experiment, or `1` to print the Python command without executing it.
- `SLURM_HOSTNAME="<enter_hostname>"` – specify the hostname of your SLURM cluster's submit node.
- `SLURM_SBATCH_FILE="<enter_sbatch_file>"` – specify the path to your SLURM batch submission script.
- Additional variables required for specific experiments are documented in the sections below (see 🔎).
💡 Note: If SLURM is not available, the script will default to running jobs sequentially on the local machine.
As an example, all provided shell scripts are pre-configured to run the RoboMimic Lift (MH) task with the CNN-based diffusion policy, and all experiments are repeated over three random seeds per task. You can use these templates and modify them for other tasks or datasets as needed.
Stage 1: Train Policies on Uncurated Data 🏗️
Run the following to train a policy on a random subset of uncurated data. Training is repeated over three random seeds.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "train" date. Used to name output training directories.
- Option: configure the initial training dataset for demo filtering (Task 1) or demo selection (Task 2) experiments. Please refer to Section 4 of the paper for formal definitions of these two curation settings.
  - Set `train_filter=1` and `train_select=0` to configure the training dataset for demo filtering.
  - Set `train_filter=0` and `train_select=1` to configure the training dataset for demo selection.

💡 Note: All subsequent experiment instructions assume one of the two settings above. To prevent overwriting policy checkpoints, use a different `date` for demo filtering and demo selection experiments.
</details>
```bash
bash scripts/train/train_policies.sh
```
Training checkpoints will be saved to `data/outputs/train`.
Stage 2: Evaluate Policies to Collect Rollouts 🤖
Run the following to evaluate the policy and save rollout trajectories. Evaluation is repeated over three random seeds.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "eval" date. Used to name output evaluation directories.
- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
</details>
```bash
bash scripts/eval/eval_save_episodes.sh
```
Evaluation results will be saved to `data/outputs/eval_save_episodes`.
Stage 3: Curate Data with CUPID 🎯
3.1 – Estimate Action Influences
First, compute the influence of each training state-action pair on all test state-action pairs observed in rollouts.
<details>
<summary>See key variables 🔎</summary>

- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
- Set `eval_date="<enter_policy_eval_date>"` to the "eval" date set in Stage 2.
</details>
```bash
bash scripts/train/train_trak.sh
```
The resulting action influence scores will be saved to the corresponding policy's evaluation directory.
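Conceptually, this step scores how strongly each training state-action pair's gradient aligns with the gradients of test state-action pairs from rollouts. The numpy sketch below is illustrative only, not the repository's TRAK implementation (which works with projected gradients); the function name and damped preconditioner are assumptions.

```python
import numpy as np

def action_influences(train_grads, test_grads, damping=1e-3):
    """Hypothetical gradient-based influence sketch (not the repo's TRAK code).

    train_grads: (N, d) per-train-step policy loss gradients
    test_grads:  (M, d) per-test-step gradients from rollouts
    Returns an (N, M) matrix: influence of train pair i on test pair j.
    """
    d = train_grads.shape[1]
    # Damped Gram matrix in gradient space, a stand-in for the
    # preconditioner that influence-function methods estimate.
    K = train_grads.T @ train_grads + damping * np.eye(d)
    return train_grads @ np.linalg.solve(K, test_grads.T)
```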
3.2 – Compute Performance Influences
Next, compute the performance influence of each training demo by aggregating influences of state-action pairs.
<details>
<summary>See key variables 🔎</summary>

- Set `train_date="<enter_policy_train_date>"` to the "train" date set in Stage 1.
- Set `eval_date="<enter_policy_eval_date>"` to the "eval" date set in Stage 2.
- Set `eval_online_trak_influence=1` to enable demonstration scoring based on influence scores.
</details>
```bash
bash scripts/eval/eval_demonstration_scores.sh
```
The resulting performance influence scores will be saved to the corresponding policy's evaluation directory.
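The aggregation in this step can be sketched as follows: weight each training step's influence on test steps by the corresponding rollout's outcome, then sum the resulting scores over the steps belonging to each demo. This is a hypothetical sketch; the paper's exact estimator may weight and normalize differently, and all names here are illustrative.

```python
import numpy as np

def demo_performance_influence(step_influences, demo_ids, rollout_returns):
    """Hypothetical per-demo aggregation of state-action influences.

    step_influences: (N, M) influence of train step i on test step j
    demo_ids:        length-N demo index for each training step
    rollout_returns: length-M outcome (e.g., success) of each test step's rollout
    Returns a dict: demo id -> scalar performance influence.
    """
    # Score each training step by its outcome-weighted influence on rollouts.
    step_scores = step_influences @ np.asarray(rollout_returns)
    scores = {}
    for demo, s in zip(demo_ids, step_scores):
        scores[demo] = scores.get(demo, 0.0) + float(s)
    return scores
```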
3.3 – Generate Re-training Configs
Before re-training the policy on curated data, we need to generate a config file that rank-orders training demos based on their computed scores in Stage 3.2. The notebook `notebooks/data_curation.ipynb` implements the logic for doing so. Run the cells in Sec 1 of the notebook to get started.
- To visualize data quality trends for demo filtering (resp., selection), run cell `Sec 2.1` (resp., `Sec 2.2`).
- To generate re-training configs for demo filtering (resp., selection), run cell `Sec 3.1` (resp., `Sec 3.2`).
Configs for re-training the policy on curated data will be saved to configs/curation.
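The rank-ordering logic behind these configs can be sketched as below. The function name and interface are hypothetical; the notebook's actual implementation may differ.

```python
def curate_demos(scores, filter_ratio):
    """Hypothetical sketch: rank demos by performance influence and
    drop the lowest-scoring fraction.

    scores: dict mapping demo id -> performance influence (Stage 3.2)
    filter_ratio: fraction of demos to remove (e.g., 0.5)
    Returns the retained demo ids, highest-scoring first.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = len(ranked) - int(len(ranked) * filter_ratio)
    return ranked[:n_keep]

# e.g., keep the top half of four demos
print(curate_demos({0: 0.9, 1: -0.2, 2: 0.4, 3: 0.1}, 0.5))  # → [0, 2]
```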
Stage 4: Re-train Policies on Curated Data 📈
Run the following to re-train the policy on curated data, using the re-training config generated in Stage 3.3.
<details>
<summary>See key variables 🔎</summary>

- Set `date="<enter_date>"` to the current "retrain" date. Used to name output re-training directories.
- The script is pre-configured to filter 10%-90% of the training data and select 0% of the holdout data. You can adjust `curation_filter_ratios` and
</details>
