Sparsh: Self-supervised touch representations for vision-based tactile sensing

<p align="center"> Carolina Higuera<sup>*</sup>, Akash Sharma<sup>*</sup>, Chaithanya Krishna Bodduluri, Taosha Fan, Patrick Lancaster, Mrinal Kalakrishnan, Michael Kaess, Byron Boots, Mike Lambeta, Tingfan Wu, Mustafa Mukadam </p> <p align="center"> <sup>*</sup>Equal contribution </p> <p align="center"> <a href=https://ai.facebook.com/research/ai-systems>AI at Meta, FAIR</a>; <a href=https://ri.cmu.edu/>The Robotics Institute, CMU</a>; <a href=https://www.washington.edu/>University of Washington</a> </p> <p align="center"> <a href="https://ai.facebook.com/research/publications/sparsh-self-supervised-touch-representations-for-vision-based-tactile-sensing"><img src="http://img.shields.io/badge/Paper-PDF-red.svg"></img></a> <a href="https://arxiv.org/abs/2410.24090"><img src="https://img.shields.io/badge/arXiv-2410.24090-b31b1b.svg"></img></a> <a href="https://sparsh-ssl.github.io"><img src="http://img.shields.io/badge/Project-Page-blue.svg"></img></a> <a href="https://youtu.be/8q2BI5HePq0"><img src="http://img.shields.io/badge/Video-Link-green.svg"></img></a> <a href="https://huggingface.co/collections/facebook/sparsh-67167ce57566196a4526c328"><img src="https://img.shields.io/badge/Models%20and%20datasets-Link-yellow?logo=huggingface"></img></a> <a href="#-citing-sparsh"><img src="http://img.shields.io/badge/Cite-Us-orange.svg"></img></a> </p> <p align="center"> <img src="./assets/teaser.png" alt="drawing" width="700"/> </p>

Sparsh is a family of general touch representations trained via self-supervision algorithms such as MAE, DINO, and JEPA. Sparsh generates useful representations for multiple vision-based tactile sensors: DIGIT, GelSight'17, and GelSight Mini. It outperforms end-to-end models by a large margin on the downstream tasks proposed in TacBench, and enables data-efficient training for new downstream tasks.

This repository contains the PyTorch implementation, pre-trained models, and datasets released with Sparsh.

<p align="center"> <img src="assets/tacbench.gif" alt="animated" /> </p>

🛠️ Installation and setup

Clone this repository:

git clone https://github.com/facebookresearch/sparsh.git
cd sparsh

and create a conda environment with dependencies:

mamba env create -f environment.yml
mamba activate tactile_ssl

🚀 Pretrained models

Pretrained model weights are available for download from our Hugging Face collection: facebook/sparsh

<table style="margin: auto"> <thead> <tr> <th>model</th> <th>small</th> <th>base</th> </tr> </thead> <tbody> <tr> <td>Sparsh (MAE)</td> <td><a href="https://huggingface.co/facebook/sparsh-mae-small">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-mae-base">backbone only</a></td> </tr> <tr> <td>Sparsh (DINO)</td> <td><a href="https://huggingface.co/facebook/sparsh-dino-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-dino-base/">backbone</a></td> </tr> <tr> <td>Sparsh (DINOv2)</td> <td>:x:</td> <td><a href="https://huggingface.co/facebook/sparsh-dinov2-base/">backbone</a></td> </tr> <tr> <td>Sparsh (IJEPA)</td> <td><a href="https://huggingface.co/facebook/sparsh-ijepa-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-ijepa-base/">backbone</a></td> </tr> <tr> <td>Sparsh (VJEPA)</td> <td><a href="https://huggingface.co/facebook/sparsh-vjepa-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-vjepa-base/">backbone</a></td> </tr> </tbody> </table>
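Each entry in the table above is a standalone model repository on the Hugging Face Hub. As a minimal sketch (not the repository's own API), the weights can be fetched with the `huggingface_hub` package; the `facebook/sparsh-<algo>-<size>` repo-id pattern below is taken from the table:

```python
def sparsh_repo_id(algo: str, size: str) -> str:
    """Build the Hub repo id for a Sparsh backbone.

    `algo` is one of: mae, dino, dinov2, ijepa, vjepa; `size` is small or base
    (note from the table that DINOv2 has no small variant).
    """
    return f"facebook/sparsh-{algo}-{size}"


def download_backbone(algo: str, size: str) -> str:
    """Download the full model repo into the local HF cache and return its path.

    Requires the `huggingface_hub` package and network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=sparsh_repo_id(algo, size))
```

For example, `download_backbone("dino", "base")` resolves to the facebook/sparsh-dino-base repository linked in the table.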

📥 Datasets

Pretraining datasets

For pretraining, we curate datasets from multiple sources containing unlabeled data from DIGIT and GelSight sensors.

For DIGIT, the dataset is a mixture of the YCB-Slide dataset and in-house collected data: Touch-Slide. It contains approximately 360k samples of tactile images with a diverse set of no-contact images or backgrounds.

For GelSight, we use open-source datasets available online: Touch and Go and ObjectFolder-Real.


DIGIT

To download the dataset, edit path_dataset in the bash script scripts/download_digitv1_dataset.sh and run it. This will download and extract both the YCB-Slide and Touch-Slide datasets into path_dataset.

The structure of the dataset is:

digitv1/Object-Slide
├── object_0 # eg: 004_sugar_box
│   ├── dataset_0.pkl
│   ├── dataset_1.pkl
│   ├── dataset_2.pkl
│   ├── dataset_3.pkl
│   └── dataset_4.pkl
├── object_1 # eg: bread
│   ...
└── bgs
    ├── bg_0.jpg
    ...
    └── bg_18.jpg

The bgs/ folder contains no-contact (background) images from several different DIGIT sensors, which are required for pre-processing the data. If you are adding new tactile data, please add background images from your sensor to this folder.

To load this dataset, use tactile_ssl/data/vision_tactile.py
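The role of the background images can be pictured as a simple difference image against the sensor's no-contact baseline. The sketch below is purely illustrative; the actual pre-processing lives in tactile_ssl/data, and the rescaling used here is an assumption:

```python
import numpy as np


def subtract_background(tactile: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Difference image between a tactile frame and its sensor's no-contact
    background, rescaled to [0, 1]. Both inputs are HxWx3 uint8 arrays.

    Illustrative sketch only; the repository's own pre-processing may differ.
    """
    diff = tactile.astype(np.float32) - background.astype(np.float32)
    # Shift the raw range [-255, 255] into [0, 1] so the result is a valid
    # image tensor; a no-contact frame maps to a flat 0.5 image.
    return (diff + 255.0) / 510.0
```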

GelSight dataset

We use Touch and Go to pretrain on GelSight'17 (with markers). The dataset consists of short video clips of contact with in-the-wild objects. We use all frames from these clips, including no-contact frames. We do not perform any preprocessing, since the markers contain relevant static shear information.

We also use sequences from the ObjectFolder-Real dataset for pre-training. We preprocess the data by extracting only the tactile images (GelSight Mini), as we do not use the other modalities.

We provide a script to download the preprocessed, pipeline-compatible version of these datasets. To do so, run the bash script scripts/download_gelsight_dataset.sh, which will download and extract the data. Don't forget to edit path_dataset in the script first.

The structure of the dataset is:

gelsight/touch_go
    ├── 20220410_031604.pkl
    ├── 20220410_031843.pkl
    ├── ...
    ├── 20220607_133934.pkl
gelsight/object_folder
    ├── 001.pkl
    ├── 002.pkl
    ...
    ├── 051.pkl

If you would like to download the data directly from the Touch and Go and ObjectFolder-Real sources, run the bash scripts scripts/download_datasets_scratch/download_gelsight_object_folder.sh and scripts/download_datasets_scratch/download_gelsight_touchgo.sh. Then make the data compatible with our pipeline by running the Python scripts scripts/download_datasets_scratch/compress_object_folder.py and scripts/download_datasets_scratch/compress_touch_go.py. Please modify the corresponding paths in all scripts accordingly.

To load this dataset, use tactile_ssl/data/vision_tactile.py
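Before wiring the downloaded .pkl files into the loader, it can help to peek at their top-level structure. A small hedged sketch, assuming the files unpickle into standard Python containers (the exact contents of each file are not documented here):

```python
import pickle


def summarize_pickle(path: str):
    """Return a {key: type-name} summary of a pickled dict, or the type name
    of the top-level object otherwise. Only load pickles from trusted sources.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if isinstance(obj, dict):
        return {k: type(v).__name__ for k, v in obj.items()}
    return type(obj).__name__
```

For instance, `summarize_pickle("gelsight/touch_go/20220410_031604.pkl")` would show which arrays and metadata one sequence file carries.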

Downstream task datasets

We open-source the data that we collected in-house for the force estimation, slip detection, and pose estimation downstream tasks. The datasets can be downloaded from the Sparsh collection on Hugging Face.

Please place these datasets in a single directory designated for hosting all downstream task datasets.

T1 Force estimation and T2 slip detection

This dataset contains paired tactile and force data, intended for use in predicting 3-axis normal and shear forces applied to the sensor's elastomer. We used three different indenter shapes to collect force-labeled data: hemisphere, sharp, and flat. To measure force ground truth, we employed the ATI nano17 force/torque sensor. The protocol consisted of applying a random normal load followed by a shear load, achieved by sliding the probe 2 mm across the sensor's elastomer.

The dataset consists of a collection of normal/shear load trajectories for each probe. The structure is as follows (example for the DIGIT dataset):

T1_force/digit/sphere
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   └── dataset_slip_forces.pkl
├── batch_2
│   ├── ...
T1_force/digit/flat
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   ├── dataset_slip_forces.pkl
│   ...
T1_force/digit/sharp
├── ...

For each batch:

  • dataset_digit_xy.pkl: contains only the binarized tactile images.
  • dataset_slip_forces.pkl: a dictionary where each key represents a sliding trajectory; each trajectory carries the corresponding force and slip labels.

To load this dataset (DIGIT and GelSight Mini), use tactile_ssl/data/vision_based_forces_slip_probes.py
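Given the per-batch layout above, the trajectory dictionary of one batch can be read with plain pickle. A minimal sketch, assuming the file unpickles into a standard dict (the label field names inside each trajectory are hypothetical; the supported loader remains tactile_ssl/data/vision_based_forces_slip_probes.py):

```python
import pickle
from pathlib import Path


def load_trajectories(batch_dir: str) -> dict:
    """Load the per-batch trajectory dictionary, e.g. from
    T1_force/digit/sphere/batch_1/dataset_slip_forces.pkl.

    Returns {trajectory_id: labels}, where each entry carries the force and
    slip labels for one sliding trajectory.
    """
    with open(Path(batch_dir) / "dataset_slip_forces.pkl", "rb") as f:
        return pickle.load(f)
```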

T3 Pose estimation

This dataset contains time-synchronized pairs of DIGIT images and SE(3) object poses. In our setup, the robot hand is stationary with its palm facing downwards and pressing against the object on a table. The robot hand has DIGIT sensors mounted on the index, middle, and ring fingertips, all of which are in contact with the object. A human manually perturbs the object's pose by translating and rotating it in SE(2). We use tag tracking to obtain the object's pose. We collect data using two objects: a Pringles can and the YCB sugar box, both of which have a tag fixed to their top surfaces.

The dataset is a collection of such perturbation sequences. Each sequence corresponds to a pickle file containing the following labeled data:

  • DIGIT tactile
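Since the pose labels are SE(3) transforms, downstream code typically works with relative poses between consecutive frames rather than absolute tag poses. A hedged sketch of that computation with 4x4 homogeneous matrices (illustrative only, not the repository's own API):

```python
import numpy as np


def relative_pose(T_a: np.ndarray, T_b: np.ndarray) -> np.ndarray:
    """Relative SE(3) transform taking pose A to pose B: T_rel = T_a^{-1} @ T_b.

    Both inputs are 4x4 homogeneous matrices [R | t; 0 0 0 1]. The inverse is
    formed analytically from the rotation/translation blocks.
    """
    R, t = T_a[:3, :3], T_a[:3, 3]
    T_a_inv = np.eye(4)
    T_a_inv[:3, :3] = R.T
    T_a_inv[:3, 3] = -R.T @ t
    return T_a_inv @ T_b
```

With per-frame object poses from the tag tracker, `relative_pose(T_prev, T_curr)` gives the SE(3) perturbation between two tactile frames.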
