Sparsh: Self-supervised touch representations for vision-based tactile sensing
<p align="center"> Carolina Higuera<sup>*</sup>, Akash Sharma<sup>*</sup>, Chaithanya Krishna Bodduluri, Taosha Fan, Patrick Lancaster, Mrinal Kalakrishnan, Michael Kaess, Byron Boots, Mike Lambeta, Tingfan Wu, Mustafa Mukadam </p> <p align="center"> <sup>*</sup>Equal contribution </p> <p align="center"> <a href=https://ai.facebook.com/research/ai-systems>AI at Meta, FAIR</a>; <a href=https://ri.cmu.edu/>The Robotics Institute, CMU</a>; <a href=https://www.washington.edu/>University of Washington</a> </p> <p align="center"> <a href="https://ai.facebook.com/research/publications/sparsh-self-supervised-touch-representations-for-vision-based-tactile-sensing"><img src="http://img.shields.io/badge/Paper-PDF-red.svg"></img></a> <a href="https://arxiv.org/abs/2410.24090"><img src="https://img.shields.io/badge/arXiv-2410.24090-b31b1b.svg"></img></a> <a href="https://sparsh-ssl.github.io"><img src="http://img.shields.io/badge/Project-Page-blue.svg"></img></a> <a href="https://youtu.be/8q2BI5HePq0"><img src="http://img.shields.io/badge/Video-Link-green.svg"></img></a> <a href="https://huggingface.co/collections/facebook/sparsh-67167ce57566196a4526c328"><img src="https://img.shields.io/badge/Models%20and%20datasets-Link-yellow?logo=huggingface"></img></a> <a href="#-citing-sparsh"><img src="http://img.shields.io/badge/Cite-Us-orange.svg"></img></a> </p> <p align="center"> <img src="./assets/teaser.png" alt="drawing" width="700"/> </p>

Sparsh is a family of general touch representations trained via self-supervision algorithms such as MAE, DINO, and JEPA. Sparsh generates useful representations for the DIGIT, GelSight'17, and GelSight Mini sensors. It outperforms end-to-end models on the downstream tasks proposed in TacBench by a large margin, and enables data-efficient training for new downstream tasks. This repository contains the PyTorch implementation, pre-trained models, and datasets released with Sparsh.
<p align="center"> <img src="assets/tacbench.gif" alt="animated" /> </p>

🛠️ Installation and setup
Clone this repository:

```bash
git clone https://github.com/facebookresearch/sparsh.git
cd sparsh
```

and create a conda environment with the dependencies:

```bash
mamba env create -f environment.yml
mamba activate tactile_ssl
```
🚀 Pretrained models
Pretrained model weights are available for download from our Hugging Face: facebook/sparsh
<table style="margin: auto"> <thead> <tr> <th>model</th> <th>small</th> <th>base</th> </tr> </thead> <tbody> <tr> <td>Sparsh (MAE)</td> <td><a href="https://huggingface.co/facebook/sparsh-mae-small">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-mae-base">backbone only</a></td> </tr> <tr> <td>Sparsh (DINO)</td> <td><a href="https://huggingface.co/facebook/sparsh-dino-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-dino-base/">backbone</a></td> </tr> <tr> <td>Sparsh (DINOv2)</td> <td>:x:</td> <td><a href="https://huggingface.co/facebook/sparsh-dinov2-base/">backbone</a></td> </tr> <tr> <td>Sparsh (IJEPA)</td> <td><a href="https://huggingface.co/facebook/sparsh-ijepa-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-ijepa-base/">backbone</a></td> </tr> <tr> <td>Sparsh (VJEPA)</td> <td><a href="https://huggingface.co/facebook/sparsh-vjepa-small/">backbone</a></td> <td><a href="https://huggingface.co/facebook/sparsh-vjepa-base/">backbone</a></td> </tr> </tbody> </table>

📥 Datasets
Pretraining datasets
For pretraining, we curate datasets from multiple sources containing unlabeled data from DIGIT and GelSight sensors.
For DIGIT, the dataset is a mixture of the YCB-Slide dataset and in-house collected data: Touch-Slide. It contains approximately 360k tactile image samples, along with a diverse set of no-contact (background) images.
For GelSight, we use open-source datasets available online: Touch and Go and ObjectFolder-Real.
DIGIT
To download the dataset, edit path_dataset in the bash script scripts/download_digitv1_dataset.sh and run it. This will download and extract both the YCB-Slide and Touch-Slide datasets into path_dataset.
The structure of the dataset is:
```
digitv1/Object-Slide
├── object_0            # e.g. 004_sugar_box
│   ├── dataset_0.pkl
│   ├── dataset_1.pkl
│   ├── dataset_2.pkl
│   ├── dataset_3.pkl
│   └── dataset_4.pkl
├── object_1            # e.g. bread
│   └── ...
└── bgs
    ├── bg_0.jpg
    ├── ...
    └── bg_18.jpg
```
The bgs/ folder contains no-contact (background) images from several different DIGIT sensors, which are required for pre-processing the data. If you're adding new tactile data, please add background images from your sensor to this folder.
To load this dataset, use tactile_ssl/data/vision_tactile.py
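The background images are used to isolate contact signal from a sensor-specific baseline. As a rough, hypothetical illustration of this idea (the real preprocessing lives in tactile_ssl/data/vision_tactile.py; the shapes and the gray-recentering constant below are assumptions, not the repo's actual scheme):

```python
import numpy as np

def subtract_background(frame: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Subtract a no-contact background image (as in bgs/) from a tactile
    frame, recentering around mid-gray so negative differences survive.
    Illustrative only; not the repo's actual preprocessing."""
    diff = frame.astype(np.int16) - background.astype(np.int16)
    return np.clip(diff + 128, 0, 255).astype(np.uint8)

# Stand-in data: a flat background and a frame with a small contact patch.
background = np.full((240, 320, 3), 100, dtype=np.uint8)
frame = background.copy()
frame[100:120, 150:170] += 40  # fake contact region

out = subtract_background(frame, background)
print(out[110, 160, 0], out[0, 0, 0])  # contact pixel vs. no-contact pixel
```

A no-contact pixel lands at the neutral mid-gray value, while contact regions deviate from it, which is why per-sensor backgrounds matter when adding new data.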
GelSight dataset
We use Touch and Go to pretrain on GelSight'17 (with markers). The dataset consists of short video clips of contact with in-the-wild objects. We use all frames from these clips, including no-contact frames. We do not perform any preprocessing, since the markers contain relevant static shear information.
We also use sequences from the ObjectFolder-Real dataset for pre-training. We preprocess the data by extracting only the tactile images (GelSight Mini), as we do not use the other modalities.
We provide a script that downloads preprocessed versions of these datasets compatible with our pipeline. Edit path_dataset in the bash script scripts/download_gelsight_dataset.sh, then run it to download and extract the data.
The structure of the dataset is:
```
gelsight/touch_go
├── 20220410_031604.pkl
├── 20220410_031843.pkl
├── ...
└── 20220607_133934.pkl

gelsight/object_folder
├── 001.pkl
├── 002.pkl
├── ...
└── 051.pkl
```
If you would rather download the data directly from Touch and Go and ObjectFolder-Real, run the bash scripts scripts/download_datasets_scratch/download_gelsight_object_folder.sh and scripts/download_datasets_scratch/download_gelsight_touchgo.sh. Then, process the data to make it compatible with our pipeline by running the Python scripts scripts/download_datasets_scratch/compress_object_folder.py and scripts/download_datasets_scratch/compress_touch_go.py. Please modify the corresponding paths in all scripts accordingly.
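The compress_* scripts bundle each sequence of tactile frames into a single .pkl file. A minimal sketch of that pack/load round trip, with illustrative keys that may not match the released schema:

```python
import io
import pickle
import numpy as np

# Hypothetical packing step: bundle a sequence of tactile frames into one
# pickle object, mirroring what the compress_* scripts produce.
# The key names ("frames", "sensor") are assumptions for illustration.
frames = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(3)]
record = {"frames": frames, "sensor": "gelsight_mini"}

buf = io.BytesIO()           # stand-in for a .pkl file on disk
pickle.dump(record, buf)

buf.seek(0)
loaded = pickle.load(buf)
print(len(loaded["frames"]), loaded["sensor"])
```

Packing whole sequences into one pickle keeps per-sequence I/O to a single file read, which is convenient for dataloaders that sample clips rather than individual images.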
To load this dataset, use tactile_ssl/data/vision_tactile.py
Downstream task datasets
We open-source the data that we collected in-house for the force estimation, slip detection, and pose estimation downstream tasks. The datasets can be downloaded from the Sparsh collection on Hugging Face:
- Force estimation and Slip detection: DIGIT, GelSight Mini
- Pose estimation: DIGIT
Please place these datasets in a single directory designated for hosting all downstream task datasets.
T1 Force estimation and T2 slip detection
This dataset contains paired tactile and force data, intended for predicting the 3-axis normal and shear forces applied to the sensor's elastomer. We used three indenter shapes to collect force-labeled data: hemisphere, sharp, and flat. Ground-truth forces were measured with an ATI Nano17 force/torque sensor. The protocol consisted of applying a random normal load followed by a shear load, achieved by sliding the probe 2 mm across the sensor's elastomer.
The dataset consists of a collection of normal/shear load trajectories for each probe. The structure is as follows (example for the DIGIT dataset):
```
T1_force/digit/sphere
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   └── dataset_slip_forces.pkl
├── batch_2
│   └── ...
T1_force/digit/flat
├── batch_1
│   ├── dataset_digit_00.pkl
│   ├── ...
│   ├── dataset_digit_03.pkl
│   └── dataset_slip_forces.pkl
└── ...
T1_force/digit/sharp
└── ...
```
For each batch:
- dataset_digit_xy.pkl: contains the binarized tactile images only.
- dataset_slip_forces.pkl: a dictionary where each key represents a sliding trajectory; each trajectory has the corresponding force and slip labels.
To load this dataset (DIGIT and GelSight Mini), use tactile_ssl/data/vision_based_forces_slip_probes.py
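As a hypothetical sketch of how the per-trajectory dictionary in dataset_slip_forces.pkl might be consumed (the key names and array shapes below are assumptions for illustration; tactile_ssl/data/vision_based_forces_slip_probes.py is the authoritative loader):

```python
import numpy as np

# Illustrative stand-in for the contents of dataset_slip_forces.pkl:
# a dict keyed by trajectory id, each entry holding per-frame 3-axis
# forces and binary slip labels. Key names are assumptions.
trajectories = {
    "traj_000": {
        "forces": np.array([[0.00, 0.10, -1.2],
                            [0.05, 0.20, -1.5]]),  # (N, 3): x, y shear + z normal
        "slip": np.array([0, 1]),                  # per-frame slip flag
    }
}

for name, traj in trajectories.items():
    normal = traj["forces"][:, 2]   # z-axis normal load
    shear = traj["forces"][:, :2]   # x/y shear components
    print(name, normal.min(), int(traj["slip"].sum()))
```

Splitting the force vector into normal and shear components like this matches how the T1/T2 tasks are posed: force regression over all three axes, and slip classification from the per-frame labels.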
T3 Pose estimation
This dataset contains time-synchronized pairs of DIGIT images and SE(3) object poses. In our setup, the robot hand is stationary with its palm facing downwards and pressing against the object on a table. The robot hand has DIGIT sensors mounted on the index, middle, and ring fingertips, all of which are in contact with the object. A human manually perturbs the object's pose by translating and rotating it in SE(2). We use tag tracking to obtain the object's pose. We collect data using two objects: a Pringles can and the YCB sugar box, both of which have a tag fixed to their top surfaces.
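Since the perturbations are planar while the logged poses are full SE(3) transforms, it may help to see how an SE(2) pose (x, y, yaw) embeds into a homogeneous 4x4 SE(3) matrix. A minimal sketch, not code from this repo:

```python
import numpy as np

def se2_to_se3(x: float, y: float, yaw: float) -> np.ndarray:
    """Embed a planar pose (translation x, y and rotation yaw about the
    table-normal z-axis) into a homogeneous 4x4 SE(3) transform."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]  # planar rotation block
    T[0, 3], T[1, 3] = x, y        # planar translation; z stays 0
    return T

T = se2_to_se3(0.05, -0.02, np.pi / 2)
print(np.round(T[:3, 3], 3))  # translation stays in the table plane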
The dataset is a collection of such sequences, each stored as a pickle file containing the following labeled data:
- DIGIT tactile