<a name="readme-top"></a>
<div align="center"> <a href="https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml" rel="nofollow"> <img src="https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml/badge.svg" alt="Tests" /> </a> <a href="https://github.com/ViCCo-Group/thingsvision/actions/workflows/coverage.yml" rel="nofollow"> <img src="https://codecov.io/gh/ViCCo-Group/thingsvision/branch/master/graph/badge.svg" alt="Code Coverage" /> </a> <a href="https://gist.github.com/cheerfulstoic/d107229326a01ff0f333a1d3476e068d" rel="nofollow"> <img src="https://img.shields.io/badge/maintenance-yes-brightgreen.svg" alt="Maintenance" /> </a> <a href="https://pypi.org/project/thingsvision/" rel="nofollow"> <img src="https://img.shields.io/pypi/v/thingsvision" alt="PyPI" /> </a> <a href="https://pepy.tech/project/thingsvision"> <img src="https://img.shields.io/pypi/dm/thingsvision" alt="downloads"> </a> <a href="https://www.python.org/" rel="nofollow"> <img src="https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue.svg" alt="Python version" /> </a> <a href="https://github.com/ViCCo-Group/thingsvision/blob/master/LICENSE" rel="nofollow"> <img src="https://img.shields.io/pypi/l/thingsvision" alt="License" /> </a> <a href="https://github.com/psf/black" rel="nofollow"> <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" /> </a> <a href="https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/pytorch.ipynb" rel="nofollow"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /> </a> </div>
<br />
## :star2: About the Project
thingsvision is a Python package for extracting (image) representations from many state-of-the-art computer vision models. Essentially, you provide thingsvision with a directory of images and specify the neural network you're interested in. Subsequently, thingsvision returns the representation of the selected neural network for each image, resulting in one feature map (vector or matrix, depending on the layer) per image. These features, used interchangeably with image representations, can then be used for further analyses.
:rotating_light: NOTE: some function calls mentioned in the original paper have been deprecated. To use this package successfully, exclusively follow this README and the documentation! :rotating_light:
### :mechanical_arm: Functionality
With thingsvision, you can:
- extract features for any image set from many popular networks.
- extract features for any image set from your custom networks.
- extract features for >26,000 images from the THINGS image database.
- align the extracted features with human object perception (e.g., using gLocal).
- extract features directly from HDF5 datasets (e.g., NSD stimuli).
- conduct basic Representational Similarity Analysis (RSA) after feature extraction.
- perform efficient Centered Kernel Alignment (CKA) to compare image features across model-module combinations.
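To make the last two points concrete, here is a minimal NumPy sketch of what RSA and linear CKA compute. Note that thingsvision ships its own helpers for these analyses (see the docs); the functions below are plain-NumPy illustrations, not the package's API.

```python
import numpy as np

def rdm(features: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the feature vectors of all image pairs (n_images x n_images)."""
    return 1.0 - np.corrcoef(features)

def rsa(features_a: np.ndarray, features_b: np.ndarray) -> float:
    """RSA sketch: correlate the upper triangles of the two RDMs."""
    iu = np.triu_indices(features_a.shape[0], k=1)
    return float(np.corrcoef(rdm(features_a)[iu], rdm(features_b)[iu])[0, 1])

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices (n_images x n_units each)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(hsic / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")))
```

Both measures compare representational geometry rather than raw activations, so the two feature matrices may come from different models or layers with different numbers of units.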
### :file_cabinet: Model collection
Neural networks come from different sources. With thingsvision, you can extract image representations of all models from:
- torchvision
- Keras
- timm
- ssl (self-supervised learning models): `simclr-rn50`, `mocov2-rn50`, `barlowtwins-rn50`, `pirl-rn50`, `jigsaw-rn50`, `rotnet-rn50`, `swav-rn50`, `vicreg-rn50`, `dino-rn50`, `dino-xcit-{small/medium}-{12/24}-p{8/16}`, `dino-vit-{tiny/small/base}-p{8/16}`, `dinov2-vit-{small/base/large/giant}-p14`, `mae-vit-{base/large}-p16`, `mae-vit-huge-p14`
- OpenCLIP models (CLIP trained on LAION-{400M/2B/5B})
- CLIP models (CLIP trained on WiT)
- a few custom models (Alexnet, VGG-16, Resnet50, and Inception_v3) trained on Ecoset<br>
- CORnet models (recurrent vision models)
- Harmonization models (see the Harmonization repo). The default variant is `ViT_B16`; other available models are `ResNet50`, `VGG16`, `EfficientNetB0`, `tiny_ConvNeXT`, `tiny_MaxViT`, and `LeViT_small`.
- DreamSim models (see the DreamSim repo). The default variant is `open_clip_vitb32`; other available models are `clip_vitb32`, `dino_vitb16`, and an `ensemble`. See the docs for more information.
- FAIR's Segment Anything (SAM) model
- Kakaobrain's ALIGN implementation
## :running: Getting Started
### :computer: Setting up your environment
#### Working locally
First, create a new environment with Python 3.10, 3.11, or 3.12, e.g., using conda:
```bash
$ conda create -n thingsvision python=3.10
$ conda activate thingsvision
```
Then, install thingsvision by running the following pip commands in your terminal:
```bash
$ pip install --upgrade thingsvision
$ pip install git+https://github.com/openai/CLIP.git
```
If you want to extract features for harmonized models from the Harmonization repo, you additionally have to run the following pip commands in your thingsvision environment:
```bash
$ pip install "keras-cv-attention-models>=1.3.5" "vit-keras==0.1.2"
$ pip install git+https://github.com/serre-lab/Harmonization.git
```
If you want to extract features for DreamSim models from the DreamSim repo, you additionally have to run the following pip command in your thingsvision environment:
```bash
$ pip install dreamsim==0.1.3
```
See the docs for which DreamSim models are available in thingsvision.
#### Google Colab
Alternatively, you can use Google Colab to play around with thingsvision by uploading your image data to Google Drive (via directory mounting).
You can find the Jupyter notebook using PyTorch here and the TensorFlow example here.
### :mag: Basic usage
#### Command Line Interface (CLI)
thingsvision was designed to simplify feature extraction. If you have a folder of images (e.g., `./images`) and want to extract features for each of them without opening a Jupyter notebook or writing a Python script, the easiest route is our CLI. It provides two commands:
- `thingsvision show-model`
- `thingsvision extract-features`
Example calls might look as follows:
```bash
thingsvision show-model --model-name "alexnet" --source "torchvision"
thingsvision extract-features --image-root "./data" --model-name "alexnet" --module-name "features.10" --batch-size 32 --device "cuda" --source "torchvision" --file-format "npy" --out-path "./features"
```
See `thingsvision show-model -h` and `thingsvision extract-features -h` for a list of all possible arguments. Note that the CLI covers only the basic extraction functionality, but that is probably enough for most users who don't want to dive too deep into the various models and modules. If you need more fine-grained control over the extraction itself, we recommend using the Python package directly and writing your own script.
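With `--file-format "npy"`, the extracted features are ordinary NumPy arrays, so downstream analyses can load them directly. A small round-trip sketch (the array contents and the filename here are placeholders; check your `--out-path` for the file the CLI actually wrote):

```python
import numpy as np

# Simulate a saved feature matrix of shape (n_images, n_units);
# "features.npy" is a hypothetical filename used for illustration.
features = np.random.rand(32, 4096).astype(np.float32)
np.save("features.npy", features)

loaded = np.load("features.npy")
assert loaded.shape == (32, 4096)
```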
#### Python commands
To do this, start by importing all the necessary components and instantiating a thingsvision extractor. Here we're using CLIP from the official clip repo as the
