AstroCLIP
Multimodal contrastive pretraining for astronomical data
<a href="https://arxiv.org/abs/2310.03024" style='vertical-align:middle; display:inline;'><img src="https://img.shields.io/badge/astro--ph.IM-arXiv%3A2310.03024-B31B1B.svg" class="plain" style="height:25px;" /></a>
Official PyTorch implementation and pre-trained models for the paper AstroCLIP: A Cross-Modal Foundation Model for Galaxies.

AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.
Tutorial
Check out a Colab-native tutorial on AstroCLIP here: https://colab.research.google.com/github/EiffL/Tutorials/blob/master/FoundationModels/AstroCLIPTutorial_solutions.ipynb
Web App
Check out our interactive similarity search app, enabling both in-modal and cross-modal search for galaxies: https://astroclip.streamlit.app/
Installation
The training and evaluation code requires PyTorch 2.0. Additionally, an up-to-date eventlet is required for wandb. Note that the code has only been tested with the specified versions and expects a Linux environment. To install the AstroCLIP package and its dependencies, follow the steps below. The install procedure is unfortunately a bit involved, but executing these lines in a clean Python environment should work.
```bash
# Setting up proper torch version
pip install --upgrade pip
pip install lightning[extra]==2.3.3 boto3==1.28.17
pip install --upgrade pycairo datasets pyarrow
pip install --extra-index-url https://pypi.nvidia.com cuml-cu11
pip install --extra-index-url https://download.pytorch.org/whl/cu117 torch==2.0.0+cu117
pip install torchvision==0.15.0 torchmetrics==0.10.3 dotenv numpy==1.26.4

# Installing DiNOv2
pip install omegaconf fvcore iopath
pip install --no-deps git+https://github.com/facebookresearch/dinov2.git@2302b6bf46953431b969155307b9bed152754069

# Installing AstroCLIP
pip install astropy datasets huggingface_hub jaxtyping wandb
pip install --no-deps git+https://github.com/PolymathicAI/AstroCLIP.git
```
NOTE Once installed, the package provides three command-line shortcuts: `astroclip_trainer` and `spectrum_trainer`, which link to `astroclip/trainer.py`, and `image_trainer`, which links to `astroclip/astrodino/trainer.py`. The shortcuts are defined in the `project.scripts` section of the `pyproject.toml` file.
Handling roots
The package expects to load models and data by default from
{ASTROCLIP_ROOT}
You can configure `ASTROCLIP_ROOT` as well as the Weights & Biases group in which runs are saved by creating a `.env` file in the root of `astroclip` with the following content:

```bash
ASTROCLIP_ROOT="/mnt/ceph/users/polymathic/astroclip"
WANDB_ENTITY_NAME="flatiron-scipt"
```
If no environment is specified, the default path at Flatiron will be assumed.
Pretrained Models
We provide the pretrained AstroCLIP model on the Huggingface model hub for easy access. Additionally, we provide the pretrained single-modal models for galaxy images and spectra as well. Model details, checkpoints, configs and logs are below.
<table> <tr> <th>Model Name</th> <th>Pretraining</th> <th># Params.</th> <th colspan="3">Download</th> </tr> <tr> <td>AstroCLIP</td> <td>CLIP</td> <td>370M</td> <td><a href="https://huggingface.co/polymathic-ai/astroclip">ckpt</a></td> <td><a href="https://github.com/PolymathicAI/AstroCLIP/blob/main/configs/astroclip.yaml">config</a></td> <td><a href="https://example.com/link3">logs</a></td> </tr> <tr> <td>Image Encoder</td> <td>DINOv2</td> <td>302M</td> <td><a href="https://huggingface.co/polymathic-ai/astrodino">ckpt</a></td> <td><a href="https://github.com/PolymathicAI/AstroCLIP/blob/main/astroclip/astrodino/config.yaml">config</a></td> <td><a href="https://example.com/link3">logs</a></td> </tr> <tr> <td>Spectrum Encoder</td> <td>Masked Modeling</td> <td>43M</td> <td><a href="https://huggingface.co/polymathic-ai/specformer">ckpt</a></td> <td><a href="https://github.com/PolymathicAI/AstroCLIP/blob/main/configs/specformer.yaml">config</a></td> <td><a href="https://example.com/link3">logs</a></td> </tr> </table>Loading the Pretrained Models
The pretrained AstroCLIP model can be loaded using the following:

```python
from astroclip.models import AstroClipModel

model = AstroClipModel.load_from_checkpoint(
    checkpoint_path="path_to_model.ckpt",
)
```
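Once images and spectra have been embedded, downstream tasks like similarity search reduce to simple operations on the embedding vectors. The sketch below uses random stand-in arrays (in practice they would come from the model above) to show a cross-modal cosine-similarity lookup:

```python
import numpy as np

# Stand-in embeddings for illustration; real ones would come from AstroCLIP
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(100, 512)).astype(np.float32)
spectrum_emb = rng.normal(size=(100, 512)).astype(np.float32)

# L2-normalize so dot products are cosine similarities
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
spectrum_emb /= np.linalg.norm(spectrum_emb, axis=1, keepdims=True)

# For one query image, rank all spectra by similarity (cross-modal search)
query = image_emb[0]
sims = spectrum_emb @ query
top5 = np.argsort(sims)[::-1][:5]
```

In-modal search works identically, using the image (or spectrum) embeddings on both sides.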
High-Level Performance Overview
Below, we include a high-level performance overview of our models on a variety of downstream tasks. This is non-exhaustive, and we refer the reader to the paper for the full details.
<table> <tr> <th>Source</th> <th>Model</th> <th>Type</th> <th>Redshift</th> <th>Properties</th> <th>Morphology</th> </tr> <tr> <td>Image</td> <td>AstroCLIP*</td> <td>Zero-Shot</td> <td>0.79</td> <td>0.47</td> <td>0.76</td> </tr> <tr> <td></td> <td>Image Encoder*</td> <td>Zero-Shot</td> <td>0.63</td> <td>0.37</td> <td>0.78</td> </tr> <tr> <td></td> <td>Stein, et al.</td> <td>Zero-Shot</td> <td>0.36</td> <td>0.26</td> <td>0.76</td> </tr> <tr> <td></td> <td>ResNet18</td> <td>Supervised</td> <td>0.77</td> <td>0.43</td> <td>-</td> </tr> <tr> <td></td> <td>ZooBot<sup>1</sup></td> <td>Supervised</td> <td>-</td> <td>-</td> <td>0.88</td> </tr> <tr> <td>Spectrum</td> <td>AstroCLIP*</td> <td>Zero-Shot</td> <td>0.99</td> <td>0.63</td> <td>-</td> </tr> <tr> <td></td> <td>Spectrum Encoder*</td> <td>Zero-Shot</td> <td>0.99</td> <td>0.64</td> <td>-</td> </tr> <tr> <td></td> <td>Conv+Att<sup>2</sup></td> <td>Supervised</td> <td>0.99</td> <td>0.60</td> <td>-</td> </tr> <tr> <td>Photometry</td> <td>MLP</td> <td>Supervised</td> <td>0.68</td> <td>0.42</td> <td>-</td> </tr> <tr> </table>We report R-squared metrics on redshift and galaxy property estimation (averaged across all properties) and accuracy on galaxy morphology classification (averaged across all labels). Our models are marked with an asterisk (*). [1] We use the results reported from Walmsley, et al. (2021). [2] We use the encoder from Melchior, et al. (2022).
Data Access
The AstroCLIP model is trained on the cross-matched sample containing optical spectra from the Dark Energy Spectroscopic Instrument (DESI) Early Data Release (EDR) and multi-band images (g,r,z) from the DESI Legacy Survey prepared by Stein, et al. (2022). We provide the dataset as a HuggingFace dataset, which can be accessed directly using
```python
from datasets import load_dataset

# This downloads about 60 GB of data
dset = load_dataset('astroclip/data/dataset.py')
```
For reproducibility, we include the scripts and a brief description of how to generate the cross-matched dataset in astroclip/data/crossmatch.
Image Pretraining Dataset

While the AstroCLIP and Spectrum Encoder models are trained on the image-spectrum dataset, we pretrain the galaxy image model separately on the full Stein et al. (2022) image dataset, which consists of 76M galaxy images. This dataset can be accessed using this Globus endpoint:
https://app.globus.org/file-manager?origin_id=9fb0fc0e-e760-11ec-9bd2-2d2219dcc1fa&origin_path=%2F
The directory is organized into south and north surveys, where each survey is split into chunks of 1,000,000 galaxies (sorted by decreasing z-band flux) and saved in hdf5 format. For more details, see here.
Processing Legacy Survey Images Directly
If you want to run AstroCLIP on DESI Legacy Survey cutouts that aren't part of the premade dataset, you only need two preprocessing steps on the g, r, z bands:
- Center-crop each cutout to 144×144 pixels.
- Convert the cropped 3-band tensor to display-ready RGB using the `decals_to_rgb` transform below (applied to the g, r, z channels).
The function is defined below:
```python
import numpy as np
import torch

RGB_SCALES = {
    "u": (2, 1.5),
    "g": (2, 6.0),
    "r": (1, 3.4),
    "i": (0, 1.0),
    "z": (0, 2.2),
}


def decals_to_rgb(image, bands=["g", "r", "z"], scales=None, m=0.03, Q=20.0):
    # Look up the (axis, scale) pair for each requested band
    axes, scales = zip(*[RGB_SCALES[bands[i]] for i in range(len(bands))])
    scales = [scales[i] for i in axes]
    # Move channels last and reverse the band order (g,r,z -> z,r,g)
    image = image.movedim(1, -1).flip(-1)
    scales = torch.tensor(scales, dtype=torch.float32).to(image.device)
    # Mean clipped intensity across bands, with arcsinh stretch
    I = torch.sum(torch.clamp(image * scales + m, min=0), dim=-1) / len(bands)
    fI = torch.arcsinh(Q * I) / np.sqrt(Q)
    I += (I == 0.0) * 1e-6  # avoid division by zero
    image = (image * scales + m) * (fI / I).unsqueeze(-1)
    image = torch.clamp(image, 0, 1)
    # Move channels back to dimension 1
    return image.movedim(-1, 1)
```
which expects just the g,r,z bands from the Legacy Survey image.
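Putting the two steps together, a minimal sketch of the pipeline (the `center_crop` helper and the random input are illustrative, not part of the package):

```python
import torch


def center_crop(image, size=144):
    # image: (N, 3, H, W) batch of g,r,z cutouts
    _, _, h, w = image.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return image[:, :, top:top + size, left:left + size]


cutouts = torch.rand(4, 3, 256, 256)  # fake g,r,z cutouts for illustration
cropped = center_crop(cutouts)
print(cropped.shape)  # torch.Size([4, 3, 144, 144])
```

The cropped batch can then be passed through `decals_to_rgb` above before being fed to the image encoder.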
Pretraining
AstroCLIP is trained using a two-step process:
- We pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately.
- We CLIP-align these two encoders on a paired image-spectrum dataset.
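The CLIP-alignment step trains the two encoders so that matching image-spectrum pairs score higher than mismatched ones. A minimal sketch of a symmetric contrastive (InfoNCE-style) loss, with illustrative dimensions and temperature rather than the paper's actual values:

```python
import torch
import torch.nn.functional as F


def clip_loss(image_emb, spectrum_emb, temperature=0.07):
    # Normalize so the similarity matrix holds cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    spectrum_emb = F.normalize(spectrum_emb, dim=-1)
    logits = image_emb @ spectrum_emb.T / temperature
    # Matching image-spectrum pairs lie on the diagonal
    targets = torch.arange(len(image_emb))
    loss_images = F.cross_entropy(logits, targets)
    loss_spectra = F.cross_entropy(logits.T, targets)
    return (loss_images + loss_spectra) / 2


loss = clip_loss(torch.randn(8, 128), torch.randn(8, 128))
```

Minimizing this loss pulls each galaxy's image and spectrum embeddings together while pushing apart embeddings of different galaxies.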
Single-Modal Pretraining
Image Pretraining - DINOv2 ViT:
AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the DINOv2 package.
