</h4>

MatterGen is a generative model for inorganic materials design across the periodic table that can be fine-tuned to steer the generation towards a wide range of property constraints.

Installation
Get started with a pre-trained model
Generating materials
Evaluation
Train MatterGen yourself
Data release
Citation
Trademarks
Responsible AI Transparency Documentation
Get in touch

Installation

The easiest way to install prerequisites is via uv, a fast Python package and project manager.

The MatterGen environment can be installed via the following commands (this assumes you are running Linux and have a CUDA GPU):

pip install uv
uv venv .venv --python 3.10 
source .venv/bin/activate
uv pip install -e .

Note that our datasets and model checkpoints are provided inside this repo via Git Large File Storage (LFS). To find out whether LFS is installed on your machine, run

git lfs --version

If this prints a version string such as git-lfs/3.0.2 (GitHub; linux amd64; go 1.18.1), you can skip the following step.
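
If you want to run the same check from a setup script, a minimal sketch could parse that output. The helper names `parse_git_lfs_version` and `installed_git_lfs_version` are hypothetical, and the regular expression assumes the `git-lfs/X.Y.Z` prefix shown in the example output above.

```python
import re
import shutil
import subprocess


def parse_git_lfs_version(output: str):
    """Return (major, minor, patch) from `git lfs --version` output, or None."""
    match = re.match(r"git-lfs/(\d+)\.(\d+)\.(\d+)", output.strip())
    return tuple(int(part) for part in match.groups()) if match else None


def installed_git_lfs_version():
    """Return the installed Git LFS version, or None if it cannot be determined."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "lfs", "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return parse_git_lfs_version(result.stdout)
```

A return value of `None` from `installed_git_lfs_version()` suggests that the install step below is still needed.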

Install Git LFS

If Git LFS was not installed before you cloned this repo, you can install it via:

sudo apt install git-lfs
git lfs install

Apple Silicon

[!WARNING] Running MatterGen on Apple Silicon is experimental. Use at your own risk.
Further, you need to run export PYTORCH_ENABLE_MPS_FALLBACK=1 before any training or generation run.
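
If you launch runs from a Python script or notebook rather than from a shell, a rough equivalent is to set the variable in the process environment. This sketch assumes the variable has to be in place before PyTorch is imported; it is not taken from this repository.

```python
import os

# Set the variable before PyTorch is imported so that it is visible when the
# library initializes. setdefault keeps a value already exported in the shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```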

Get started with a pre-trained model

We provide checkpoints of an unconditional base version of MatterGen as well as fine-tuned models for these properties:

mattergen_base: unconditional base model trained on Alex-MP-20
mp_20_base: unconditional base model trained on MP-20
chemical_system: fine-tuned model conditioned on chemical system
space_group: fine-tuned model conditioned on space group
dft_mag_density: fine-tuned model conditioned on magnetic density from DFT
dft_band_gap: fine-tuned model conditioned on band gap from DFT
ml_bulk_modulus: fine-tuned model conditioned on bulk modulus from ML predictor
dft_mag_density_hhi_score: fine-tuned model jointly conditioned on magnetic density from DFT and HHI score
chemical_system_energy_above_hull: fine-tuned model jointly conditioned on chemical system and energy above hull from DFT

The checkpoints are located at checkpoints/<model_name> and are also available on Hugging Face. By default, they are downloaded from Hugging Face when requested. You can also download them manually from Git LFS via

git lfs pull -I checkpoints/<model_name> --exclude=""

[!NOTE] The checkpoints provided were re-trained using this repository, i.e., are not identical to the ones used in the paper. Hence, results may slightly deviate from those in the publication.

Generating materials

Unconditional generation

To sample from the pre-trained base model, run the following command.

export MODEL_NAME=mattergen_base
export RESULTS_PATH=results/  # Samples will be written to this directory

# generate batch_size * num_batches samples
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --num_batches 1
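
Because the total number of samples is batch_size * num_batches, you can work backwards from a target sample count. The following is a small illustrative helper (the name `num_batches_for` is hypothetical, not part of MatterGen) that rounds up to whole batches.

```python
import math


def num_batches_for(target_samples: int, batch_size: int) -> int:
    """Smallest number of batches that yields at least `target_samples` structures."""
    if target_samples <= 0 or batch_size <= 0:
        raise ValueError("target_samples and batch_size must be positive")
    return math.ceil(target_samples / batch_size)


batch_size = 16
num_batches = num_batches_for(1000, batch_size)
print(num_batches, batch_size * num_batches)  # 63 batches give 1008 samples
```

Rounding up means you may generate slightly more structures than requested, as in the example.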

This script will write the following files into $RESULTS_PATH:

generated_crystals_cif.zip: a ZIP file containing a single .cif file per generated structure.
generated_crystals.extxyz: a single file containing the individual generated structures as frames.
generated_trajectories.zip (only if --record-trajectories is True, which is the default): a ZIP file containing one .extxyz file per generated structure, each holding the full denoising trajectory for that structure.
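
To check quickly how many structures were written, you can inspect the CIF archive with the Python standard library. This is a minimal sketch: the helper name `list_cif_members` is hypothetical, and the path assumes RESULTS_PATH=results/ as in the command above.

```python
import zipfile
from pathlib import Path


def list_cif_members(zip_path):
    """Return the names of all .cif entries in a ZIP archive, sorted."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".cif")
        )


archive_path = Path("results") / "generated_crystals_cif.zip"
if archive_path.exists():
    members = list_cif_members(archive_path)
    print(f"{len(members)} CIF files in {archive_path}")
```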

[!TIP] For best efficiency, increase the batch size to the largest your GPU can sustain without running out of memory.

[!NOTE] To sample from a model you've trained yourself, replace --pretrained-name=$MODEL_NAME with --model_path=$MODEL_PATH, filling in your model's location for $MODEL_PATH.

Property-conditioned generation

With a fine-tuned model, you can generate materials conditioned on a target property. For example, to sample from the model trained on magnetic density, you can run the following command.

export MODEL_NAME=dft_mag_density
export RESULTS_PATH="results/$MODEL_NAME/"  # Samples will be written to this directory, e.g., `results/dft_mag_density`

# Generate conditional samples with a target magnetic density of 0.15
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --properties_to_condition_on="{'dft_mag_density': 0.15}" --diffusion_guidance_factor=2.0
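
When scripting several runs with different targets, the shell quoting of the dictionary argument is easy to get wrong. One way to sidestep it is to build the argument list in Python and pass it to `subprocess.run` without a shell. The sketch below uses only the flags shown in the command above; the function name `build_generate_command` is hypothetical.

```python
import ast


def build_generate_command(results_path, model_name, properties,
                           batch_size=16, guidance_factor=2.0):
    """Build an argv list for mattergen-generate using the flags shown above."""
    rendered = repr(properties)
    # Guard: the rendered text must parse back to the same plain-literal dict.
    if ast.literal_eval(rendered) != properties:
        raise ValueError("properties must contain only plain Python literals")
    return [
        "mattergen-generate",
        results_path,
        f"--pretrained-name={model_name}",
        f"--batch_size={batch_size}",
        f"--properties_to_condition_on={rendered}",
        f"--diffusion_guidance_factor={guidance_factor}",
    ]


command = build_generate_command(
    "results/dft_mag_density/", "dft_mag_density", {"dft_mag_density": 0.15}
)
print(command)
# To launch the run (requires MatterGen to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing a list rather than a single string means no shell is involved, so the braces and quotes in the dictionary reach the program unchanged.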

[!TIP] The argument --diffusion_guidance_factor corresponds to the $\gamma$ parameter in classifier-free diffusion guidance. Setting it to zero corresponds to unconditional generation. Increasing it tends to produce samples that adhere more closely to the input property values, at the expense of sample diversity and realism.
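
To build intuition for the guidance factor, one common way to write classifier-free guidance that matches the description above is to interpolate between an unconditional and a conditional score estimate. This is a sketch of the general technique under that assumption, not code from this repository. The function name and the use of plain lists for the scores are illustrative only.

```python
def guided_score(unconditional, conditional, gamma):
    """Blend two score estimates: (1 - gamma) * unconditional + gamma * conditional."""
    return [(1.0 - gamma) * u + gamma * c for u, c in zip(unconditional, conditional)]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]
print(guided_score(uncond, cond, 0.0))  # equals the unconditional estimate
print(guided_score(uncond, cond, 1.0))  # equals the conditional estimate
print(guided_score(uncond, cond, 2.0))  # extrapolates past the conditional estimate
```

In this formulation, values above 1 push samples further in the direction in which the conditional estimate differs from the unconditional one. This is consistent with stronger property adherence at the cost of diversity.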

Multiple property-conditioned generation

You can also generate materials conditioned on more than one property. For instance, you can use the pre-trained model located at checkpoints/chemical_system_energy_above_hull to generate structures conditioned on chemical system and energy above the hull, or the model at checkpoints/dft_mag_density_hhi_score for joint conditioning on HHI score and magnetic density. Adapt the following command to your needs:

export MODEL_NAME=chemical_system_energy_above_hull
export RESULTS_PATH="results/$MODEL_NAME/"  # Samples will be written to this directory, e.g., `results/chemical_system_energy_above_hull`
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --properties_to_condition_on="{'energy_above_hull': 0.05, 'chemical_system': 'Li-O'}" --diffusion_guidance_factor=2.0

Evaluation

Once you have generated a list of structures contained in $RESULTS_PATH (either using MatterGen or another method), you can relax the structures using the default MatterSim machine learning force field (see repository) and compute novelty, uniqueness, stability (using energy estimated by MatterSim), and other metrics via the following command:

git lfs pull -I data-release/alex-mp/reference_MP2020correction.gz --exclude=""  # first download the MP2020 reference dataset from Git LFS
mattergen-evaluate --structures_path=$RESULTS_PATH --relax=True --structure_matcher='disordered' --save_as="$RESULTS_PATH/metrics.json"

If you want to use the reference dataset while applying the TRI2024 correction scheme (recommended), instead run the following:

git lfs pull -I data-release/alex-mp/reference_TRI2024correction.gz --exclude=""  # first download the TRI2024 reference dataset from Git LFS
mattergen-evaluate --structures_path=$RESULTS_PATH --relax=True --structure_matcher='disordered' --save_as="$RESULTS_PATH/metrics.json" --reference_dataset_path="data-release/alex-mp/reference_TRI2024correction.gz"

This script will write metrics.json containing the metric results to $RESULTS_PATH and will print it to your console.
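
To post-process the results, you can load the file with the Python standard library. The sketch below assumes only that metrics.json is valid JSON. The metric names it contains are not listed here, so the code prints whatever top-level entries it finds. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_metrics(results_path):
    """Load metrics.json from a results directory and return its parsed content."""
    with open(Path(results_path) / "metrics.json", encoding="utf-8") as handle:
        return json.load(handle)


def print_metrics(metrics):
    """Print top-level entries; assumes (unverified) the file holds a JSON object."""
    if isinstance(metrics, dict):
        for name, value in sorted(metrics.items()):
            print(f"{name}: {value}")
    else:
        print(metrics)
```

For example, `print_metrics(load_metrics("results/"))` would list the metrics written by the command above.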

[!IMPORTANT] The evaluation script in this repository uses MatterSim, a machine-learning force field (MLFF), to relax structures and assess their stability via MatterSim's predicted energies. This is orders of magnitude faster than evaluation via density functional theory (DFT), does not require a license, and is typically highly accurate. However, there are important caveats: (1) in the MatterGen publication, we use DFT to evaluate structures generated by all models and baselines; (2) DFT is more accurate and reliable, particularly in less common chemical systems. Results obtained with this evaluation code may therefore differ from DFT results, and we recommend confirming results obtained with MLFFs with DFT before drawing conclusions.

[!TIP] By default, this uses MatterSim-v1-1M. If you would like to use the larger MatterSim-v1-5M model, you can add the --potential_load_path="MatterSim-v1.0.0-5M.pth" argument. You may also check the MatterSim repository for the latest version of the model.

If, instead, you have relaxed the structures and obtained the relaxed total energies by other means (e.g., DFT), you can evaluate the metrics via:

git lfs pull -I data-release/alex-mp/reference_MP2020correction.gz --exclude=""  # first download the reference dataset from Git LFS
mattergen-evaluate --structures_path=$RESULTS_PATH --energies_path='energies.npy' --relax=False --structure_matcher='disordered' --save_as='metrics'

This script will try to read structures from disk in the following order of precedence:

If $RESULTS_PATH points to a .xyz or .extxyz file, it will read it directly and assume each frame is a different structure.
If $RESULTS_PATH points to a .zip file containing .cif files, it will firs
