MatterGen
Official implementation of MatterGen -- a generative model for inorganic materials design across the periodic table that can be fine-tuned to steer the generation towards a wide range of property constraints.
MatterGen is a generative model for inorganic materials design across the periodic table that can be fine-tuned to steer the generation towards a wide range of property constraints.
Table of Contents
- Installation
- Get started with a pre-trained model
- Generating materials
- Evaluation
- Train MatterGen yourself
- Data release
- Citation
- Trademarks
- Responsible AI Transparency Documentation
- Get in touch
Installation
The easiest way to install prerequisites is via uv, a fast Python package and project manager.
The MatterGen environment can be installed via the following command (assumes you are running Linux and have a CUDA GPU):
pip install uv
uv venv .venv --python 3.10
source .venv/bin/activate
uv pip install -e .
Note that our datasets and model checkpoints are provided inside this repo via Git Large File Storage (LFS). To find out whether LFS is installed on your machine, run
git lfs --version
If this prints some version like git-lfs/3.0.2 (GitHub; linux amd64; go 1.18.1), you can skip the following step.
Install Git LFS
If Git LFS was not installed before you cloned this repo, you can install it via:
sudo apt install git-lfs
git lfs install
Apple Silicon
[!WARNING] Running MatterGen on Apple Silicon is experimental. Use at your own risk.
Further, you need to run export PYTORCH_ENABLE_MPS_FALLBACK=1 before any training or generation run.
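If you start training or generation from a Python script instead of the shell, the same variable can be set in-process. This is a minimal sketch, assuming it runs before the first import of torch in that process; setting the variable later may have no effect.

```python
import os

# Enable CPU fallback for operators the MPS backend does not implement.
# Assumption: this executes before `import torch` anywhere in the process.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```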
Get started with a pre-trained model
We provide checkpoints of an unconditional base version of MatterGen as well as fine-tuned models for these properties:
- mattergen_base: unconditional base model trained on Alex-MP-20
- mp_20_base: unconditional base model trained on MP-20
- chemical_system: fine-tuned model conditioned on chemical system
- space_group: fine-tuned model conditioned on space group
- dft_mag_density: fine-tuned model conditioned on magnetic density from DFT
- dft_band_gap: fine-tuned model conditioned on band gap from DFT
- ml_bulk_modulus: fine-tuned model conditioned on bulk modulus from ML predictor
- dft_mag_density_hhi_score: fine-tuned model jointly conditioned on magnetic density from DFT and HHI score
- chemical_system_energy_above_hull: fine-tuned model jointly conditioned on chemical system and energy above hull from DFT
The checkpoints are located at checkpoints/<model_name> and are also available on Hugging Face. By default, they are downloaded from Hugging Face when requested. You can also manually download them from Git LFS via
git lfs pull -I checkpoints/<model_name> --exclude=""
[!NOTE] The checkpoints provided were re-trained using this repository, i.e., are not identical to the ones used in the paper. Hence, results may slightly deviate from those in the publication.
Generating materials
Unconditional generation
To sample from the pre-trained base model, run the following command.
export MODEL_NAME=mattergen_base
export RESULTS_PATH=results/ # Samples will be written to this directory
# generate batch_size * num_batches samples
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --num_batches 1
This script will write the following files into $RESULTS_PATH:
- generated_crystals_cif.zip: a ZIP file containing a single .cif file per generated structure.
- generated_crystals.extxyz: a single file containing the individual generated structures as frames.
- If --record-trajectories == True (default): generated_trajectories.zip: a ZIP file containing a .extxyz file per generated structure, which contains the full denoising trajectory for each individual structure.
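As a quick sanity check after a run, you can count the structures in the output files listed above and confirm that both counts agree. The sketch below uses only the Python standard library; the helper names are hypothetical, and the frame counter assumes the plain (ext)xyz layout of an atom-count line, one comment line, then one line per atom.

```python
import zipfile
from pathlib import Path


def count_cif_files(zip_path):
    """Count the .cif members of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(1 for name in archive.namelist() if name.endswith(".cif"))


def count_extxyz_frames(xyz_path):
    """Count frames in an (ext)xyz file.

    Each frame is: a line with the atom count N, one comment line,
    then N atom lines.
    """
    frames = 0
    with open(xyz_path) as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file or trailing blank line
            n_atoms = int(header)
            handle.readline()  # comment / properties line
            for _ in range(n_atoms):
                handle.readline()
            frames += 1
    return frames


results = Path("results")  # same directory as $RESULTS_PATH
cif_zip = results / "generated_crystals_cif.zip"
extxyz = results / "generated_crystals.extxyz"
if cif_zip.exists() and extxyz.exists():
    print(count_cif_files(cif_zip), count_extxyz_frames(extxyz))
```

If the two numbers differ, the run was probably interrupted or the files come from different runs.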
[!TIP] For best efficiency, increase the batch size to the largest your GPU can sustain without running out of memory.
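Because the number of samples is batch_size * num_batches, a target sample count has to be rounded up to a whole number of batches. The following sketch shows the arithmetic; plan_batches is a hypothetical helper, not part of MatterGen.

```python
import math


def plan_batches(target_samples, batch_size):
    """Return (num_batches, total_samples) covering at least target_samples."""
    if target_samples <= 0 or batch_size <= 0:
        raise ValueError("target_samples and batch_size must be positive")
    num_batches = math.ceil(target_samples / batch_size)
    return num_batches, num_batches * batch_size


num_batches, total = plan_batches(target_samples=1000, batch_size=16)
print(num_batches, total)  # 63 1008
```

With a batch size of 16, asking for 1000 samples therefore yields 1008; a larger batch size needs fewer batches for the same target.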
[!NOTE] To sample from a model you've trained yourself, replace --pretrained-name=$MODEL_NAME with --model_path=$MODEL_PATH, filling in your model's location for $MODEL_PATH.
Property-conditioned generation
With a fine-tuned model, you can generate materials conditioned on a target property. For example, to sample from the model trained on magnetic density, you can run the following command.
export MODEL_NAME=dft_mag_density
export RESULTS_PATH="results/$MODEL_NAME/" # Samples will be written to this directory, e.g., `results/dft_mag_density`
# Generate conditional samples with a target magnetic density of 0.15
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --properties_to_condition_on="{'dft_mag_density': 0.15}" --diffusion_guidance_factor=2.0
[!TIP] The argument --diffusion_guidance_factor corresponds to the $\gamma$ parameter in classifier-free diffusion guidance. Setting it to zero corresponds to unconditional generation, and increasing it further tends to produce samples which adhere more to the input property values, though at the expense of diversity and realism of samples.
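When launching many conditional runs from a script, it can help to assemble the command from Python values rather than hand-writing the quoted dictionary. The sketch below only builds the argument list, using the flags from the command above; build_generate_command is a hypothetical helper, and it relies on the fact that repr() of a Python dict gives the same single-quoted literal used in the shell example.

```python
import shlex


def build_generate_command(results_path, model_name, properties,
                           guidance_factor, batch_size=16):
    """Assemble the mattergen-generate argument list (does not run it)."""
    return [
        "mattergen-generate",
        results_path,
        f"--pretrained-name={model_name}",
        f"--batch_size={batch_size}",
        # repr() renders the dict as a Python literal, e.g. {'key': 0.15}
        f"--properties_to_condition_on={properties!r}",
        f"--diffusion_guidance_factor={guidance_factor}",
    ]


cmd = build_generate_command(
    "results/dft_mag_density/",
    "dft_mag_density",
    {"dft_mag_density": 0.15},
    2.0,
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

The list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting altogether.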
Multiple property-conditioned generation
You can also generate materials conditioned on more than one property. For instance, you can use the pre-trained model located at checkpoints/chemical_system_energy_above_hull to generate conditioned on chemical system and energy above the hull, or the model at checkpoints/dft_mag_density_hhi_score for joint conditioning on HHI score and magnetic density.
Adapt the following command to your specific needs:
export MODEL_NAME=chemical_system_energy_above_hull
export RESULTS_PATH="results/$MODEL_NAME/" # Samples will be written to this directory, e.g., `results/chemical_system_energy_above_hull`
mattergen-generate $RESULTS_PATH --pretrained-name=$MODEL_NAME --batch_size=16 --properties_to_condition_on="{'energy_above_hull': 0.05, 'chemical_system': 'Li-O'}" --diffusion_guidance_factor=2.0
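With several properties in one argument, quoting mistakes are easy to make. The sketch below checks that a conditioning string is a well-formed Python dict literal before you start a long run. It is only a local sanity check with a hypothetical helper name; whether the CLI parses the argument in exactly this way is an assumption.

```python
import ast


def parse_conditions(text):
    """Parse a conditioning string and check it is a dict with string keys."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"not a valid Python literal: {text!r}") from err
    if not isinstance(value, dict) or not all(isinstance(k, str) for k in value):
        raise ValueError(f"expected a dict with string keys, got: {text!r}")
    return value


conditions = parse_conditions(
    "{'energy_above_hull': 0.05, 'chemical_system': 'Li-O'}"
)
print(sorted(conditions))  # ['chemical_system', 'energy_above_hull']
```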
Evaluation
Once you have generated a list of structures contained in $RESULTS_PATH (either using MatterGen or another method), you can relax the structures using the default MatterSim machine learning force field (see repository) and compute novelty, uniqueness, stability (using energy estimated by MatterSim), and other metrics via the following command:
git lfs pull -I data-release/alex-mp/reference_MP2020correction.gz --exclude="" # first download the MP2020 reference dataset from Git LFS
mattergen-evaluate --structures_path=$RESULTS_PATH --relax=True --structure_matcher='disordered' --save_as="$RESULTS_PATH/metrics.json"
If you want to use the reference dataset while applying the TRI2024 correction scheme (recommended), instead run the following:
git lfs pull -I data-release/alex-mp/reference_TRI2024correction.gz --exclude="" # download the TRI2024 reference dataset
mattergen-evaluate --structures_path=$RESULTS_PATH --relax=True --structure_matcher='disordered' --save_as="$RESULTS_PATH/metrics.json" --reference_dataset_path="data-release/alex-mp/reference_TRI2024correction.gz"
This script will write metrics.json containing the metric results to $RESULTS_PATH and will print it to your console.
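To post-process the results, for example to compare several runs, metrics.json can be read back with the standard library. This sketch assumes the file's top level is a JSON object mapping metric names to values; it makes no assumption about which metric names are present, and both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_metrics(results_path):
    """Load metrics.json from a results directory."""
    with open(Path(results_path) / "metrics.json") as handle:
        return json.load(handle)


def format_metrics(metrics):
    """Render one 'name: value' line per metric, sorted by name."""
    return [f"{name}: {value}" for name, value in sorted(metrics.items())]


metrics_file = Path("results") / "metrics.json"  # i.e. $RESULTS_PATH/metrics.json
if metrics_file.exists():
    print("\n".join(format_metrics(load_metrics("results"))))
```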
[!IMPORTANT] The evaluation script in this repository uses MatterSim, a machine-learning force field (MLFF), to relax structures and assess their stability via MatterSim's predicted energies. This is orders of magnitude faster than evaluation via density functional theory (DFT), requires no license, and is typically highly accurate, but there are important caveats: (1) in the MatterGen publication we use DFT to evaluate structures generated by all models and baselines; (2) DFT is more accurate and reliable, particularly in less common chemical systems. Results obtained with this evaluation code may therefore differ from DFT-based evaluation, and we recommend confirming MLFF-based results with DFT before drawing conclusions.
[!TIP] By default, this uses MatterSim-v1-1M. If you would like to use the larger MatterSim-v1-5M model, you can add the --potential_load_path="MatterSim-v1.0.0-5M.pth" argument. You may also check the MatterSim repository for the latest version of the model.
If, instead, you have relaxed the structures and obtained the relaxed total energies by other means (e.g., DFT), you can evaluate the metrics via:
git lfs pull -I data-release/alex-mp/reference_MP2020correction.gz --exclude="" # first download the reference dataset from Git LFS
mattergen-evaluate --structures_path=$RESULTS_PATH --energies_path='energies.npy' --relax=False --structure_matcher='disordered' --save_as='metrics'
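One way to produce a file like energies.npy is with NumPy. The sketch below rests on assumptions not stated above: that the file holds a one-dimensional float array with one total energy per structure, in the same order as the structures. The helper name and the example values are placeholders; check the evaluation code for the exact format and units it expects.

```python
import os
import tempfile

import numpy as np


def save_energies(energies, path):
    """Save a flat sequence of total energies as a .npy file."""
    array = np.asarray(energies, dtype=float)
    if array.ndim != 1:
        raise ValueError("expected one energy per structure (a 1-D sequence)")
    np.save(path, array)
    return array


# Round-trip demo in a temporary directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "energies.npy")
    save_energies([-12.5, -30.25, -7.0], demo_path)
    print(np.load(demo_path).shape)  # (3,)
```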
This script will try to read structures from disk in the following precedence order:
- If $RESULTS_PATH points to a .xyz or .extxyz file, it will read it directly and assume each frame is a different structure.
- If $RESULTS_PATH points to a .zip file containing .cif files, it will firs
