DeNS
[TMLR 2024 J2C Certification] Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
This repository contains the official PyTorch implementation of the work "Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields" (TMLR 2024). We show that force encoding enables generalizing denoising to non-equilibrium structures and propose to use DeNS (Denoising Non-Equilibrium Structures) as an auxiliary task to improve the performance on energy and force predictions.
We provide the code for training EquiformerV2 with DeNS on the OC20 and OC22 datasets here, and the code for training Equiformer with DeNS on MD17 in this repository.
<p align="center"> <img src="fig/denoising_structures_overview.png" alt="photo not available" width="98%" height="98%"> </p>
<p align="center"> <img src="fig/dens_training_process.png" alt="photo not available" width="98%" height="98%"> </p>
<p align="center"> <img src="fig/dens_oc20_all+md.png" alt="photo not available" width="98%" height="98%"> </p>
<p align="center"> <img src="fig/dens_oc20_leaderboard.png" alt="photo not available" width="98%" height="98%"> </p>
<p align="center"> <img src="fig/dens_oc22.png" alt="photo not available" width="98%" height="98%"> </p>
<p align="center"> <img src="fig/dens_md17.png" alt="photo not available" width="98%" height="98%"> </p>

As demonstrated in the OMat24 paper, EquiformerV2 + DeNS achieves state-of-the-art results on the Matbench Discovery leaderboard as of October 18, 2024.

<p align="center"> <img src="fig/equiformer_v2_dens_matbench_discovery.png" alt="photo not available" width="98%" height="98%"> </p>

Content
Environment Setup
Environment
See here for setting up the environment.
OC20
Please first set up the environment and file structure (placing this repository under `ocp` and renaming it to `experimental`) following the Environment section above.
The OC20 S2EF dataset can be downloaded by following the instructions in its GitHub repository.
For example, we can download the OC20 S2EF-2M dataset by running:
cd ocp
python scripts/download_data.py --task s2ef --split "2M" --num-workers 8 --ref-energy
We also need to download the "val_id" data split to run training.
After downloading, the datasets should be under ocp/data.
To train on different splits like All and All+MD, we can follow the same link above to download the datasets.
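The download step above can be repeated for several splits. The following is a minimal sketch that only assembles the command shown above for each split name; `build_download_cmd` is a hypothetical helper, not part of OCP, and it assumes the `val_id` split accepts the same flags as `2M`. Split names for All and All+MD should be taken from the OCP download instructions rather than from this sketch.

```python
import shlex


def build_download_cmd(split, num_workers=8):
    """Return the argument list for downloading one OC20 S2EF split.

    Mirrors the command shown above; only the split name and the
    number of workers vary. Hypothetical helper for illustration.
    """
    return [
        "python", "scripts/download_data.py",
        "--task", "s2ef",
        "--split", split,
        "--num-workers", str(num_workers),
        "--ref-energy",
    ]


# Print one command per split; run them from inside the ocp directory.
for split in ("2M", "val_id"):
    print(shlex.join(build_download_cmd(split)))
```

The argument lists can also be passed to `subprocess.run(..., check=True)` with `cwd` set to the `ocp` directory if the downloads should be launched from Python.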
OC22
Please first set up the environment and file structure (placing this repository under `ocp` and renaming it to `experimental`) following the Environment section above.
Similar to OC20, the OC22 dataset can be downloaded by following the instructions in its GitHub repository.
MD17
Please refer to this repository for training Equiformer with DeNS on MD17.
File Structure
- `configs` contains config files for training with DeNS on different datasets.
- `datasets` contains LMDB dataset class that can distinguish whether structures in OC20 come from All split or MD split.
- `model` contains EquiformerV2 and eSCN models capable of training with DeNS.
- `scripts` contains the scripts for launching training based on config files.
- `trainers` contains the code for training models for S2EF and with DeNS.
Training
OC20
- Modify the paths to datasets before launching training. For example, we need to modify the path to the training set as here and the validation set as here before training EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 12 epochs.
- We train EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 12 epochs by running:
  cd ocp/
  sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@multi-nodes.sh
  Note that following the above Environment section, we will run the script under `ocp`. This script will use 2 nodes with 8 GPUs on each node.
  We can also run training on 8 GPUs on 1 node:
  cd ocp/
  sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@8.sh
  Note that this is to show that we can train on a single node and the results are not the same as training on 16 GPUs.
  Similarly, we train EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 30 epochs by running:
  cd ocp/
  sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@30_splits@2M_g@multi-nodes.sh
  This script will use 4 nodes with 8 GPUs on each node.
- We train EquiformerV2 with DeNS on the OC20 S2EF-All+MD dataset by running:
  cd ocp/
  sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@20_L@6_M@3_splits@all-md_g@multi-nodes.sh
  This script will use 16 nodes with 8 GPUs on each node.
  We use a slightly different dataset class, `DeNSLmdbDataset`, so that we can differentiate whether a structure is from the All split or the MD split. This corresponds to the code here and requires `relaxations` and `md` to exist in the `data_log.*.txt` files under the All+MD data directory. Those `data_log.*.txt` files should look like:
  # for All split
  /.../relaxations/.../random1331004.traj,258,365
  ...
  After reading the lmdb files, the `DeNSLmdbDataset` dataset will add a new attribute `md` as here.
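To illustrate how a `data_log.*.txt` entry could be mapped to the `md` attribute, here is a minimal sketch. It is not the repository's implementation: `is_md_entry` is a hypothetical helper, the example paths are made up, and it assumes each line has the form `<trajectory path>,<int>,<int>` with a directory named either `relaxations` or `md` somewhere in the path.

```python
from pathlib import PurePosixPath


def is_md_entry(log_line):
    """Guess whether one data_log line points at an MD trajectory.

    Assumption: the trajectory path is the first comma-separated field
    and contains a directory named "md" or "relaxations".
    """
    traj_path = log_line.strip().split(",")[0]
    parts = PurePosixPath(traj_path).parts
    if "md" in parts:
        return True
    if "relaxations" in parts:
        return False
    raise ValueError(f"cannot tell split from path: {traj_path}")


# Hypothetical log lines; real files use the full dataset paths.
example_lines = [
    "/data/relaxations/run0/random1331004.traj,258,365",
    "/data/md/run0/random42.traj,10,20",
]
flags = [is_md_entry(line) for line in example_lines]
print(flags)
```

Raising an error for paths that match neither directory name makes a malformed `data_log.*.txt` file fail early instead of silently labelling every structure as coming from the All split.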
OC22
- Modify the paths to datasets before launching training. Specifically, we need to modify the path to the training set as here and the validation set as here.
  In addition, we need to download the linear reference file from here and then add the path to the linear reference file as here and here.
  Finally, we download the OC20 reference information file from here and add the path to that file as here and here.
- We train EquiformerV2 with DeNS on the OC22 dataset by running:
  cd ocp/
  sh experimental/scripts/train/oc22/s2ef/equiformer_v2/equiformer_dens_v2_N@18_L@6_M@2_epochs@6_g@multi-nodes.sh
  This script will use 4 nodes with 8 GPUs on each node.
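The OC22 path edits described above touch the training set, the validation set, and two reference files, each in two places. The sketch below shows one way to apply them to a config loaded as a nested dictionary. The key names (`dataset`, `train`, `val`, `src`, `lin_ref`, `oc20_ref`) and the helper `set_dataset_paths` are assumptions for illustration, so check them against the actual config file before relying on this.

```python
import copy


def set_dataset_paths(config, train_src, val_src, lin_ref, oc20_ref):
    """Return a copy of `config` with the dataset paths filled in.

    Assumed layout: config["dataset"]["train"] and config["dataset"]["val"]
    each hold "src", "lin_ref" and "oc20_ref" entries.
    """
    updated = copy.deepcopy(config)
    dataset = updated.setdefault("dataset", {})
    for split, src in (("train", train_src), ("val", val_src)):
        entry = dataset.setdefault(split, {})
        entry["src"] = src
        entry["lin_ref"] = lin_ref      # linear reference file
        entry["oc20_ref"] = oc20_ref    # OC20 reference information file
    return updated


# Placeholder paths; replace them with the locations on your machine.
example = set_dataset_paths(
    {"dataset": {"train": {}, "val": {}}},
    train_src="path/to/oc22/train",
    val_src="path/to/oc22/val",
    lin_ref="path/to/linear_reference_file",
    oc20_ref="path/to/oc20_reference_file",
)
print(example["dataset"]["train"])
```

Working on a deep copy keeps the original config untouched, which helps when the same base config is reused for several runs.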
MD17
Please refer to this repository for training Equiformer with DeNS on MD17.
Checkpoint
We provide the checkpoints of EquiformerV2 trained with DeNS on OC20 S2EF-2M dataset for 12 and 30 epochs, OC20 S2EF-All+MD dataset, and OC22 dataset.

| Split | Epochs | Download | val force MAE (meV / Å) | val energy MAE (meV) |
|--- |--- |--- |--- |--- |
| OC20 S2EF-2M | 12 | checkpoint \| config | 19.09 | 269 |
| OC20 S2EF-2M | 30 | checkpoint \| config | 18.02 | 251 |
| OC20 S2EF-All+MD | 2 | checkpoint \| config | 14.0 | 222 |
| OC22 | 6 | checkpoint \| config | (ID) 20.66 \| (OOD) 27.11 | (ID) 391.6 \| (OOD) 533.0 |
Evaluation
We provide the evaluation script on OC20 and OC22 datasets. After following the above Environment section and
