<div align="center"> <img src="./gaia_logo.png" alt="gaia_logo" width="250"/> </div>

GAIA

GAIA is a framework to generate datasets with an automated pipeline for machine learning interatomic potentials.

Prerequisites

Quantum mechanics (QM) package

A QM package is required to run GAIA. <br> GAIA currently targets VASP; support for additional packages is planned.

Distributed environment with shared storage

GAIA assumes a distributed environment. <br> Shared storage is also required so that every node can access the same directory under an identical path.

Job scheduler

GAIA is currently designed to use SLURM as the job scheduler, <br> but with minor code modifications it can be adapted to other schedulers or run on a single node.

Dependencies

We provide a requirements.txt file that lets users reproduce the environment used for the GAIA implementation. <br> GAIA also requires the following binaries: CREST, nebmake.pl, Open Babel, xTB, and xTB-IFF.

Usage

Config file

  • user_config provides an example YAML file with user-defined settings for the data-generator, data-improver, and GAIA-Bench.
  • base_config serves as a skeleton configuration. It holds default values for advanced parameters; any parameter set in the user config overrides the corresponding value in the base config.
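
The override behavior described above can be sketched as a recursive dictionary merge. This is a minimal illustration and not GAIA's actual loader: the function name `merge_config` and every key shown are hypothetical, and in practice both dictionaries would come from parsing the YAML files.

```python
from copy import deepcopy


def merge_config(base: dict, user: dict) -> dict:
    """Return a copy of `base` with values from `user` layered on top.

    Nested dictionaries are merged key by key; any other value in `user`
    replaces the corresponding value in `base`.
    """
    merged = deepcopy(base)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys, for illustration only (not GAIA's real schema).
base = {"scheduler": {"name": "slurm", "nodes": 1}, "seed": 0}
user = {"scheduler": {"nodes": 4}}

config = merge_config(base, user)
print(config)
```

Merging into a deep copy leaves the base dictionary untouched, so the same defaults can be reused for several runs.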

Data-generator

Input preparation

  • Chemical components <br> GAIA supports both periodic (e.g., metals) and non-periodic (e.g., molecules with organic species) components. <br> Periodic components should be provided in .POSCAR format and non-periodic components in .xyz format.
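
As an illustration of the non-periodic input format, the sketch below writes a molecule in the standard XYZ layout (atom count, comment line, then one `element x y z` line per atom). The helper `to_xyz` is hypothetical and not part of GAIA, and whether GAIA expects anything beyond the plain XYZ layout is not stated here.

```python
def to_xyz(symbols, positions, comment=""):
    """Format atoms as an XYZ string: count line, comment line, one atom per line."""
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# A water molecule with approximate coordinates in angstroms.
water = to_xyz(
    ["O", "H", "H"],
    [(0.000, 0.000, 0.000), (0.757, 0.586, 0.000), (-0.757, 0.586, 0.000)],
    comment="water",
)
print(water)
```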

Run

$ cd GAIA
$ python main.py -a data_generator -c {user_config (.yaml)} -o {out_dir} -p {prefix}
  • If out_dir is /home/GAIA_out and prefix is first, the artifacts and the log are saved in /home/GAIA_out/first/
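
For scripted runs, the command line and the resulting output location can be assembled programmatically. The sketch below uses only the flags shown above (`-a`, `-c`, `-o`, `-p`); the helper names `gaia_command` and `run_directory` and the file name `user_config.yaml` are hypothetical.

```python
from pathlib import PurePosixPath


def gaia_command(action, config, out_dir, prefix):
    """Build the argument list for `python main.py` with the documented flags."""
    return ["python", "main.py", "-a", action, "-c", str(config),
            "-o", str(out_dir), "-p", prefix]


def run_directory(out_dir, prefix):
    """Directory where artifacts and the log are saved: {out_dir}/{prefix}."""
    return PurePosixPath(out_dir) / prefix


cmd = gaia_command("data_generator", "user_config.yaml", "/home/GAIA_out", "first")
print(" ".join(cmd))
print(run_directory("/home/GAIA_out", "first"))
```

The argument list can be passed to `subprocess.run(cmd, cwd="GAIA", check=True)` to launch the run from the repository directory.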

Data-improver

Input preparation

  • Training set, validation set, and model checkpoint <br> The data-improver makes recommendations based on error metrics on the validation set as well as on the training set itself. <br> It therefore requires a validation dataset (.extxyz) and a trained model checkpoint (e.g., .pt or .pth) in addition to the training dataset. <br> The MLIP framework that provides the calculator for the checkpoint must also be set up.
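
Before launching the data-improver, it can be useful to confirm that a dataset file is readable and contains the expected number of structures. The sketch below counts frames in an extended XYZ file using only the standard library. It is not part of GAIA: `count_frames` is a hypothetical helper, and it assumes the usual frame layout of an atom-count line, one comment or key=value line, and then one line per atom.

```python
import io


def count_frames(stream):
    """Count frames in an (ext)xyz text stream.

    Each frame is: a line with the atom count N, one comment/property line,
    then N atom lines.
    """
    frames = 0
    while True:
        header = stream.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate blank lines between frames
            continue
        n_atoms = int(header)
        stream.readline()       # comment / key=value line
        for _ in range(n_atoms):
            if not stream.readline():
                raise ValueError("file ended in the middle of a frame")
        frames += 1
    return frames


# Two small illustrative frames (values are made up).
sample = io.StringIO(
    "2\n"
    "Properties=species:S:1:pos:R:3 energy=-1.0\n"
    "H 0.0 0.0 0.0\n"
    "H 0.0 0.0 0.74\n"
    "1\n"
    "Properties=species:S:1:pos:R:3 energy=0.0\n"
    "He 0.0 0.0 0.0\n"
)
print(count_frames(sample))
```

For a file on disk, the same function can be called as `count_frames(open(path))`, where `path` points to the .extxyz file.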

Run

$ cd GAIA
$ python main.py -a data_improver -c {user_config (.yaml)} -o {out_dir} -p {prefix}

GAIA-Bench

Input preparation

  • GAIA-Bench datasets and model checkpoint <br> GAIA-Bench includes four benchmark tasks; their datasets are available at GAIA-Bench. <br> A model checkpoint to test is required, and the MLIP framework that provides the calculator for the checkpoint must also be set up.

Run

$ cd GAIA
$ python main.py -a benchmark -c {user_config (.yaml)} -o {out_dir} -p {prefix}

Dataset and model checkpoint

Titan25 is an MLIP dataset constructed with GAIA, comprising 1.8M data points across 11 elements. SNet-T25 is an MLIP trained on this dataset. See the GAIA paper for details.

Citation

If you use this code, please cite our work as follows:

@article{gaia2025,
  title={Scalable Reactive Atomistic Dynamics with GAIA},
  author={Song, Suhwan and Kim, Heejae and Jang, Jaehee and Cho, Hyuntae and Kim, Gunhee and Kim, Geonu},
  journal={arXiv preprint arXiv:2509.25798},
  year={2025}
}