
pySIPFENN

Python toolset for Structure-Informed Property and Feature Engineering with Neural Networks. It offers unique advantages through (1) effortless extensibility, (2) optimizations for ordered, dilute, and random atomic configurations, and (3) automated model tuning.



Summary

This repository contains the python toolset for Structure-Informed Property and Feature Engineering with Neural Networks, which implements a number of user-friendly tools for:

  • Calculating different vector representations of atomic structures for a number of applications, including supervised learning (e.g., predictive machine learning models) and unsupervised learning (e.g., clustering atomic structures based on similarity or performing anomaly detection). Notably, it utilizes crystallographic information and other techniques to make this process very efficient for the vast majority of use cases (see 10.1016/j.commatsci.2024.113495).
  • Efficient deployment of pre-trained ML models (not limited to neural networks) obtained from repositories like Zenodo (including some we trained) or trained locally on the user's machine. The system is plug-and-play thanks to the Open Neural Network Exchange (ONNX) format, which can be exported from nearly any machine learning framework.
  • Tuning pre-trained ML models to new domains, like new chemical compositions, a different ab initio functional, or entirely new properties. Since v0.16, users can take advantage of integration with the OPTIMADE API, which allows one to tune models based on DFT datasets like Materials Project, OQMD, AFLOW, or NIST-JARVIS in just 3 lines of code specifying which provider to use, what to query for, and hyperparameters for tuning.
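
As a rough illustration of the unsupervised use case (anomaly detection on feature vectors), the sketch below scores each structure by its mean distance to its k nearest neighbors in feature space. The feature vectors are hypothetical toy numbers standing in for real descriptor output, and k-nearest-neighbor distance scoring is just one simple generic technique chosen here for illustration, not a method built into pySIPFENN.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_scores(features: Dict[str, List[float]], k: int = 2) -> Dict[str, float]:
    """Mean distance from each structure to its k nearest neighbors in feature space.

    A larger score means the structure lies farther from the rest of the
    dataset, making it a candidate anomaly.
    """
    if not 1 <= k < len(features):
        raise ValueError("k must satisfy 1 <= k < number of structures")
    scores = {}
    for name, vec in features.items():
        dists = sorted(
            euclidean(vec, other) for other_name, other in features.items() if other_name != name
        )
        scores[name] = sum(dists[:k]) / k
    return scores


# Hypothetical toy feature vectors standing in for real descriptor output.
toy_features = {
    "A": [0.0, 0.0, 1.0],
    "B": [0.1, 0.0, 1.0],
    "C": [0.0, 0.1, 1.0],
    "D": [5.0, 5.0, 0.0],  # deliberately far from the others
}
scores = knn_anomaly_scores(toy_features, k=2)
print(max(scores, key=scores.get))  # D
```

With real descriptors, features are often on very different scales, so standardizing each feature before computing distances is usually worthwhile.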

The underlying methodology, efficiency optimizations, design choices, and implementation specifics are given in the following publications:

  • Adam M. Krajewski, Jonathan W. Siegel, Zi-Kui Liu, Efficient Structure-Informed Featurization and Property Prediction of Ordered, Dilute, and Random Atomic Structures, Computational Materials Science, Volume 247, 2025, 113495, DOI: 10.1016/j.commatsci.2024.113495

  • Adam M. Krajewski, Jonathan W. Siegel, Jinchao Xu, Zi-Kui Liu, Extensible Structure-Informed Prediction of Formation Energy with improved accuracy and usability employing neural networks, Computational Materials Science, Volume 208, 2022, 111254, DOI:10.1016/j.commatsci.2022.111254

A more complete (and verbose) description of capabilities is given in the documentation at pysipfenn.org. You may also consider visiting our Phases Research Lab group website at phaseslab.org.

Recent News:

  • (v0.16.0) Three pieces of exciting news! (1) The all-new ModelAdjusters submodule automates tuning and can fetch data directly from the OPTIMADE API; (2) a new manuscript detailing the advantages of our featurization tools has been posted on arXiv (arXiv:2404.02849); and (3) the name of the software was updated to python toolset for Structure-Informed Property and Feature Engineering with Neural Networks to retain the pySIPFENN acronym while better reflecting our strengths and development direction.

  • (v0.15.0) A new descriptor (feature vector) calculator KS2022_randomSolutions has been implemented. It is used for structure-informed featurization of compositions randomly occupying a lattice, spiritually similar to SQS generation, but also taking into account (1) chemical differences between elements and (2) structural effects.

  • (v0.14.0) Users can now take advantage of a Prototype Library to obtain common structures from any Calculator instance with c.prototypeLibrary[<name>]['structure']. It can be easily updated or appended with the high-level API or by manually modifying its YAML file.

  • (v0.13.0) Model exports (and more!) to PyTorch, CoreML, and ONNX are now effortless thanks to the core.modelExporters module. Please note that you need to install pySIPFENN with the dev option (e.g., pip install "pysipfenn[dev]") to use it. See the documentation for details.

  • (v0.12.2) Switched to LGPLv3, allowing integration with proprietary software developed by the CALPHAD community while supporting the development of new pySIPFENN features for all.

  • (March 2023 Workshop) We would like to thank all 100 of our amazing attendees for making our workshop, co-organized with the Materials Genome Foundation, a success.
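
To make the idea of "compositions randomly occupying a lattice" (v0.15.0) more concrete, here is a minimal conceptual sketch, assuming a toy periodic 1D chain of sites rather than a real crystal lattice. It randomly distributes a target composition over the sites and checks that the fraction of unlike nearest-neighbor pairs approaches the ideal random-mixing value 2x(1 - x) for a binary. The helper names are hypothetical, and this is only an illustration of random occupation, not the KS2022_randomSolutions algorithm itself.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def random_occupation(n_sites: int, composition: Dict[str, float], seed: Optional[int] = None) -> List[str]:
    """Randomly distribute elements over n_sites lattice sites.

    Per-element site counts come from rounding n_sites * fraction; any rounding
    drift is absorbed by the most abundant element so the counts sum to n_sites.
    The resulting list is then shuffled to give a random arrangement.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition fractions must sum to a positive number")
    counts = {el: round(n_sites * frac / total) for el, frac in composition.items()}
    majority = max(composition, key=composition.get)
    counts[majority] += n_sites - sum(counts.values())
    if counts[majority] < 0:
        raise ValueError("n_sites is too small for this composition")
    sites = [el for el, n in counts.items() for _ in range(n)]
    random.Random(seed).shuffle(sites)
    return sites


def unlike_pair_fraction(sites: List[str]) -> float:
    """Fraction of nearest-neighbor pairs on a periodic 1D chain occupied by different elements."""
    n = len(sites)
    return sum(sites[i] != sites[(i + 1) % n] for i in range(n)) / n


sites = random_occupation(10_000, {"Fe": 0.75, "Ni": 0.25}, seed=0)
print(Counter(sites))  # Counter({'Fe': 7500, 'Ni': 2500})
# For ideal random mixing of a binary, the expected value is 2 * x * (1 - x) = 0.375.
print(round(unlike_pair_fraction(sites), 3))
```

Larger cells bring the sampled pair statistics closer to the ideal random limit, which is the same convergence concern that motivates SQS-style approaches.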

Main Schematic

The figure below is the main schematic of the pySIPFENN framework, detailing the interplay of its internal components. The user interface provides a high-level API to process structural data within core.Calculator, pass it to the featurization submodules in descriptorDefinitions to obtain vector representations, and then pass those to the models defined in models.json, (typically) running automatically through all available models. All internal data of core.Calculator is directly accessible, enabling rapid customization. An auxiliary high-level API enables advanced users to operate and retrain the models.

<img src="https://raw.githubusercontent.com/PhasesResearchLab/pySIPFENN/main/docs/_static/pySIPFENN_MainSchematic.png" alt="Main Schematic Figure" width="800" style="display: block; margin-left: auto; margin-right: auto;"/>
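
The data flow in the schematic (structures, then descriptors, then every registered model) can be sketched in plain Python as below. Everything here is a hypothetical stand-in: a "structure" is just a list of element symbols, the featurizer is a toy composition descriptor, and each "model" is a Python callable rather than an ONNX session. It is not the core.Calculator API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "structure" is just a list of element symbols and
# each "model" is a plain Python callable rather than an ONNX inference session.
Structure = List[str]
FeatureVector = List[float]
Model = Callable[[FeatureVector], float]


def toy_featurizer(structure: Structure) -> FeatureVector:
    """Hypothetical descriptor: fractions of Fe and Ni in the structure."""
    n = len(structure)
    return [structure.count("Fe") / n, structure.count("Ni") / n]


def run_all_models(
    structures: List[Structure],
    featurizer: Callable[[Structure], FeatureVector],
    models: Dict[str, Model],
) -> Dict[str, List[float]]:
    """Featurize each structure once, then evaluate every registered model on those features."""
    features = [featurizer(s) for s in structures]
    return {name: [model(f) for f in features] for name, model in models.items()}


# Hypothetical registry playing the role that models.json plays in pySIPFENN.
models: Dict[str, Model] = {
    "toyModelA": lambda f: -0.1 * f[0] + 0.05 * f[1],
    "toyModelB": lambda f: 0.2 * f[0] * f[1],
}
results = run_all_models([["Fe", "Fe", "Ni", "Ni"], ["Fe"] * 4], toy_featurizer, models)
print(results)
```

Computing features once and reusing them across models avoids repeating the (usually more expensive) featurization step; a real setup with several descriptors would need to group models by the descriptor they expect.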

Applications

pySIPFENN is a very flexible tool that can, in principle, be used for the prediction of any property of interest that depends on an atomic configuration with very few modifications. The models shipped by default are trained to predict formation energy because that is what our research group is interested in; however, if one wanted to predict Poisson’s ratio and trained a model based on the same features, adding it would take minutes. Simply add the model in open ONNX format and link it using the models.json file, as described in the documentation.
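
The "link it using the models.json file" step amounts to adding an entry to a JSON registry. The sketch below shows that pattern with a temporary file, assuming a hypothetical entry schema ("name" and "descriptor" fields) and placeholder model IDs; the actual fields pySIPFENN expects in models.json are described in its documentation and may differ.

```python
import json
import tempfile
from pathlib import Path


def register_model(registry_path: Path, model_id: str, entry: dict) -> dict:
    """Add or overwrite one model entry in a JSON registry file, keeping existing entries.

    The entry fields used below are hypothetical; consult the pySIPFENN
    documentation for the fields its models.json actually expects.
    """
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    registry[model_id] = entry
    registry_path.write_text(json.dumps(registry, indent=2))
    return registry


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models.json"
    # Pretend a formation-energy model is already registered (placeholder ID).
    path.write_text(json.dumps({"ExistingFormationEnergyModel": {"descriptor": "KS2022"}}))
    # Link a hypothetical Poisson's ratio model trained on the same features.
    register_model(path, "MyPoissonRatioModel", {
        "name": "Hypothetical Poisson's ratio model",
        "descriptor": "KS2022",
    })
    reloaded = json.loads(path.read_text())

print(sorted(reloaded))  # ['ExistingFormationEnergyModel', 'MyPoissonRatioModel']
```

Reading the existing file before writing keeps previously registered models intact, so adding a new property model does not disturb the default formation-energy models.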

Real-World Examples

In our line of work, pySIPFENN and the formation energies it predicts are usually used as a computational engine that generates proto-data for the creation of thermodynamic databases (TDBs) using ESPEI (https://espei.org). The TDBs are then used through pycalphad (https://pycalphad.org) to predict phase diagrams and other thermodynamic properties.

Another of its uses in our research is guiding Density Functional Theory (DFT) calculations as a low-cost screening tool. Used together efficiently, the two then drive experiments leading to the discovery of new materials, as presented in these two papers:
