SynthGPT
Code and data for "Large Language Models for Inorganic Synthesis Predictions"
This repository contains the data and code for Large Language Models for Inorganic Synthesis Predictions by Seongmin Kim, Yousung Jung, and Joshua Schrier.

Organization
Input data and pre-defined cross-validation and train/test splits are in the data_MP and data folders, for the synthesizability and precursor selection tasks, respectively.
Results are in the results_MP and results folders, for the synthesizability and precursor selection tasks, respectively. We have used a JSON format to facilitate interpretation of the results.
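Because the result files are plain JSON, they can be inspected with standard tooling. A minimal sketch in Python follows; the filename and keys shown are illustrative only, not the repository's actual schema:

```python
import json
from pathlib import Path

def load_results(path):
    """Load a JSON results file and return the parsed object."""
    return json.loads(Path(path).read_text())

# In-memory example mirroring the idea (keys are hypothetical):
sample = json.loads('{"target": "BaTiO3", "precursors": ["BaCO3", "TiO2"]}')
print(sample["precursors"])  # → ['BaCO3', 'TiO2']
```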
Prompts for the LLM are in the prompts folder as plain text files; they can also be found in the online Supporting Information file.
Source code is in the src folder; some haphazard tests are included in tests.
Instructions
Run the notebooks in the top-level directory in order. Mathematica code (.wls) uses Mathematica 14.0 and no other libraries. Python code (.py) uses Python 3.8.13 and requires the following libraries: NumPy (version 1.22.3), PyTorch (version 1.11.0), and Pymatgen (version 2022.9.21).
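One way to reproduce the Python environment with the pinned versions above (the environment name is illustrative; conda is an assumption, any Python 3.8.13 environment manager works):

```shell
# Create an isolated environment with the Python version used in the paper
conda create -n synthgpt python=3.8.13
conda activate synthgpt

# Install the pinned dependencies listed above
pip install numpy==1.22.3 torch==1.11.0 pymatgen==2022.9.21
```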
The directory is organized around the order in which we performed the work, dividing the work into discrete tasks:
- Precursor selection (scripts `00_Data_Curation.py`–`07_Estimate_Perfect_Elemwise.py`)
- Synthesizability prediction (`08_Data_Preparation_Synthesizability.wls`–`11_Score_GPT_Outputs_Synthesizability.wls`)
- Evaluation of precursor rescoring results with GPT-4 (`12a_SetupData_Combined.wls` and `12b_Evaluate_Combined.wls`), and by removing recommendations that do not consist only of allowed precursors (`13_Precursor_Compliance.wls` and `14_Evaluate_Combination_Retaining_Only_Allowed_Precursors.wls`)
- Evaluation of the effects of prompt modification on the synthesizability prediction, each evaluated on only the first 5000 test items. These include adding specialization ("You are an expert oxide inorganic chemist...", `15a_Prompt_Modification_Oxide.wls`), removing specialization ("You are a magician...", `15b_Prompt_Modification_Magician.wls`), and alternate ways of expressing the positive-unlabelled training task ("...items labeled "U" could be positive or negative (i.e., synthesizable or unsynthesizable)", `15c_Prompt_Modification_Labeling.wls`).

Yes, this is different from the order in the paper. "Life can only be understood backwards; but it must be lived forwards." --Søren Kierkegaard
Cite
A publication describing this work appears in the Journal of the American Chemical Society as doi:10.1021/jacs.4c05840.