Tartarus
A Benchmarking Platform for Realistic And Practical Inverse Molecular Design
Tartarus: Practical and Realistic Benchmarks for Inverse Molecular Design
This repository contains the code and results for the paper Tartarus, an open-source collection of benchmarks for evaluating generative models.
Benchmarking with Tartarus
Benchmarking with Docker
To run the Tartarus benchmark, we recommend using the provided Docker container. We also provide instructions for building and running the benchmark locally. The following directions walk you through setup and evaluation. You will need Docker installed on your machine; once it is, follow these steps:
- Write the SMILES to be evaluated to a CSV file with a column header smiles.
- Pull the latest Tartarus Docker image:
docker pull johnwilles/tartarus:latest
- Run the Docker container with the directory of your data mounted, the benchmark mode and the CSV input filename:
docker run --rm -it -v ${LOCAL_PATH_TO_DATA}:/data johnwilles/tartarus:latest --mode ${BENCHMARK_MODE} --input_filename ${INPUT_FILENAME}
- The output file will be written to the same directory by default with the filename output.csv.
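The first step above (preparing the input CSV) could be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `write_smiles_csv`, the filename `input.csv`, and the two placeholder molecules are assumptions for the example, not part of Tartarus.

```python
import csv
from pathlib import Path


def write_smiles_csv(smiles, path):
    """Write one SMILES string per row under a single 'smiles' header."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles"])  # column header the benchmark expects
        for smi in smiles:
            writer.writerow([smi])
    return path


# Placeholder molecules; replace with the output of your generative model.
example_path = write_smiles_csv(["CCO", "c1ccccc1"], "input.csv")
```

The resulting file can then be placed in the directory that is mounted into the container and passed via --input_filename.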
Installing from Source
To install Tartarus locally, we recommend using the provided Conda environment definition.
- Clone the Tartarus repository.
git clone git@github.com:aspuru-guzik-group/Tartarus.git
- Create a Conda environment.
conda env create -f environment.yml
- Activate the tartarus Conda environment.
conda activate tartarus
- Ensure that docking task executables have the correct permissions.
chmod 777 tartarus/data/qvina
chmod 777 tartarus/data/smina
Note: These executables are only compatible with Linux.
Documentation
Detailed documentation can be found here: Tartarus Docs
Getting started
Below are some examples of how to load the datasets and use the fitness functions. For more details, you can also look at example.py.
Datasets
All datasets are found in the datasets directory. The arrows indicate the goal (↑ = maximization, ↓ = minimization).
| Task | Dataset name | # of SMILES | Columns in file | | | |
|---|---|---|---|---|---|---|
| Designing OPV | hce.csv | 24,953 | PCE<sub>PCBM</sub> - SAS (↑) | PCE<sub>PCDTBT</sub> - SAS (↑) | | |
| Designing emitters | gdb13.csv | 403,947 | Singlet-triplet gap (↓) | Oscillator strength (↑) | Multi-objective (↑) | |
| Designing drugs | docking.csv | 152,296 | 1SYH (↓) | 6Y2F (↓) | 4LDE (↓) | |
| Designing chemical reaction substrates | reactivity.csv | 60,828 | Activation energy ΔE<sup>‡</sup> (↓) | Reaction energy ΔE<sub>r</sub> (↓) | ΔE<sup>‡</sup> + ΔE<sub>r</sub> (↓) | - ΔE<sup>‡</sup> + ΔE<sub>r</sub> (↓) |
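The arrows matter when ranking molecules: a "best" entry is the maximum for ↑ columns and the minimum for ↓ columns. The sketch below illustrates this with a small helper. The rows and the column names `score_up` and `score_down` are hypothetical stand-ins; the actual column names in the dataset files may differ.

```python
def best_entry(rows, column, maximize):
    """Return the row with the best value in `column` for the given goal."""
    chooser = max if maximize else min
    return chooser(rows, key=lambda row: float(row[column]))


# Hypothetical rows and column names, for illustration only.
rows = [
    {"smiles": "CCO", "score_up": "1.5", "score_down": "-7.2"},
    {"smiles": "CCN", "score_up": "3.0", "score_down": "-9.1"},
    {"smiles": "CCC", "score_up": "2.2", "score_down": "-9.5"},
]

best_up = best_entry(rows, "score_up", maximize=True)       # goal: ↑
best_down = best_entry(rows, "score_down", maximize=False)  # goal: ↓
```

Rows read with `csv.DictReader` have the same dictionary shape, so the helper could be applied to a loaded dataset file once the real column names are known.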
Designing organic photovoltaics
To use the evaluation function, either run the full xtb calculation from the pce module or use the surrogate model with pretrained weights.
import pandas as pd
data = pd.read_csv('./datasets/hce.csv') # or ./datasets/unbiased_hce.csv
smiles = data['smiles'].tolist()
smi = smiles[0]
## use full xtb calculation in pce module
from tartarus import pce
dipm, gap, lumo, combined, pce_pcbm_sas, pce_pcdtbt_sas = pce.get_properties(smi)
## use pretrained surrogate model
dipm, gap, lumo, combined = pce.get_surrogate_properties(smi)
Designing Organic Emitters
Load the objective functions from the tadf module. All 3 fitness functions are returned for each SMILES string.
import pandas as pd
data = pd.read_csv('./datasets/gdb13.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]
## use full xtb calculation in tadf module
from tartarus import tadf
st, osc, combined = tadf.get_properties(smi)
Design of Drug Molecules
Load the docking module. There are separate functions for each of the proteins, as shown below.
import pandas as pd
data = pd.read_csv('./datasets/docking.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]
## Design of Protein Ligands
from tartarus import docking
score_1syh = docking.get_1syh_score(smi)
score_6y2f = docking.get_6y2f_score(smi)
score_4lde = docking.get_4lde_score(smi)
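Because each protein has its own scoring function, it can be convenient to collect the scores in one dictionary keyed by target. The following is a minimal sketch of that pattern. The helper `score_all_targets` and the stand-in scorers are hypothetical and exist only so the example runs without the docking executables; with Tartarus installed, the dictionary values would be the three functions shown above.

```python
def score_all_targets(smi, scorers):
    """Apply each target-specific scoring function to one SMILES string."""
    return {target: scorer(smi) for target, scorer in scorers.items()}


def fake_score(offset):
    """Build a stand-in scorer (NOT a real docking score)."""
    return lambda smi: -float(len(smi)) - offset


# With Tartarus installed, these values would instead be
# docking.get_1syh_score, docking.get_6y2f_score, docking.get_4lde_score.
scorers = {
    "1syh": fake_score(0.0),
    "6y2f": fake_score(1.0),
    "4lde": fake_score(2.0),
}

scores = score_all_targets("CCO", scorers)
# Docking scores are minimized (↓), so the best target has the lowest score.
best_target = min(scores, key=scores.get)
```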
Design of Chemical Reaction Substrates
Load the reactivity module. All 4 fitness functions are returned for each SMILES string.
import pandas as pd
data = pd.read_csv('./datasets/reactivity.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]
## calculate activation energy, reaction energy, and their combinations
from tartarus import reactivity
Ea, Er, sum_Ea_Er, diff_Ea_Er = reactivity.get_properties(smi)
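Going by the column labels in the datasets table, the two combined objectives appear to be simple combinations of the activation energy and the reaction energy. The sketch below shows that arithmetic. It is an assumption based on those labels, not a confirmed description of how `reactivity.get_properties` computes its outputs, and the input numbers are placeholders.

```python
def combined_objectives(activation_energy, reaction_energy):
    """Return (ΔE‡ + ΔEr, -ΔE‡ + ΔEr) from the two base quantities."""
    total = activation_energy + reaction_energy
    difference = -activation_energy + reaction_energy
    return total, difference


# Placeholder values, for illustration only.
total, difference = combined_objectives(80.0, -20.0)
```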
Results
Our results for running the corresponding benchmarks can be found here:
- Design of Protein Ligands: https://drive.google.com/file/d/1d_4mg1Eb7HrUJ2L7A8kFtld-TmPmOKlJ/view?usp=sharing
- Design of Chemical Reaction Substrates: https://drive.google.com/file/d/1fCnFxSUITg4qSlOuwFolvQPUQA31Qaii/view?usp=sharing
- Designing organic photovoltaics (photovoltaic conversion efficiency): https://drive.google.com/file/d/1w6oOBGjDC4Enh492jLQ7A3Xc1XbHXiIt/view?usp=sharing
- Designing Organic Emitters: https://drive.google.com/file/d/1l8weYg835HDGvOoRbOcHUnvLjiyQi_Ms/view?usp=sharing
- Designing organic photovoltaics (Explore): https://drive.google.com/file/d/1-J99iXfBx0_aG1BqEEXPh7q0kovBFD0L/view?usp=sharing
- Designing organic photovoltaics (Surrogate, exploit): https://drive.google.com/file/d/1EV7ST9_F4DBnQpxhd6VaaJWP5r9ygr0c/view?usp=sharing
- Designing organic photovoltaics (Exploit): https://drive.google.com/file/d/1Yh_8E3jRf6X230CvlRlPtk2qPQIkC5hB/view?usp=sharing
Questions, problems?
Open a GitHub issue 😄. Please be as clear and descriptive as possible. You can also reach out directly: (akshat98[AT]stanford[DOT]edu, robert[DOT]pollice[AT]gmail[DOT]com)
License
