PyMolSAR
A Python toolkit to compute molecular features and predict activities and properties of small molecules
PyMolSAR aims to provide a generalizable open-source tool for calculating 759 molecular descriptors and testing several supervised learning algorithms, in order to build the most appropriate Quantitative Structure-Activity Relationship (QSAR) classification or regression model for predicting the chemical properties or activities of small molecules.
Table of contents:
Requirements
Installation
Using a conda environment
git clone https://github.com/BeckResearchLab/small-molecule-design-toolkit.git
cd small-molecule-design-toolkit
python setup.py install
Getting Started
Two good tutorials to get started are Melting Point Prediction and Blood-Brain Barrier Permeability. Follow along with them to see how to predict the properties of molecules using machine learning.
Input Formats
- A column containing SMILES strings.
- A column containing an experimental measurement.
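As a rough illustration of the two columns listed above, the following sketch reads a small CSV and checks that both columns are present. The column names `smiles` and `measurement`, the helper `load_dataset`, and the numeric values are all placeholders chosen for this example; they are not names or data defined by the toolkit.

```python
import csv
import io

# Hypothetical file contents: column names and values are placeholders,
# not data shipped with the toolkit.
CSV_TEXT = """smiles,measurement
CCO,1.0
c1ccccc1,2.5
CC(=O)O,0.7
"""


def load_dataset(handle, smiles_col="smiles", target_col="measurement"):
    """Return (smiles_list, targets) read from a CSV file-like object."""
    reader = csv.DictReader(handle)
    # Fail early if either required column is absent.
    missing = {smiles_col, target_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    smiles, targets = [], []
    for row in reader:
        smiles.append(row[smiles_col].strip())
        targets.append(float(row[target_col]))
    return smiles, targets


smiles, targets = load_dataset(io.StringIO(CSV_TEXT))
```

For a file on disk, the same helper could be called with `open("my_data.csv", newline="")` in place of the `io.StringIO` object.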
Data Featurization
Most machine learning algorithms require input data in the form of fixed-length vectors.
However, cheminformatics and drug discovery datasets routinely come as lists of molecules with associated experimental readouts. To transform lists of molecules into vectors,
we need to calculate a set of molecular descriptors using smdt.molecular_descriptors.getAllDescriptors().
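To make the idea of turning molecules into vectors concrete, here is a deliberately simplified sketch. It does not call `getAllDescriptors()`, whose exact signature is not documented in this README; instead it uses a toy, hypothetical featurizer (`toy_featurize`) that counts a few atom types directly from the SMILES text. Real descriptors are far richer, and this toy parser ignores bracket atoms, charges, and stereochemistry.

```python
import re

# Match two-letter halogens first, then single-letter organic-subset atoms
# (uppercase = aliphatic, lowercase = aromatic).
ATOM_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

# Names of the toy features, in the order they appear in each vector.
FEATURE_NAMES = ["n_carbon", "n_nitrogen", "n_oxygen", "n_halogen", "n_aromatic"]


def toy_featurize(smiles):
    """Map one SMILES string to a fixed-length list of atom counts (toy stand-in)."""
    tokens = ATOM_TOKEN.findall(smiles)
    return [
        sum(t in ("C", "c") for t in tokens),
        sum(t in ("N", "n") for t in tokens),
        sum(t in ("O", "o") for t in tokens),
        sum(t in ("F", "Cl", "Br", "I") for t in tokens),
        sum(t.islower() for t in tokens),  # aromatic atoms are written in lowercase
    ]


# Every molecule becomes a row of the same length, whatever its size.
X = [toy_featurize(s) for s in ["CCO", "c1ccccc1", "ClCCl"]]
```

The point of the sketch is the shape of the result: molecules of different sizes all map to rows with the same number of columns, which is what downstream learning algorithms expect.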
Models
smdt can build and evaluate a range of classification and regression models on top of sklearn.
A model report is generated to help the user choose the most appropriate Quantitative Structure-Activity Relationship (QSAR) or
Quantitative Structure-Property Relationship (QSPR) model.
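The kind of comparison such a report supports can be sketched with plain sklearn. This is not smdt's own reporting code: the candidate models, the synthetic data from `make_regression`, and the `report` dictionary are assumptions made for illustration, standing in for a real descriptor matrix and measured property.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a descriptor matrix X and a measured property y.
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# Hypothetical candidate set; any sklearn regressors could be listed here.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
report = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# Pick the candidate with the highest mean score.
best = max(report, key=report.get)

for name, score in report.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print(f"best model: {best}")
```

Cross-validated scores are used here rather than training-set scores so that the comparison reflects how each model generalizes to molecules it has not seen.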
