PyMolSAR

PyMolSAR aims to provide a generalizable open-source tool for calculating 759 molecular descriptors and testing several supervised learning algorithms, in order to build the most appropriate Quantitative Structure-Activity Relationship (QSAR) classification or regression model for accurately predicting the chemical properties or activities of small molecules.

Table of contents:

Requirements
Installation
Getting Started
Input Formats
Data Featurization
Models

Requirements

Installation

Using a conda environment

git clone https://github.com/BeckResearchLab/small-molecule-design-toolkit.git
cd small-molecule-design-toolkit
python setup.py install

Getting Started

Two good tutorials to get started are Melting Point Prediction and Blood-Brain Barrier Permeability. Follow along with them to see how to predict the properties of molecules using machine learning.

Input Formats

The input dataset should contain:

A column containing SMILES strings.
A column containing an experimental measurement.
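
As a rough illustration of the two-column layout above, the sketch below reads a small table using only the Python standard library. The column names `SMILES` and `MeltingPoint`, the helper `load_dataset`, and the CSV layout are assumptions made for this example and are not part of smdt; the measurements are approximate literature melting points in degrees Celsius, included for illustration only.

```python
import csv
import io

# Hypothetical column names; substitute the headers used in your own dataset.
SMILES_COLUMN = "SMILES"
TARGET_COLUMN = "MeltingPoint"

# In-memory stand-in for a CSV file with one SMILES column and one
# measurement column (approximate melting points in degrees Celsius).
raw = io.StringIO(
    "SMILES,MeltingPoint\n"
    "CCO,-114.1\n"
    "c1ccccc1,5.5\n"
    "CC(=O)O,16.6\n"
)


def load_dataset(handle, smiles_column=SMILES_COLUMN, target_column=TARGET_COLUMN):
    """Return parallel lists of SMILES strings and float measurements."""
    smiles, targets = [], []
    for row in csv.DictReader(handle):
        smiles.append(row[smiles_column].strip())
        targets.append(float(row[target_column]))
    return smiles, targets


smiles, targets = load_dataset(raw)
for smi, value in zip(smiles, targets):
    print(smi, value)
```

Keeping the SMILES strings and the measurements in parallel lists makes it straightforward to featurize the first and use the second as the prediction target.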

Data Featurization

Most machine learning algorithms require input data in the form of vectors. However, cheminformatics and drug discovery datasets routinely come as lists of molecules with associated experimental readouts. To transform a list of molecules into vectors, we calculate a set of molecular descriptors using smdt.molecular_descriptors.getAllDescriptors().
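
To make the idea of turning molecules into vectors concrete, here is a deliberately simplified sketch. It does not call smdt and does not reproduce any of its 759 descriptors; the function `toy_descriptors`, the regular expression, and the choice of five element counts are all assumptions invented for this example. The tokenizer only understands plain organic-subset atoms and will miscount bracket atoms such as `[Na+]`.

```python
import re
from collections import Counter

# Toy tokenizer (not a full SMILES parser): two-letter halogens are tried
# first, then single-letter aliphatic (uppercase) and aromatic (lowercase) atoms.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOSPFI]|[bcnosp]")

# Fixed feature order so every molecule maps to a vector of the same length.
FEATURE_ORDER = ["C", "N", "O", "S", "Cl"]


def toy_descriptors(smiles):
    """Map a SMILES string to a fixed-length vector of heavy-atom counts."""
    counts = Counter(tok.capitalize() for tok in ATOM_PATTERN.findall(smiles))
    return [counts[element] for element in FEATURE_ORDER]


molecules = ["CCO", "c1ccccc1Cl", "CC(=O)Nc1ccc(O)cc1"]
feature_matrix = [toy_descriptors(smi) for smi in molecules]
for smi, vector in zip(molecules, feature_matrix):
    print(smi, vector)
```

Each row of `feature_matrix` has the same length regardless of molecule size, which is the property a learning algorithm needs; real descriptor sets simply compute many more, and far more informative, columns.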

Models

smdt can build and evaluate different classification and regression models built on top of sklearn. It generates a model report to help the user choose the most appropriate Quantitative Structure-Activity Relationship (QSAR) or Quantitative Structure-Property Relationship (QSPR) model.
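
The general pattern behind a model report can be sketched as: fit each candidate on training data, score it on held-out data, and rank the results. The example below is a standard-library illustration of that pattern only. It does not use smdt or sklearn, and the names `model_report`, `fit_mean`, and `fit_line`, along with the synthetic data (generated from y = 2x + 1), are assumptions made for this sketch; in practice sklearn estimators would take the place of the toy fitters.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares on a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def model_report(candidates, train, test):
    """Fit every candidate on train, score on test, and sort by RMSE."""
    train_x, train_y = train
    test_x, test_y = test
    rows = []
    for name, fit in candidates.items():
        predict = fit(train_x, train_y)
        rows.append((name, rmse(test_y, [predict(x) for x in test_x])))
    return sorted(rows, key=lambda row: row[1])


# Synthetic data following y = 2x + 1, split into training and held-out parts.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
test = ([4.0, 5.0], [9.0, 11.0])

report = model_report({"mean baseline": fit_mean, "linear fit": fit_line}, train, test)
for name, score in report:
    print(f"{name}: RMSE = {score:.3f}")
```

Including a trivial baseline in the comparison gives the other scores a reference point: a model that cannot beat the training mean on held-out data is not worth choosing.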
