FreeSolv
Experimental and calculated small molecule hydration free energies
FreeSolv: Experimental and Calculated Small Molecule Hydration Free Energies
This repository provides an issue tracker and revision control for the FreeSolv database, initially described in JCAMD (10): http://dx.doi.org/10.1007/s10822-014-9747-x. If you find any issues, please raise an issue in the issue tracker or file a pull request!
Releases are automatically assigned unique DOIs via Zenodo.
Abstract:
This work provides a curated database of experimental and calculated hydration free energies for small molecules in water, along with experimental values and input files. Experimental values are taken from prior literature and will continue to be curated, with updated experimental references and data added as it becomes available. Calculated values are based on the GAFF small molecule force field in TIP3P water with AM1-BCC charges, as in the provided parameter files. Values were calculated using the GROMACS simulation package, with full details given in references cited within the database itself. This database builds on previous work from the Mobley lab and others, and extends the prior database. With deposition in eScholarship, the database is now versioned, allowing citation of specific versions of the database, and easier updates.
Background:
This page provides an update of David Mobley's hydration free energy database. The current goal is to provide curated calculated and experimental values for every molecule which the Mobley group has studied at any point, and to allow these to be updated in a versioned manner as issues are found, better experimental data is tracked down or obtained, and so on.
The prior database gives calculated and experimental values for a 504 molecule set which has been called the "504 molecule set", or "the Mobley set" or similar variants. The explicit solvent study on this set was published in (1) and the implicit solvent version in (2), and the full database is in the supporting information. The "504 molecule set" built on earlier sets, notably that from Rizzo (3) and earlier hydration studies by David Mobley and collaborators.
The current set and format is motivated by several factors:
- There were several problems with specific molecules and/or experimental values in the 504 molecule set which needed correcting
- We have studied many additional molecules since then and these need adding to the set
- We need a way to continue sharing and expanding our set, providing both experimental data with references and calculated values (with parameters) as these are used as inputs to test other methods
- We want to be able to update the set in a versioned manner without having to write a new paper for every update, which necessitates migrating away from journal supporting information.
What we provide:
The database consists of a .tar.gz file containing:
- database.txt: A semicolon delimited text file containing compound IDs, SMILES, IUPAC names or similar, experimental values and uncertainties, calculated values, DOIs for references, and notes. Format described in the header.
- database.pickle: Python pickle file containing the same database, with some extra fields as well including 'groups', which provides functional groups for the compounds (as assigned by checkmol), PubChem compound IDs, calculated enthalpies of hydration, some experimental enthalpies of hydration (from ORCHYD), and components of the enthalpy of hydration and hydration free energy (as described in our forthcoming paper, to be linked here soon).
- groups.txt: Functional groups for compounds as assigned by checkmol. Semicolon delimited. First field is compound ID, second field is compound name, and subsequent fields are functional groups.
- iupac_to_cid.pickle, smiles_to_cid.pickle: Python pickle files containing conversion of IUPAC name to compound id and SMILES string to compound id, stored in dictionaries
- Structure files:
  - mol2files_sybyl.tar.gz: mol2 files with partial charges as written by OEChem in Sybyl format/Sybyl atom types
  - mol2files_gaff.tar.gz: mol2 files with partial charges as used for our hydration free energy calculations (AMBER GAFF atom types)
  - sdffiles.tar.gz: sdf files with partial charges as written by OEChem
  - gromacs_original.tar.gz: GROMACS format topology and coordinate files as used for our AM1-BCC GAFF hydration free energy calculations. Technical note: There may be some variation as to whether water molecules are or are not included in these files; these are intended to be used for the small molecule parameters only.
(See the Manifest below for a more complete list of all available files.)
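As a minimal sketch of reading the two semicolon-delimited text files, the Python below assumes that header lines begin with `#`. The description above only says the format is documented in the file header, so check that header before relying on any field order. The helper names and the sample row are hypothetical and are not taken from the database.

```python
def parse_semicolon_lines(lines, comment_prefix="#"):
    """Split semicolon-delimited lines into lists of stripped fields.

    Assumption: header/comment lines start with `comment_prefix`.
    Blank lines are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment_prefix):
            continue
        records.append([field.strip() for field in line.split(";")])
    return records


def parse_groups(lines):
    """Map compound ID -> (name, [functional groups]) for groups.txt-style lines.

    Follows the layout described above: first field is the compound ID,
    second is the compound name, the rest are functional groups.
    """
    groups = {}
    for fields in parse_semicolon_lines(lines):
        if len(fields) < 2:
            continue  # not enough fields to hold an ID and a name
        compound_id, name, *functional_groups = fields
        groups[compound_id] = (name, [g for g in functional_groups if g])
    return groups


# Hypothetical row, for illustration only (not real FreeSolv data).
sample = ["# compound id; name; groups", "example_0001; some name; alkane; alcohol"]
parsed = parse_groups(sample)

# With the real file, an open file handle can be passed directly:
# with open("groups.txt") as handle:
#     groups = parse_groups(handle)
```

A plain split on `;` is used here for simplicity. If any field (a name or a note, say) can itself contain a semicolon, this sketch would split it incorrectly, so the JSON or pickle versions of the database may be the safer source for programmatic use.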
The future:
The database is maintained on the cite-able eScholarship repository of the University of California. It is currently available on that site at www.escholarship.org/uc/item/6sd403pz. Updated versions will be maintained there, mirroring point releases provided via this GitHub site.
Please cite:
Mobley, David L. (2013). Experimental and Calculated Small Molecule Hydration Free Energies. UC Irvine: Department of Pharmaceutical Sciences, UCI. Retrieved from: http://www.escholarship.org/uc/item/6sd403pz
Manifest
- gromacs_analysis: Contains plots resulting from GROMACS analysis of some of the data in FreeSolv.
- gromacs_energies: Contains XVG files associated with the most recent (2017) update of FreeSolv calculated values; these files are large and are only available in the archived version of the database and not on GitHub.
- gromacs_mdpfiles: Contains GROMACS run (.mdp) files used for the calculations connected with the most recent (2017) update of the calculated hydration free energies and enthalpies reported here.
- mol2files_gaff.tar.gz: contains mol2 files for all compounds with AM1-BCC charges and GAFF atom types
- mol2files_sybyl.tar.gz: contains mol2 files for all compounds with AM1-BCC charges and SYBYL atom types
- primary-data: Primary data from which the contents of this database can be re-generated; obtained from full database via scripts/extract-primary-data.py
- scripts: Scripts pertaining to the material deposited here
- sdffiles.tar.gz: SDF-format files for all of the molecules deposited here (as in mol2files_gaff and mol2files_sybyl)
- amber.tar.gz: AMBER format parameter, coordinate, and frcmod files corresponding to the systems we ultimately simulated in GROMACS.
- gromacs_original.tar.gz: GROMACS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, for calculations in gas phase. These were generated from AMBER files via acpype, prior to our more recent migration to ParmEd.
- gromacs_solvated.tar.gz: GROMACS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, for calculations in solution, again generated from AMBER files via acpype.
- lammps.tar.gz: LAMMPS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, automatically converted using InterMol from the AMBER files
- charmm.tar.gz: CHARMM format topology and coordinate files for the calculations associated with the computed values in FreeSolv, automatically converted using ParmEd (via InterMol) from the AMBER files
- gromacs.tar.gz: GROMACS format topology and coordinate files for the calculations associated with the computed values in FreeSolv, automatically converted using ParmEd (via InterMol) from the AMBER files
- desmond.tar.gz: DESMOND format topology and coordinate files for the calculations associated with the computed values in FreeSolv, automatically converted using InterMol from the AMBER files
- simulation_comparison_input/: directory containing input files used for the validation of the input conversion files by comparing energy files, description of automated conversion process, and the energy comparisons. See simulation_comparison_input/README.md for more details.
- README.md: This file
- database.pickle: Python pickle file of the FreeSolv database
- database.json: JSON format version of the FreeSolv database also stored in database.pickle
- database.txt: Text format version of some of the fields from the database
- groups.txt: Functional groups assigned to the different compounds in the database
- iupac_to_cid.pickle and .json: Python pickle file and JSON file containing a dictionary for converting IUPAC names to FreeSolv compound IDs
- smiles_to_cid.pickle and .json: Python pickle and JSON file containing a dictionary for converting SMILES strings to FreeSolv compound IDs
- notebooks/OrionDB.ipynb: iPython notebook providing an example of concatenating molecules and associating generic data.
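The JSON files listed above can be read with the Python standard library alone. The sketch below looks up a compound by its SMILES string. It assumes that database.json is a dictionary keyed by FreeSolv compound ID, which is not stated above and should be checked against the file itself. The function names are hypothetical.

```python
import json


def load_json(path):
    """Load one of the JSON files (e.g. database.json) into a Python object."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def lookup_by_smiles(smiles, smiles_to_cid, database):
    """Return (compound_id, entry) for a SMILES string, or None if absent.

    Assumption: `database` is a dict keyed by FreeSolv compound ID.
    `smiles_to_cid` is the SMILES -> compound ID dictionary.
    """
    compound_id = smiles_to_cid.get(smiles)
    if compound_id is None:
        return None
    return compound_id, database.get(compound_id)


# Possible usage with the files from the manifest:
# smiles_to_cid = load_json("smiles_to_cid.json")
# database = load_json("database.json")
# result = lookup_by_smiles("<SMILES as written in the database>", smiles_to_cid, database)
```

The lookup is an exact string match, so a SMILES string written differently from the one stored in the dictionary will not be found; no canonicalization is attempted here. The JSON files are used rather than the pickle files because unpickling executes arbitrary code and is best reserved for files from a trusted source.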
Rebuilding FreeSolv
The input files deposited here can be rebuilt (from SMILES strings) using the script scripts/rebuild_freesolv.py, which requires the Chodera lab's openmoltools package and the Mobley Lab's SolvationToolkit, both of which are conda installable from the omnia channel.
Change log/version history:
This dataset started by taking all of the compounds we have studied previously with hydration free energies (references 1, 2, 4-9) including those from SAMPL4 and compiling them all into one big set, removing any redundancies and providing data, references, etc. for all of them. Details of changes for specific versions are found below.
On 12/20/2013 this database was moved to the eScholarship site of the University of California, at http://www.escholarship.org/uc/item/6sd403pz.
Version 0.1:
- We corrected the following problems from the 504 molecule set (1-2):
- Removal of 504/triacetyl glycerol, which was not the intended molecule (the intended molecule, glycerol triacetate, is present in v0.1 anyway, as it comes in via reference (5))
- Correction of the experimental value for hexafluoroprope
