CSpred
UCBShift is a program for predicting chemical shifts for backbone atoms and β-carbon of a protein in solution. It utilizes a machine learning module that makes predictions from features extracted from the 3D structures of the proteins.
UCBShift
UCBShift is a program for predicting chemical shifts for the backbone atoms and β-carbon of a protein in solution. The program implements two mechanisms: a transfer prediction module that uses both sequence alignment and structure alignment to select references for shift replication, and an ensemble decision-tree-based machine learning module that takes features extracted from a PDB file and makes reliable chemical shift predictions. Combined, the two modules achieve state-of-the-art accuracy on a "real-world" dataset, with root-mean-square errors between predicted and experimental values of 0.38, 0.22, 1.31, 0.97, 1.29 and 2.16 ppm for H, Hα, C, Cα, Cβ and N, respectively.
Publication
Li, J., Bennett, K. C., Liu, Y., Martin, M. V., & Head-Gordon, T. (2020). Accurate prediction of chemical shifts for aqueous protein structure on “Real World” data. Chemical Science, 11(12), 3180-3191. DOI: 10.1039/C9SC06561J
Using UCBShift through NMRBox
We recommend running UCBShift through NMRBox, which provides UCBShift ready to use out of the box on its virtual machines. You can sign up for NMRBox here: https://nmrbox.nmrhub.org/
Software package requirements
Python and Python packages
- Python (>=3.5)
- Numpy
- Pandas
- scikit-learn (0.22, https://scikit-learn.org/stable/)
- Biopython (1.74, https://biopython.org/)
- Joblib (https://joblib.readthedocs.io/en/latest/)
- matplotlib
External programs needed
- blast (2.9.0, https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
- mTM-align (20180725, http://yanglab.nankai.edu.cn/mTM-align/)
- DSSP (2.04, https://swift.cmbi.umcn.nl/gv/dssp/)
Installation notes
We suggest creating a new virtual environment (e.g. with Anaconda) for the code. You can install the required Python packages using the provided requirements.txt file.
However, you still need to download and install the external programs manually.
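Once everything is installed, a quick check of the environment can save a failed run later. The following is a minimal sketch, not part of UCBShift itself. The module names are the usual import names of the packages listed above. The executable names (`blastp`, `mTM-align`, `mkdssp`) are assumptions and may differ on your system.

```python
import importlib.util
import shutil

# Usual import names for the Python packages listed in the requirements.
PYTHON_MODULES = ["numpy", "pandas", "sklearn", "Bio", "joblib", "matplotlib"]

# ASSUMPTION: typical executable names; adjust them to match your installation.
EXTERNAL_PROGRAMS = ["blastp", "mTM-align", "mkdssp"]


def find_missing(modules=PYTHON_MODULES, programs=EXTERNAL_PROGRAMS):
    """Return (missing_modules, missing_programs) for the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing_programs = [p for p in programs if shutil.which(p) is None]
    return missing_modules, missing_programs


if __name__ == "__main__":
    missing_modules, missing_programs = find_missing()
    print("Missing Python modules:", missing_modules or "none")
    print("Missing external programs:", missing_programs or "none")
```

This only confirms that the modules and programs can be found. It does not check their versions.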
Usage
Because the trained models are large, users are directed here to download all the saved model files. After downloading the models.tgz file, extract it into the models/ folder using the command tar -xzf models.tgz (so that there are 18 .sav files under the models/ folder).
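To confirm that the extraction produced the expected 18 .sav files, a small check along the following lines may help. This is a sketch with hypothetical helper names (`count_model_files`, `models_ready`). It searches subfolders too, because the internal layout of the archive is not described here.

```python
from pathlib import Path

EXPECTED_MODEL_COUNT = 18  # number of .sav files stated in this README


def count_model_files(models_dir="models"):
    """Count .sav files under models_dir, including any subfolders."""
    return sum(1 for _ in Path(models_dir).rglob("*.sav"))


def models_ready(models_dir="models"):
    """Return True when the expected number of saved models is present."""
    return count_model_files(models_dir) == EXPECTED_MODEL_COUNT


if __name__ == "__main__":
    print("Found", count_model_files(), "of", EXPECTED_MODEL_COUNT, "model files")
```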
Users can use the trained model "as is" once they have correctly configured the Python packages and external programs.
The CSpred.py file is the entry point to the UCBShift chemical shift predictor.
The easiest, out-of-the-box way to use UCBShift is to run the CSpred.py script directly on your protein of interest. A shifts.csv file will be generated in the directory from which you executed the script. The syntax looks like this:
python CSpred.py your_input_protein.pdb
Advanced options are described below:
<pre>
usage: CSpred.py [-h] [--batch] [--output OUTPUT] [--worker WORKER]
                 [--shifty_only] [--shiftx_only] [--pH PH] [--test]
                 input

positional arguments:
  input                 The query PDB file or list of PDB files for which
                        the shifts are calculated

optional arguments:
  --batch, -b           If toggled, input accepts a text file specifying all
                        the PDB files need to be calculated (Each line is a
                        PDB file name. If pH values are specified, followed
                        with a space)
  --output OUTPUT, -o OUTPUT
                        Filename of generated output file. A file
                        [shifts.csv] is generated by default. If in batch
                        mode, you should specify the path for storing all
                        the output files. Each output file has the same name
                        as the input PDB file name.
  --worker WORKER, -w WORKER
                        Number of CPU cores to use for parallel prediction
                        in batch mode.
  --shifty_only, -y, -Y
                        Only use UCBShift-Y (transfer prediction) module.
                        Equivalent to executing UCBShift-Y directly with
                        default settings
  --shiftx_only, -x, -X
                        Only use UCBShift-X (machine learning) module. No
                        alignment results will be utilized or calculated
  --pH PH, -pH PH, -ph PH
                        pH value to be considered. Default is 5
  --test, -t            If toggled, use test mode for UCBShift-Y prediction
</pre>
If you want to execute UCBShift-Y with more options, you can run ucbshifty.py.
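Batch mode needs a text file listing the PDB files, so it can be convenient to generate that file and the command line from a script. The sketch below uses only the flags documented above. The helper names are hypothetical. The line format (file name, then a space, then the pH) is our reading of the `--batch` help text and should be checked against your version of CSpred.py.

```python
import subprocess
import sys


def write_batch_list(entries, list_path):
    """Write one line per PDB file, appending the pH after a space if given.

    entries: iterable of (pdb_path, ph) pairs; ph may be None.
    ASSUMPTION: the "name<space>pH" layout is inferred from the help text.
    """
    with open(list_path, "w") as handle:
        for pdb_path, ph in entries:
            line = pdb_path if ph is None else "{} {}".format(pdb_path, ph)
            handle.write(line + "\n")


def build_batch_command(list_path, output_dir, workers=1, script="CSpred.py"):
    """Assemble the batch-mode command using only the documented flags."""
    return [sys.executable, script, list_path,
            "--batch", "--output", output_dir, "--worker", str(workers)]


def run_batch(list_path, output_dir, workers=1):
    """Run CSpred.py in batch mode; raises CalledProcessError on failure."""
    subprocess.run(build_batch_command(list_path, output_dir, workers), check=True)
```

For example, `write_batch_list([("protein_a.pdb", 6.5), ("protein_b.pdb", None)], "batch.txt")` followed by `run_batch("batch.txt", "predictions", workers=4)` would process two placeholder files and store the results under predictions/.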
Outputs
A shifts.csv file will be generated by default under the folder where you run the CSpred.py script. For each amino acid in the protein, a line is reported in the generated .csv file that contains the residue number (RESNUM), residue name (RESNAME), UCBShift-X predictions ([atom]_X), UCBShift-Y predictions ([atom]_Y), UCBShift final predictions ([atom]_UCBShift), the TM-score for the best matched sequence from UCBShift-Y of that residue ([atom]_BEST_REF_SCORE), amino acid coverage for the best matched sequence from UCBShift-Y ([atom]_BEST_COV), and whether the residue is matched for the best-aligned protein ([atom]_BEST_REF_MATCH).
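The output file can be read with any CSV reader. The sketch below uses only the standard library and collects the final prediction for one atom type. It relies on the column pattern described above (RESNUM, RESNAME, [atom]_UCBShift). The atom labels used in the file (for example `CA` for Cα) are an assumption, so check the header of your own shifts.csv.

```python
import csv
import math


def load_final_shifts(csv_path, atom):
    """Map (RESNUM, RESNAME) to the final UCBShift prediction for one atom.

    Keys hold RESNUM and RESNAME as the strings written in the file.
    Residues whose entry is empty, non-numeric, or NaN are skipped.
    """
    column = atom + "_UCBShift"
    shifts = {}
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError("column {!r} not found in {}".format(column, csv_path))
        for row in reader:
            try:
                value = float(row[column])
            except (TypeError, ValueError):
                continue  # empty or non-numeric entry
            if math.isnan(value):
                continue  # missing prediction written as NaN
            shifts[(row["RESNUM"], row["RESNAME"])] = value
    return shifts
```

Calling `load_final_shifts("shifts.csv", "CA")` would then return a dictionary of final Cα predictions, provided that `CA_UCBShift` is the column name in your file.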
Reproducibility
You can reproduce the results by preparing all the data and retraining the model on your own machine. Follow PROCEDURE.md under the train_model/ folder for a complete description of how to train the model.
FAQs
Q: I have run into the following issue:
......
File "/home/shreygupta/soft/ucbshift/ucbshifty.py", line 362, in Needleman_Wunsch_alignment
aligner.substitution_matrix=blosum62
File "/home/shreygupta/.local/lib/python3.6/site-packages/Bio/Align/__init__.py", line 1509, in __setattr__
_aligners.PairwiseAligner.__setattr__(self, key, value)
ValueError: expected a matrix
How can I solve the problem?
A: This is a common issue caused by using a version of the Biopython package newer than 1.74. Please downgrade Biopython to version 1.74 if you run into this error. Alternatively, we suggest using UCBShift through NMRBox, as described in the section above.
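To see whether your environment is affected before running a prediction, you can compare the installed Biopython version against 1.74. This is a minimal sketch with hypothetical helper names. It assumes Biopython version strings begin with major and minor numbers, as in "1.74".

```python
import re

RECOMMENDED_BIOPYTHON = (1, 74)  # version recommended in the answer above


def version_tuple(text):
    """Convert a version string such as '1.74' into (1, 74)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:2])


def biopython_is_too_new(version_text):
    """Return True when the given version is newer than the recommended one."""
    return version_tuple(version_text) > RECOMMENDED_BIOPYTHON


if __name__ == "__main__":
    try:
        import Bio
    except ImportError:
        print("Biopython is not installed")
    else:
        if biopython_is_too_new(Bio.__version__):
            print("Biopython", Bio.__version__, "is newer than 1.74; consider downgrading")
        else:
            print("Biopython", Bio.__version__, "is not newer than 1.74")
```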
License
Copyright ©20xx The Regents of the University of California (Regents). All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for educational, research, and not-for-profit purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following paragraphs appear in all copies, modifications, and distributions. Contact The Office of Technology Licensing, UC Berkeley, 2150 Shattuck Avenue, Suite 408, Berkeley, CA 94704-1362, otl@berkeley.edu.
IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
