CryoREAD
CryoREAD: a computational tool that uses deep learning to automatically build full DNA/RNA atomic structures from a cryo-EM map.
CryoREAD
<a href="https://github.com/marktext/marktext/releases/latest"> <img src="https://img.shields.io/badge/CryoREAD-v1.0.0-green"> <img src="https://img.shields.io/badge/platform-Linux%20%7C%20Mac%20-green"> <img src="https://img.shields.io/badge/Language-python3-green"> <img src="https://img.shields.io/badge/dependencies-tested-green"> <img src="https://img.shields.io/badge/licence-GNU-green"> </a>
CryoREAD is a computational tool that uses deep learning to automatically build full DNA/RNA atomic structures from a cryo-EM map.
Copyright (C) 2022 Xiao Wang, Genki Terashi, Daisuke Kihara, and Purdue University.
License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)
Contact: Daisuke Kihara (dkihara@purdue.edu)
For technical problems or questions, please reach out to Xiao Wang (wang3702@purdue.edu).
Citation:
Xiao Wang, Genki Terashi & Daisuke Kihara. De novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods, 2023. https://www.nature.com/articles/s41592-023-02032-5
@article{wang2023CryoREAD,
title={De novo structure modeling for nucleic acids in cryo-EM maps using deep learning},
author={Wang, Xiao and Terashi, Genki and Kihara, Daisuke},
journal={Nature Methods},
year={2023},
url={https://www.nature.com/articles/s41592-023-02032-5}
}
Notice
Newer versions of Intel MKL remove a symbol that PyTorch depends on, so importing PyTorch fails with `ImportError: undefined symbol: iJIT_NotifyEvent`.
We have updated environment.yml and requirements.txt to pin MKL to an older version. Any previous installation should keep working.
If you encounter this error, activate the conda environment first and then run `conda install mkl==2024.0`.
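To tell this MKL problem apart from other import failures, the error text can be matched against the symbol name. The following is a minimal sketch and not part of CryoREAD; the helper names `mkl_hint` and `check_torch_import` are hypothetical, and the suggested command is the one quoted in the notice above.

```python
from typing import Optional

MKL_SYMBOL = "iJIT_NotifyEvent"        # symbol named in the notice above
MKL_FIX = "conda install mkl==2024.0"  # fix quoted in the notice above


def mkl_hint(error_message: str) -> Optional[str]:
    """Return the suggested fix if the message matches the MKL symbol error."""
    if MKL_SYMBOL in error_message:
        return MKL_FIX
    return None


def check_torch_import() -> Optional[str]:
    """Try to import torch; return a fix hint only for the MKL symbol error."""
    try:
        import torch  # noqa: F401
    except ImportError as exc:
        return mkl_hint(str(exc))
    return None


if __name__ == "__main__":
    hint = check_torch_import()
    print(hint or "No MKL symbol error detected.")
```

Run it inside the activated conda environment; any other import failure (for example, PyTorch not being installed at all) is deliberately not reported as an MKL problem.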
News
Apr 2024: CryoREAD includes a new model that supports DNA/RNA structure modeling for input maps at 5-10 Å resolution. The model was trained on maps at 5-10 Å resolution, and CryoREAD uses it when the input resolution falls in that range.
Online Platform:
Server (Recommended): https://em.kiharalab.org/algorithm/CryoREAD
<details> We have three publicly available platforms, which offer essentially the same functionality. Input: cryo-EM map + sequence file (optional). Output: modeled structure. The input and output are the same across all platforms. </details>
Google Colab: https://bit.ly/CryoREAD
<details>Step-by-step instructions are available. Because of redistribution constraints on Coot and Phenix, the structure produced here is not refined and may include atom clashes. If you want a better structure, please use our server or the GitHub code. For free users, Colab has a 4-hour running time limit and may not work for large structures (>=1000 nucleotides).
</details>
Local installation with source code at GitHub
<details> Full code is available here, which makes it easier for users to modify it and develop their own tools. <br>It provides two additional features: <br>1. Detection output: this option outputs the probability values of detected phosphate, sugar, base, and base types in the map, computed by deep learning, for users' reference. <br>2. Refinement pipeline: structures from other sources can be refined in the specified EM map. </details>
Project website: https://kiharalab.org/emsuites
Detailed pipeline instructions can be found at https://kiharalab.org/emsuites/cryoread.php
CryoREAD algorithm video (20 minutes): https://www.youtube.com/watch?v=p7Bpou2vL6o
For benchmarking, please see eval_code for evaluating predicted structures.
Introduction
<details> <summary>CryoREAD is a computational tool that uses deep learning to automatically build full DNA/RNA atomic structures from a cryo-EM map. </summary> DNA and RNA play fundamental roles in various cellular processes, where the three-dimensional (3D) structure provides critical information for understanding the molecular mechanisms of their functions. Although an increasing number of structures of nucleic acids and their complexes with proteins are determined by cryogenic electron microscopy (cryo-EM), structure modeling for DNA and RNA is still often challenging, particularly when the map is determined at a resolution coarser than atomic level. Moreover, computational methods for nucleic acid structure modeling are sparse. Here, we developed a deep learning-based, fully automated de novo DNA/RNA atomic structure modeling method, CryoREAD. CryoREAD identifies phosphate, sugar, and base positions in a cryo-EM map using deep learning, which are then traced and modeled into a 3D structure. When tested on cryo-EM maps determined at 2.0 to 5.0 Å resolution, CryoREAD built substantially more accurate models than existing methods. We have further applied the method to cryo-EM maps of biomolecular complexes in SARS-CoV-2.
</details>
Overall Protocol
<details>- Structure detection by the CryoREAD deep neural networks; <br>
- Backbone tracing according to the detections; <br>
- Fragment-based nucleotide assignment; <br>
- Full atomic structure modeling. <br>
</details>
Installation
<details>System Requirements
CPU: >=8 cores <br> Memory (RAM): >=50 GB. For maps with more than 3,000 nucleotides, more than 200 GB of memory is needed if the sequence is provided. <br> GPU: any CUDA-capable GPU with at least 12 GB of memory. <br> A GPU is required: no CPU version of CryoREAD is available because it would be too slow.
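The stated minimums can be checked before installation with a short script. This is a rough sketch, not part of CryoREAD: the function names are hypothetical, RAM detection relies on POSIX `sysconf` (expected to work on Linux), GPU detection only works once PyTorch is installed, and sizes are compared in decimal GB so that a nominal 12 GB card is not rejected for reporting slightly less.

```python
import os

# Minimums taken from the requirements listed above.
MIN_CORES, MIN_RAM_GB, MIN_GPU_GB = 8, 50, 12


def unmet_requirements(cores, ram_gb, gpu_gb):
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if ram_gb is None:
        problems.append("RAM size could not be determined")
    elif ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f} GB < {MIN_RAM_GB} GB")
    if gpu_gb is None:
        problems.append("no CUDA GPU detected")
    elif gpu_gb < MIN_GPU_GB:
        problems.append(f"GPU memory: {gpu_gb:.0f} GB < {MIN_GPU_GB} GB")
    return problems


def detect_ram_gb():
    """Physical RAM in decimal GB via POSIX sysconf, or None if unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


def detect_gpu_gb():
    """Memory of CUDA device 0 in decimal GB, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    found = unmet_requirements(os.cpu_count() or 0, detect_ram_gb(), detect_gpu_gb())
    print("\n".join(found) or "All stated minimums are met.")
```

The check covers only the 50 GB baseline; the 200 GB figure for maps with more than 3,000 nucleotides still has to be judged per job.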
Pre-required software
Required
Python 3 : https://www.python.org/downloads/
Phenix: https://phenix-online.org/documentation/install-setup-run.html
Coot: https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
Optional
PyMOL (for map visualization): https://pymol.org/2/
Chimera (for map visualization): https://www.cgl.ucsf.edu/chimera/download.html
Installation
1. Install git
2. Clone the repository to your computer
git clone https://github.com/kiharalab/CryoREAD.git && cd CryoREAD
3. Build dependencies.
There are two ways to install the dependencies on your computer:
3.1 Install with Anaconda (Recommended)
3.1.1 Install Anaconda.
3.1.2 Install the dependencies from the command line.
Make sure you are in the CryoREAD directory and then run
conda env create -f environment.yml
Each time you want to run this software, activate the environment with
conda activate CryoREAD
To exit the environment, run
conda deactivate
3.2 Install with pip and Python (Not Suggested)
3.2.1 Install pip.
3.2.2 Install the dependencies from the command line.
pip3 install -r requirements.txt --user
If you encounter any errors, you can install each library one by one:
pip3 install biopython
pip3 install numpy
pip3 install numba
pip3 install scipy
pip3 install ortools
pip3 install mrcfile
pip3 install torch==1.6.0
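After either installation route, one quick way to see which of the libraries listed above are still missing is to ask Python whether it can locate them. The sketch below is not part of CryoREAD; the function name is hypothetical, and it assumes the usual import names (`Bio` for biopython, the package name itself for the others).

```python
import importlib.util

# Import names for the packages listed above; biopython is imported as "Bio".
REQUIRED_MODULES = ["Bio", "numpy", "numba", "scipy", "ortools", "mrcfile", "torch"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only confirms that each module can be found, not that its version matches requirements.txt.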
4. Verify the pre-installed software
To verify that Phenix is correctly installed for the final refinement step, run
phenix.real_space_refine -h
To verify that Coot is correctly installed for the final refinement step, run
coot
If Phenix prints its help information and Coot starts, the refinement step of our program is supported. If not, remove the --refine option from all commands; CryoREAD will then output the structure without refinement.
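When CryoREAD is launched from a script, the same check can decide whether to keep the refinement option. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, `--refine` is assumed to be a standalone flag without a value, `map.mrc` is a placeholder file name, and finding an executable on `PATH` does not prove that it runs correctly.

```python
import shutil

REFINEMENT_TOOLS = ("phenix.real_space_refine", "coot")


def refinement_available(which=shutil.which):
    """True if both refinement executables are found on PATH."""
    return all(which(tool) is not None for tool in REFINEMENT_TOOLS)


def with_optional_refine(args, can_refine):
    """Return a copy of args, dropping --refine when refinement is unavailable."""
    if can_refine:
        return list(args)
    return [arg for arg in args if arg != "--refine"]


if __name__ == "__main__":
    # Placeholder command line; map.mrc is a hypothetical input file.
    command = ["python3", "main.py", "--mode=0", "-F=map.mrc", "--refine"]
    print(" ".join(with_optional_refine(command, refinement_available())))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.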
</details>
Usage
Command
<details> <summary>Command Parameters</summary>
usage: main.py [-h] [-F F] [-M M] [-P P] --mode MODE [--contour CONTOUR] [--stride STRIDE] [--box_size BOX_SIZE] [--gpu GPU] [--batch_size BATCH_SIZE] [-f F] [-m M]
[-g G] [-k K] [-R R] [--rule_soft RULE_SOFT] [--frag_size FRAG_SIZE] [--frag_stride FRAG_STRIDE] [--top_select TOP_SELECT] [--resolution RESOLUTION]
[--num_workers NUM_WORKERS] [--prediction_only PREDICTION_ONLY] [--no_seqinfo NO_SEQINFO]
optional arguments:
-h, --help show this help message and exit
-F F Input map file path. (str)
-M M Pre-trained model path. (str) Default value: "best_model". If you want to reproduce the results in our paper, you can specify "best_model_paper". Here the default path is the new model trained on the entire dataset.
-P P Optional fasta sequence file path. (str)
--mode MODE Control Mode for program: 0: cryo_READ structure modeling. Required parameter. (Integer), Default value: 0
--contour CONTOUR Contour level for input map, suggested 0.5*[author_contour]. (Float), Default value: 0.0
--stride STRIDE Stride for scanning of deep learning model. (Integer), Default value: 16.
--box_size BOX_SIZE Input box size for deep learning model. (Integer), Default value: 64
--gpu GPU Specify the gpu we will use. (str), Default value: None.
--batch_size BATCH_SIZE
Batch size for inference of network. (Integer), Default value: 4.
-f F Filter for representative points, for LDPs, removing points' normalized density<=-f (Float), Default value: 0.05
-m M After meanshifting merge points distance<[float]. (Float), Default value: 2.0.
-g G Bandwidth of the Gaussian filter, (Float), Default value: 3.0.
-k K Always keep edges where d<k parameter. (Float), Default value: 0.5
-R R Maximum length of local edges. (Float), Default value: 10.0.
--rule_soft RULE_SOFT
