Emap2sec+

Emap2sec+ is a deep learning tool that identifies local structures (alpha helices, beta sheets, other (coils/turns), and DNA/RNA) in cryo-electron microscopy (cryo-EM) maps of medium to low resolution.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara (dkihara@purdue.edu)

Citation:

Xiao Wang, Eman Alnabati, Tunde W Aderinwale, Sai Raghavendra Maddhuri Venkata Subramaniya, Genki Terashi & Daisuke Kihara. Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nature Communications 12, 2302 (2021). https://doi.org/10.1038/s41467-021-22577-3

@article{wang2021emap2secplus,
  title={Detecting Protein and DNA/RNA Structures in Cryo-EM Maps of Intermediate Resolution Using Deep Learning},
  author={Xiao Wang and Eman Alnabati and Tunde W Aderinwale and Sai Raghavendra Maddhuri Venkata Subramaniya and Genki Terashi and Daisuke Kihara},
  journal={Nature Communications},
  volume={12},
  pages={2302},
  year={2021},
  doi={10.1038/s41467-021-22577-3}
}

Project website: http://kiharalab.org/emsuites/emap2secplus.php

Online Platform:

1 Server: https://em.kiharalab.org/algorithm/emap2sec+

2 Colab: https://bit.ly/emap2secplus or https://github.com/kiharalab/Emap2secPlus/blob/master/Emap2sec%2B.ipynb

3 CodeOcean: https://doi.org/10.24433/CO.7165707.v1

Simulated Map Dataset:

Introduction

An increasing number of density maps of macromolecular structures, including proteins and protein-DNA/RNA complexes, have been determined by cryo-electron microscopy (cryo-EM). Although maps at near-atomic resolution are now routinely reported, a substantial fraction of maps are still determined at intermediate or low resolution, where extracting structural information is not trivial. Here, we report a new computational method, Emap2sec+, which identifies DNA or RNA as well as the secondary structures of proteins in cryo-EM maps of 5 to 10 Å resolution. Emap2sec+ employs a deep residual convolutional neural network. It assigns structural labels with associated probabilities to each voxel in a cryo-EM map, which helps structure modeling in the map. When tested on simulated and experimental maps, Emap2sec+ showed stable and high assignment accuracy for nucleotides in low-resolution maps and better performance for protein secondary structure assignment than its earlier version.

Overall Protocol

(1) Preprocess the cryo-EM map (*.mrc format), including removing density outside the contour level and changing the grid size to 1;
(2) Scan the EM map to get voxel inputs and their corresponding locations, and save them in a *.trimmap file;
(3) Generate the *.input file, which contains the formatted 3D input for the network;
(4) Apply the Phase 1 and Phase 2 networks to assign a label to each voxel and save the predictions in *pred.txt;
(5) Output *.pdb and *.pml files to visualize the predictions;
(6) Output the evaluation report in *report.txt (if a PDB structure is provided).
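
Steps (1) and (2) can be pictured with a small NumPy sketch. This is an illustration only, not the Emap2sec+ implementation: it reads "outside the contour level" as "below it", and the function names, `box_size`, and `stride` are hypothetical choices made for the example.

```python
import numpy as np


def preprocess_density(density, contour_level):
    """Return a copy of the map with voxels below the contour level set to 0."""
    cleaned = np.array(density, dtype=float, copy=True)
    cleaned[cleaned < contour_level] = 0.0
    return cleaned


def scan_voxels(density, box_size=11, stride=2):
    """Slide a cubic window over the map and keep boxes whose center has density.

    box_size and stride are illustrative assumptions, not Emap2sec+ settings.
    Returns (boxes, centers); boxes has shape (n, box_size, box_size, box_size).
    """
    if box_size % 2 == 0:
        raise ValueError("box_size must be odd so that each box has a center voxel")
    half = box_size // 2
    boxes, centers = [], []
    nx, ny, nz = density.shape
    for x in range(half, nx - half, stride):
        for y in range(half, ny - half, stride):
            for z in range(half, nz - half, stride):
                if density[x, y, z] <= 0:
                    continue  # skip empty regions of the map
                box = density[x - half:x + half + 1,
                              y - half:y + half + 1,
                              z - half:z + half + 1]
                boxes.append(box)
                centers.append((x, y, z))
    if not boxes:
        empty_boxes = np.empty((0, box_size, box_size, box_size))
        return empty_boxes, np.empty((0, 3), dtype=int)
    return np.stack(boxes), np.array(centers)
```

Each returned box would correspond to one network input, and the matching row in `centers` records where in the map the prediction belongs.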

Overall Network Framework

The network framework consists of four steps:

(1) Apply binary-class model and multi-class model to obtain predicted probabilities for each voxel;
(2) Concatenate probability values from different models to have 8 probability values for each voxel;
(3) Apply Phase 2 network to utilize the neighboring predicted probabilities from phase 1 to further classify each voxel;
(4) Output the final predictions for each voxel.
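
Step (2) above can be sketched as a simple array concatenation. The split into four binary-class probabilities plus four multi-class probabilities per voxel is an assumption made here to arrive at the 8 values mentioned above, and the function name is hypothetical.

```python
import numpy as np


def concat_phase1_probabilities(binary_probs, multi_probs):
    """Join per-voxel Phase 1 outputs into one feature vector per voxel.

    binary_probs: shape (n_voxels, 4), one column per binary-class model (assumed).
    multi_probs:  shape (n_voxels, 4), output of the multi-class model (assumed).
    Returns an array of shape (n_voxels, 8).
    """
    binary_probs = np.asarray(binary_probs, dtype=float)
    multi_probs = np.asarray(multi_probs, dtype=float)
    if binary_probs.shape[0] != multi_probs.shape[0]:
        raise ValueError("both inputs must describe the same voxels")
    return np.concatenate([binary_probs, multi_probs], axis=1)
```

The Phase 2 network would then take these 8-value vectors from a voxel and its neighbors as input.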

Phase 1 Network Architecture

Phase 2 Network Architecture

Pre-required software

Python 3: https://www.python.org/downloads/
pdb2vol (for generating simulated maps): https://situs.biomachina.org/fguide.html
PyMOL (for visualization): https://pymol.org/2/

Installation

1. `Install git`

2. Clone the repository on your computer

git clone git@github.com:kiharalab/Emap2secPlus.git && cd Emap2secPlus

3. Build dependencies and install with anaconda

3.1 `install conda`.

3.2 Install the dependencies from the command line

conda create -n Emap python=3.8
conda activate Emap
conda install gcc=14.1
pip install -r requirements.txt
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Each time you want to run the code, activate the environment with

conda activate Emap
conda deactivate  # run this when you want to exit the environment

Note that CUDA 11 is needed for the software. Make sure your GPU supports CUDA 11.
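
One way to check the CUDA setup after installation is to ask PyTorch directly. The helper below is a minimal sketch; `cuda_status` is a name chosen for this example, and it only uses `torch.cuda.is_available()` and `torch.version.cuda`.

```python
import importlib


def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "cuda_version": None}
    return {
        "torch_installed": True,
        "cuda_available": bool(torch.cuda.is_available()),
        # CUDA version PyTorch was built against (None for CPU-only builds)
        "cuda_version": torch.version.cuda,
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_available` is false inside the `Emap` environment, the GPU driver or the installed PyTorch build is the first thing to check.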

4. Download the model files and example files.

Due to GitHub's data quota limit, our models can't be kept in this repo. Please download them here and put them in the Emap2secPlus directory. Two types of models are provided. best_model.tar.gz includes all models trained with the author-recommended contour level. nocontour_best_model.tar.gz includes all models trained without the author contour level.

All the trained models can also be downloaded via Zenodo.
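
Once an archive has been downloaded, it can be unpacked with Python's standard `tarfile` module. This is a minimal sketch; the helper name is hypothetical, and only unpack archives obtained from the official download location.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, dest_dir="."):
    """Unpack a .tar.gz model archive into dest_dir and return the member names."""
    archive = Path(archive_path)
    if not archive.is_file():
        raise FileNotFoundError(f"model archive not found: {archive}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

For example, `extract_model_archive("best_model.tar.gz", ".")` run from the Emap2secPlus directory would unpack the models next to `main.py`.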

Data Availability

The raw data of the structure models built by our method are provided in Supplementary Information, Supp. Table 1 and 4. The simulated EM maps can be downloaded from https://doi.org/10.5281/zenodo.4599657. The experimental EM maps can be downloaded from EMDB (https://www.emdataresource.org/). The data that support the findings of this study are available from the corresponding author upon request.

Usage

python3 main.py -h:
  -h, --help            show this help message and exit
  -F F                  map path
  --mode MODE           0: Detect structures for EM Map 
                        1: Detect and evaluate structures for EM map with pdb structure
                        2: Detect structure for experimental maps with 4 fold models
                        3: Detect and evaluate structure for experimental maps with 4 fold models
                        4: Detect protein and DNA/RNA with 4 fold models
  -P P                  native structure path (PDB format) for evaluating model's performance (usually not available for real scenarios)
  -M                    Trained model path which saved all the trained models
  --type TYPE           0:simulated map at 6 Å 1: simulated map at 10 Å 2:simulated map at 6-10 Å 3:experimental map
  --gpu GPU             gpu id choose for training
  --class CLASS         number of classes
  --batch_size BATCH_SIZE batch size for training
  --contour CONTOUR     Contour level for real map
  --fold FOLD           specify the fold model used for detecting the experimental map
  --output_folder       specify a custom folder where results will be stored (optional, default will be located in project root and name will depend on mode)
  --no_compilation      using this optional argument will skip automatic compilation before running the project

1. Detect structures with EM maps

python3 main.py --mode=0 -F=[Map_path] --type=[Map_Type] --gpu=0 --class=4 --contour=[contour_level] --fold=[Choose_Fold]

Here [Map_path] is the path of the cryo-EM map file on your computer. [Map_Type] should be set according to your input map type; it determines which pre-trained model is loaded. [contour_level] and [Choose_Fold] only need to be specified for experimental maps.
Output will be saved in "Predict_Result/[Map_Type]/[Input_Map_Name]".
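
When running many maps, the command above can be assembled programmatically. The helper below is a sketch; its name and parameter names are hypothetical, and it only emits flags that appear in the usage text above.

```python
def build_detect_command(map_path, map_type, gpu=0, n_classes=4, contour=None, fold=None):
    """Build the argument list for `python3 main.py --mode=0 ...`.

    map_type: 0 (simulated 6 Å), 1 (simulated 10 Å), 2 (simulated 6-10 Å), 3 (experimental).
    contour and fold are appended only when given; the usage text says they
    only need to be specified for experimental maps.
    """
    if map_type not in (0, 1, 2, 3):
        raise ValueError("map_type must be 0, 1, 2, or 3")
    cmd = [
        "python3", "main.py", "--mode=0",
        f"-F={map_path}",
        f"--type={map_type}",
        f"--gpu={gpu}",
        f"--class={n_classes}",
    ]
    if contour is not None:
        cmd.append(f"--contour={contour}")
    if fold is not None:
        cmd.append(f"--fold={fold}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the Emap2secPlus directory.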

2. Evaluate Performance (only when the correct underlying structure in the map is known)

If you are testing the software on a map whose structure is known, you can check the accuracy of structure detection by comparing the Emap2sec+ result with the known structure. This mode cannot be used in real scenarios where the native structure is not available. We usually use it to evaluate Emap2sec+ on test datasets with known structures. It is also useful for measuring the difference between the structure detected by Emap2sec+ and the structure currently assigned to the EM map.

python3 main.py --mode=1 -F=[Map_path] -P=[PDB_path] --type=[Map_Type] --gpu=0 --class=4 --contour=[contour_level] --fold=[Choose_Fold]

Here [PDB_path] is the path of the PDB file for the known structure. All other parameters follow the same rules as in --mode=0.
Output will be saved in "Predict_Result_WithPDB/[Map_Type]/[Input_Map_Name]".
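
To make the idea of comparing predictions with a known structure concrete, the sketch below computes a per-class agreement score from two lists of voxel labels. It is an illustration only: it does not reproduce the contents or format of *report.txt, and the class order in `LABELS` is an assumption.

```python
from collections import Counter

# Assumed class order for illustration; check the Emap2sec+ output for the real mapping.
LABELS = ("alpha helix", "beta sheet", "other", "DNA/RNA")


def per_class_recall(predicted, reference, n_classes=4):
    """Fraction of reference voxels in each class that received the same predicted label.

    Returns a dict {class_index: recall}; recall is None when the class is absent
    from the reference labels.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    total = Counter(reference)
    correct = Counter(r for p, r in zip(predicted, reference) if p == r)
    return {
        c: (correct[c] / total[c] if total[c] else None)
        for c in range(n_classes)
    }
```

A low value for one class, for example DNA/RNA, would point to the regions of the map where the prediction and the assigned structure disagree most.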

3. Detect structure for experimental maps with 4 fold networks

python3
