EvoCov

<img align="right" src="evocovlogo.png" width=170px> This is EvoCov, a pipeline designed for analysis of SARS-CoV-2 sequences from GISAID. The pipeline can be run interactively or with default settings, and uses SARS-CoV-2 sequence information to make an evolutionarily aware estimate of efficient epitopes on the spike protein for antibody design. While the code is open source, the data intended for use with the pipeline may only be obtained with express permission from GISAID. If you have any issues using the pipeline, or suggestions for how it could be improved, please open an issue or start a discussion!

Installation

Clone the GitHub repository to your machine to use the EvoCov package. Before running it, check that the Python dependencies are installed.

git clone https://github.com/ciarajudge/EvoCov.git

pip install -r requirements.txt
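If you want to confirm that the dependencies really are installed after running pip, a small check along the following lines may help. This is a minimal sketch rather than part of EvoCov: the helper names `requirement_names` and `missing_packages` are hypothetical, the parsing is deliberately simplistic (it only handles plain `name` or `name==version` style lines), and it assumes `requirements.txt` sits in the directory you run it from.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract bare package names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # Keep only the leading name; version pins and markers are ignored.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")  # assumed location: current directory
    if req.exists():
        absent = missing_packages(requirement_names(req.read_text()))
        print("Missing packages:", ", ".join(absent) if absent else "none")
    else:
        print("requirements.txt not found; run this from the repository root")
```

Any package reported as missing can be installed by re-running the pip command above.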

Preparation of site-wise mutation rates using Treecov

The prediction step of the pipeline uses site-wise mutation rates estimated by analysing a SARS-CoV-2 phylogenetic tree with baseml, a program from the PAML (Phylogenetic Analysis by Maximum Likelihood) package. To generate these rates, download and compile PAML and place it in the ./treecov/ directory, so that the path to the baseml executable is ./treecov/paml/bin/baseml. The folder names must match exactly. You must also download a phylogenetic tree from GISAID by clicking Audacity on the platform, and place the file global.tree in the ./treecov/ directory. To run the treecov pipeline and generate the rates, navigate to the treecov directory and use the command:

python treepipe.py /absolute/path/to/GISAID/fasta/file

This starts iterative sampling and analysis of the phylogenetic tree, repeated 100 times in 10 batches of 10. To change the batch size, adjust the number of loop iterations in subsampletree.R; to change the number of batches, adjust the number of loop iterations in treepipe.py.
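Because the folder names matter, it can be worth checking the layout before launching treepipe.py. The sketch below is not part of the repository: `check_treecov_layout` is a hypothetical helper, and the only paths it knows about are the two named above (./treecov/paml/bin/baseml and ./treecov/global.tree). It assumes you run it from the repository root.

```python
import os
from pathlib import Path


def check_treecov_layout(treecov_dir="treecov"):
    """Return a list of problems with the expected treecov directory layout."""
    root = Path(treecov_dir)
    baseml = root / "paml" / "bin" / "baseml"  # compiled PAML executable
    tree = root / "global.tree"                # tree downloaded from GISAID
    problems = []
    if not baseml.is_file():
        problems.append(f"baseml executable not found at {baseml}")
    elif not os.access(baseml, os.X_OK):
        problems.append(f"{baseml} exists but is not executable")
    if not tree.is_file():
        problems.append(f"phylogenetic tree not found at {tree}")
    return problems


if __name__ == "__main__":
    issues = check_treecov_layout()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("treecov layout looks as expected")
```

An empty list means both files were found where the pipeline expects them; it does not verify that baseml was compiled correctly or that the tree is valid.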

Default Usage of EvoCov

Navigate to the cloned repository and call the package with the file paths of your latest GISAID unmasked sequence file and metadata file. This starts a default run of the pipeline, which handles any exceptions and options automatically. The run includes the final step of the pipeline, where the results are written to a PDF using R.

python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv

If you'd like to be notified by text message when the pipeline is complete, pass a valid mobile number as a third argument, with no plus signs or brackets. For example, use 353877910680 where the country code is +353 and the phone number is 0877910680.

python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv 353877910680
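If you are unsure how to write your number in this form, the following sketch shows one way to build it. It is an illustration only: `to_notification_number` is a hypothetical helper, not part of EvoCov, and dropping the leading zero of the national number is inferred from the example above. That convention holds for Irish numbers but not for every country, so check the result against your own number.

```python
import re


def to_notification_number(country_code, national_number):
    """Join a country code and national number into a digits-only string."""
    cc = re.sub(r"\D", "", country_code)        # "+353" -> "353"
    local = re.sub(r"\D", "", national_number)  # drop spaces, brackets, dashes
    local = local.lstrip("0")                   # assumed: leading zero is a trunk prefix
    if not cc or not local:
        raise ValueError("both a country code and a phone number are required")
    return cc + local


if __name__ == "__main__":
    # Reproduces the example from the text above.
    print(to_notification_number("+353", "0877910680"))
```

The printed string is what you would pass as the third argument on the command line.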

Interactive Usage

Navigate to the cloned repository and call the package with no arguments, using the command below.

python -m evocov

Running the pipeline in this manner creates an interactive session in which you can choose file names for the output and name the variants you want included in the analysis. After epitope scoring, you will also be given the option to use R to generate an output PDF with the key findings of the pipeline.

Pipeline Structure

Below is a flowchart outlining the rough pipeline structure.

Things to note

If the text notification stops working, contact me at judge.ciara@gmail.com or here on GitHub. The text message is sent using a subscription-type service, and I would just need to buy a bit more of an allowance.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
