Mutree
A pipeline for phylogenetic tree inference and mutation recurrence discovery
Download the latest release: mutree 2.7182 ( zip | tar.gz ).
Adrian Baez-Ortega
Transmissible Cancer Group, University of Cambridge
mutree is a generalization and extension of Asif Tamuri's treesub pipeline. It uses RAxML [1] and parts of treesub itself (which in turn uses the Java libraries PAL [2] and BioJava [3]) to infer a phylogenetic tree from a coding DNA sequence alignment and to identify candidate recurrent coding-affecting mutations in that tree.
The pipeline generates:
- A maximum likelihood phylogenetic tree including bootstrap values in its branches (Newick format).
- A version of the ML tree showing all the annotated mutations in the branches where they occur (Nexus format).
- A version of the ML tree showing only the recurrent mutations in the branches where they occur (Nexus format). A nonsynonymous mutation in a branch of the tree is considered to be recurrent if another nonsynonymous mutation in the same gene has been found in a different branch.
- A text table with all the single-nucleotide substitutions found in the alignments, indicating whether they are nonsynonymous and recurrent.
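The recurrence rule above can be illustrated with a small sketch. It is not mutree's own code, and it does not read mutree's output table: the function name flag_recurrent and the three-column, tab-delimited input (branch, gene, mutation type) are assumptions made for this example only.

```shell
# flag_recurrent FILE
# Hypothetical helper (not part of mutree). FILE is an assumed tab-delimited
# table with three columns: branch label, gene symbol, mutation type
# ("nonsynonymous" or "synonymous"). Each row is printed with a fourth
# column: "recurrent" or "-".
flag_recurrent() {
    awk -F '\t' '
        # Pass 1: per gene, count the distinct branches that carry
        # at least one nonsynonymous mutation.
        NR == FNR {
            if ($3 == "nonsynonymous" && !(($2, $1) in seen)) {
                seen[$2, $1] = 1
                branches[$2]++
            }
            next
        }
        # Pass 2: a nonsynonymous mutation is recurrent when its gene is
        # hit by nonsynonymous mutations in more than one branch.
        {
            rec = ($3 == "nonsynonymous" && branches[$2] > 1) ? "recurrent" : "-"
            print $0 "\t" rec
        }
    ' "$1" "$1"
}
```

The file is read twice so that the per-gene branch counts are complete before any row is labeled. Several nonsynonymous mutations of one gene on a single branch do not count as recurrent under this reading of the rule.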
mutree has been tested on an Ubuntu 14.04.4 system, and it should behave well in any Linux distribution. It should also work well on Mac OS X.
Installation
mutree depends on the installation of the following software:
- RAxML version 8.2.9 or later. mutree requires compiling the raxmlHPC-SSE3 and raxmlHPC-PTHREADS-SSE3 RAxML executables, which should work well in processors up to 5 years old.
- A recent Java runtime (1.6+), which might be already installed in your system.
- Although it is not required in order to run the pipeline, some visualisation tool is needed to open the output tree files. FigTree can read the Nexus format in which the substitution trees are output. The tree showing the bootstrap support values (in Newick format) can be opened using e.g. Dendroscope, or converted to a different format.
mutree already includes its own (slightly customized) version of the treesub pipeline, named 'treesub-TCG'. Therefore, installing treesub is not necessary, although in some cases it may have to be re-compiled (see NOTE below).
The following instructions describe the steps for installing mutree and all its components in an Ubuntu 14.04.4 system; they should be valid for any Ubuntu or Debian Linux distribution. The tools employed have available Mac and Windows versions (please consult their respective websites). mutree itself has not been tested on Mac or Windows systems, but it might work with an appropriate Bash shell.
- Install RAxML

You only need to install RAxML if either of the following commands prints nothing in the terminal:

    which raxmlHPC-PTHREADS-SSE3
    which raxmlHPC-SSE3

Go to the desired installation folder (in this example, the Software folder inside your home directory, or ~/Software):

    cd ~/Software

Download and compile RAxML:

    wget https://github.com/stamatak/standard-RAxML/archive/v8.2.9.tar.gz
    tar zxvf v8.2.9.tar.gz
    rm v8.2.9.tar.gz
    cd standard-RAxML-8.2.9/
    make -f Makefile.SSE3.gcc
    rm *.o
    make -f Makefile.SSE3.PTHREADS.gcc
    rm *.o

Then, edit your ~/.bashrc file using:

    nano ~/.bashrc

and append the standard-RAxML-8.2.9 directory at the end of your PATH variable. If the PATH variable is not defined, you can define it by adding the following line at the end of the ~/.bashrc file:

    export PATH=~/Software/standard-RAxML-8.2.9:$PATH

Then save and close the file (Ctrl-X).

- Install the Java Runtime Environment

You only need to install Java if the following command prints nothing in the terminal:

    which java

To install it, run:

    sudo apt-get install default-jre

The system will ask for your password; you need to have administrator permissions in your system in order to use sudo apt-get install.

- Install mutree

Go to the desired installation folder, and download and uncompress mutree (replace 2.xx with the latest version):

    cd ~/Software
    wget https://github.com/adrianbaezortega/mutree/archive/v2.xx.tar.gz
    tar zxvf v2.xx.tar.gz
    rm v2.xx.tar.gz

Then, edit your ~/.bashrc file using:

    nano ~/.bashrc

and append the mutree-2.xx/src directory at the end of your PATH variable. If the PATH variable was not defined before the RAxML step, its line should now look like:

    export PATH=~/Software/standard-RAxML-8.2.9:~/Software/mutree-2.xx/src:$PATH

Then save and close the file (Ctrl-X).
Either close the terminal and open a new one, or source the ~/.bashrc file in order to apply the changes:

    source ~/.bashrc

Then you should be able to run the following commands, which should print something like this:

    which raxmlHPC-PTHREADS-SSE3
    # prints: [...]/standard-RAxML-8.2.9/raxmlHPC-PTHREADS-SSE3
    which raxmlHPC-SSE3
    # prints: [...]/standard-RAxML-8.2.9/raxmlHPC-SSE3
    which java
    # prints: /usr/bin/java
    which mutree
    # prints: [...]/mutree-2.xx/src/mutree
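The same check can be scripted so that a missing tool is reported by name. This is only a convenience sketch: the function check_tools is hypothetical and is not shipped with mutree. It relies on the standard shell built-in command -v to look up each executable on the PATH.

```shell
# check_tools NAME...
# Hypothetical helper (not part of mutree): reports, for each executable
# name, whether it is visible on the PATH. Returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# The four executables that the installation steps put on the PATH.
check_tools raxmlHPC-PTHREADS-SSE3 raxmlHPC-SSE3 java mutree \
    || echo "Some tools are missing: re-check the PATH line in ~/.bashrc" >&2
```

If a tool is reported as missing right after editing ~/.bashrc, the most likely cause is that the file has not been sourced in the current terminal yet.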
And now you can have fun!
NOTE: If you encounter problems while using mutree and they seem to be related to the treesub pipeline, you can try re-compiling it. You need to go to the treesub-TCG folder within the mutree installation directory, and re-compile treesub using Ant:
cd ~/Software/mutree-2.xx/treesub-TCG
export ANT_OPTS="-Xmx256m"
ant compile jar
Running mutree
The pipeline requires the following input:
- Absolute path to a coding sequence (CDS) alignment file, in FASTA format (-i option). Each sequence in the file should be composed of a concatenation of multiple gene CDS sequences, all of which must be in frame (i.e. the concatenated sequence must contain codon bases only, and its length must be a multiple of 3). If the length of a CDS is not a multiple of 3, any trailing bases after the last codon have to be removed before adding the CDS to the concatenated sequence. Each sequence in the FASTA file represents a sample (taxon), and must be labeled with a unique sample name. Sample names cannot include any blank spaces, tabulators, carriage returns, colons, commas, parentheses or square brackets. Each sequence must be on a single line, so that odd lines in the file contain the sample names, while even lines contain the sequences. The first sequence in the file will be used as an outgroup to root the tree, so this should be the reference sequence or a suitable outgroup sample. An example can be found in the file mutree-2.xx/examples/Alignment_H3HASO.fna (this has been adapted from one of treesub's example files).
- Absolute path to a "gene table" (-g option). This is mandatory unless the -f option is used. The gene table must be a tab-delimited file with no header and two columns: gene symbol and CDS start position (position of the first nucleotide in the concatenated sequence). This allows mapping each mutation to the gene where it occurs and finding recurrent mutations. An example can be found in the file mutree-2.xx/examples/GeneTable_H3HASO.txt (the gene symbols and positions have been defined arbitrarily for this example).
- Absolute path to an output directory (-o option). The directory will be created if necessary. The pipeline implements a checkpoint logging system, so in the event that the execution is interrupted before finishing, re-running mutree with the same output directory will resume the execution after the last successfully completed step.
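Because the alignment file has several formatting rules, it can help to check it before starting a long run. The sketch below is not part of mutree: check_alignment is a hypothetical helper that tests only the rules listed above (alternating name and sequence lines, unique names without forbidden characters, sequence lengths that are multiples of 3). It also assumes that all sequences in an alignment have the same length, which the text above implies but does not state.

```shell
# check_alignment FILE
# Hypothetical helper (not part of mutree): prints one message per problem
# found and returns non-zero if the file breaks any of the rules above.
# Usage: check_alignment /absolute/path/to/alignment.fna
check_alignment() {
    awk '
        BEGIN { bad = 0; forbidden = " \t\r:,()[]" }

        NR % 2 == 1 {   # odd lines: sample names
            if (substr($0, 1, 1) != ">") {
                printf "line %d: header must start with \">\"\n", NR; bad = 1
            }
            name = substr($0, 2)
            if (name == "") { printf "line %d: empty sample name\n", NR; bad = 1 }
            for (i = 1; i <= length(forbidden); i++) {
                if (index(name, substr(forbidden, i, 1))) {
                    printf "line %d: forbidden character in sample name\n", NR
                    bad = 1
                    break
                }
            }
            if (name in seen) { printf "line %d: duplicated sample name\n", NR; bad = 1 }
            seen[name] = 1
            next
        }

        {               # even lines: sequences
            n = length($0)
            if (n % 3 != 0) {
                printf "line %d: length %d is not a multiple of 3\n", NR, n; bad = 1
            }
            # Assumption: being an alignment, all sequences share one length.
            if (!have_len) {
                len = n; have_len = 1
            } else if (n != len) {
                printf "line %d: length differs from the first sequence\n", NR; bad = 1
            }
        }

        END {
            if (NR % 2 != 0) { print "file ends with a name that has no sequence"; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

The check does not verify that each individual gene is in frame or that the gene table positions match the alignment; those still have to be confirmed when the concatenated sequences are built.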
mutree also accepts other optional input:
- Number of RAxML threads (-t option). This allows using the multi-threaded version of RAxML to substantially speed up the tree inference and the ancestral sequence reconstruction. This value can be any positive integer, and cannot be higher than the available number of processors. The default value is 1.
- Custom RAxML options for tree inference (-r option). This allows personalizing the RAxML routine, which uses rapid bootstrapping followed by maximum likelihood search by default (see pipeline description below). Custom options must be specified as a single string within quotes, and must include all the required options for running RAxML, except for the options -s, -n, -w and -T, which cannot be used.
- Custom RAxML options for ancestral sequence reconstruction (-a option). This allows personalizing the ASR settings, which consist of a GTR substitution model plus a Gamma model of rate heterogeneity by default (see pipeline description below). Custom options must be specified as a single string within quotes, and must include all the required options for running RAxML, except for the options -f, -s, -n, -w and -T, which cannot be used.
- Perform tree inference and rooting only (-f option). If this option is specified, only the first three steps of the pipeline will be run. Thus, in this case, it is not necessary to provide a gene table via -g, and there is also no need for the input alignment (-i) to be composed of coding sequences (unless the rest of the pipeline is to be run aft
