Mutree

A pipeline for phylogenetic tree inference and mutation recurrence discovery

Download the latest release: mutree 2.7182 ( zip | tar.gz ).

Adrian Baez-Ortega
Transmissible Cancer Group, University of Cambridge

mutree is a generalization and extension of Asif Tamuri's treesub pipeline. It uses RAxML [1] and parts of treesub itself (which in turn uses the Java libraries PAL [2] and BioJava [3]) to infer a phylogenetic tree from a coding DNA sequence alignment and to identify candidate recurrent coding-affecting mutations in it.

The pipeline generates:

  • A maximum likelihood (ML) phylogenetic tree including bootstrap values on its branches (Newick format).

  • A version of the ML tree showing all the annotated mutations in the branches where they occur (Nexus format).

  • A version of the ML tree showing only the recurrent mutations in the branches where they occur (Nexus format). A nonsynonymous mutation in a branch of the tree is considered to be recurrent if another nonsynonymous mutation in the same gene has been found in a different branch.

  • A text table with all the single-nucleotide substitutions found in the alignments, indicating whether they are nonsynonymous and recurrent.
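The recurrence rule above can be illustrated with a short sketch. This is not mutree's implementation, and the three-column input layout (branch, gene, nonsynonymous flag) is a hypothetical one chosen for the illustration; it is not the format of the table that mutree writes.

```shell
#!/usr/bin/env bash
# Illustration of the recurrence rule only; not mutree's own code.
# Hypothetical input columns (tab-separated): branch, gene, nonsynonymous (Y/N).

flag_recurrent() {
    # The file is read twice: once to count branches per gene, once to flag rows.
    awk -F'\t' -v OFS='\t' '
        # First pass: count the distinct branches that carry a nonsynonymous
        # mutation in each gene.
        NR == FNR {
            key = $2 SUBSEP $1
            if ($3 == "Y" && !(key in seen)) {
                seen[key] = 1
                branches[$2]++
            }
            next
        }
        # Second pass: a nonsynonymous mutation is recurrent if its gene also
        # has a nonsynonymous mutation in at least one other branch.
        {
            rec = ($3 == "Y" && branches[$2] > 1) ? "Y" : "N"
            print $1, $2, $3, rec
        }
    ' "$1" "$1"
}

# Small made-up table to exercise the function.
demo=$(mktemp)
printf 'branchA\tGENE1\tY\n'  > "$demo"
printf 'branchB\tGENE1\tY\n' >> "$demo"
printf 'branchB\tGENE2\tY\n' >> "$demo"
printf 'branchC\tGENE2\tN\n' >> "$demo"
result=$(flag_recurrent "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

In the made-up table, only the two GENE1 rows are flagged as recurrent: GENE2 has a single nonsynonymous mutation, and the synonymous one in branchC does not count towards recurrence.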

mutree has been tested on an Ubuntu 14.04.4 system, and it should behave well on any Linux distribution. It should also work on Mac OS X, although it has not been tested there.


Installation

mutree depends on the installation of the following software:

  • RAxML version 8.2.9 or later. mutree requires the raxmlHPC-SSE3 and raxmlHPC-PTHREADS-SSE3 RAxML executables to be compiled; these should work well on processors up to 5 years old.

  • A recent Java runtime (1.6+), which might already be installed on your system.

  • Although it is not required in order to run the pipeline, some visualisation tool is needed to open the output tree files. FigTree can read the Nexus format in which the substitution trees are output. The tree showing the bootstrap support values (in Newick format) can be opened using e.g. Dendroscope, or converted to a different format.

mutree already includes its own (slightly customized) version of the treesub pipeline, named 'treesub-TCG'. Therefore, installing treesub is not necessary, although in some cases it may have to be re-compiled (see NOTE below).

The following instructions describe the steps for installing mutree and all its components on an Ubuntu 14.04.4 system; they should be valid for any Ubuntu or Debian Linux distribution. Mac and Windows versions of the tools employed are available (please consult their respective websites). mutree itself has not been tested on Mac or Windows systems, but it might work with an appropriate Bash shell.

  1. Install RAxML

    You only need to install RAxML if the commands which raxmlHPC-PTHREADS-SSE3 or which raxmlHPC-SSE3 do not print anything in the terminal.

    Go to the desired installation folder (in this example, the Software folder inside your home directory, or ~/Software):

    cd ~/Software
    

    Download and compile RAxML:

    wget https://github.com/stamatak/standard-RAxML/archive/v8.2.9.tar.gz
    tar zxvf v8.2.9.tar.gz
    rm v8.2.9.tar.gz
    
    cd standard-RAxML-8.2.9/
    make -f Makefile.SSE3.gcc
    rm *.o
    make -f Makefile.SSE3.PTHREADS.gcc
    rm *.o
    

    Then, edit your ~/.bashrc file using:

    nano ~/.bashrc
    

    and add the standard-RAxML-8.2.9 directory to your PATH variable. If your ~/.bashrc file does not set the PATH variable yet, you can do so by adding the following line at the end of the file:

    export PATH=~/Software/standard-RAxML-8.2.9:$PATH
    

    Then save and close the file (in nano: Ctrl-X, then Y and Enter to confirm).

  2. Install the Java Runtime Environment

    You only need to install Java if the command which java does not print anything in the terminal.

    sudo apt-get install default-jre
    

    The system will ask for your password; you need administrator permissions on your system in order to use sudo apt-get install.

  3. Install mutree

    Go to the desired installation folder, and download and uncompress mutree (replace 2.xx with the latest version):

    cd ~/Software
    
    wget https://github.com/adrianbaezortega/mutree/archive/v2.xx.tar.gz
    tar zxvf v2.xx.tar.gz
    rm v2.xx.tar.gz
    

    Then, edit your ~/.bashrc file using:

    nano ~/.bashrc
    

    and add the mutree-2.xx/src directory to your PATH variable. If the PATH variable was not set in the file before step 1, its line should now look like:

    export PATH=~/Software/standard-RAxML-8.2.9:~/Software/mutree-2.xx/src:$PATH
    

    Then save and close the file (in nano: Ctrl-X, then Y and Enter to confirm).

    Either close the terminal and open a new one, or source the ~/.bashrc file in order to apply the changes:

    source ~/.bashrc
    

    Then you should be able to run the following commands, which should print something like this:

    which raxmlHPC-PTHREADS-SSE3  # prints: [...]/standard-RAxML-8.2.9/raxmlHPC-PTHREADS-SSE3
    which raxmlHPC-SSE3           # prints: [...]/standard-RAxML-8.2.9/raxmlHPC-SSE3
    which java                    # prints: /usr/bin/java
    which mutree                 # prints: [...]/mutree-2.xx/src/mutree
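
The same checks can be run in one go. The following is a minimal sketch, not part of mutree; the helper name missing_tools is made up for this example, and the executable names are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Report which of the required executables are not visible on the PATH.
# missing_tools is a hypothetical helper written for this example.

missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools raxmlHPC-PTHREADS-SSE3 raxmlHPC-SSE3 java mutree)
if [ -z "$missing" ]; then
    echo "All required executables were found."
else
    echo "Not found on PATH:"
    printf '%s\n' "$missing" | sed 's/^/  /'
fi
```

If a name is reported as missing right after editing ~/.bashrc, the changes have probably not been applied to the current terminal yet (see the source step above).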
    

And now you can have fun!

NOTE: If you encounter problems while using mutree and they seem to be related to the treesub pipeline, you can try re-compiling it. You need to go to the treesub-TCG folder within the mutree installation directory, and re-compile treesub using Ant:

cd ~/Software/mutree-2.xx/treesub-TCG
export ANT_OPTS="-Xmx256m"
ant compile jar

Running mutree

The pipeline requires the following input:

  • Absolute path to a coding sequence (CDS) alignment file, in FASTA format (-i option). Each sequence in the file should be a concatenation of multiple gene CDS sequences, all of which must be in frame (i.e. the concatenated sequence must contain codon bases only, and its length must be a multiple of 3). If the length of a CDS is not a multiple of 3, any trailing bases after the last codon have to be removed before adding the CDS to the concatenated sequence. Each sequence in the FASTA file represents a sample (taxon), and must be labeled with a unique sample name. Sample names cannot include spaces, tabs, carriage returns, colons, commas, parentheses or square brackets. Each sequence must be on a single line, so that odd-numbered lines in the file contain the sample names and even-numbered lines contain the sequences. The first sequence in the file will be used as an outgroup to root the tree, so this should be the reference sequence or a suitable outgroup sample. An example can be found in the file mutree-2.xx/examples/Alignment_H3HASO.fna (adapted from one of treesub's example files).

  • Absolute path to a "gene table" (-g option). This is mandatory unless the -f option is used. The gene table must be a tab-delimited file with no header and two columns: gene symbol and CDS start position (position of the first nucleotide in the concatenated sequence). This allows mapping each mutation to the gene where it occurs and finding recurrent mutations. An example can be found in the file mutree-2.xx/examples/GeneTable_H3HASO.txt (the gene symbols and positions have been defined arbitrarily for this example).

  • Absolute path to an output directory (-o option). The directory will be created if necessary. The pipeline implements a checkpoint logging system, so in the event that the execution is interrupted before finishing, re-running mutree with the same output directory will resume the execution after the last successfully completed step.
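
Before launching a long run, it can help to check the input files against the requirements listed above. The following is a minimal sketch and not part of mutree: the function names check_alignment and check_gene_table are made up for this example, and the checks cover only the rules stated in this README (header and sequence lines alternating, allowed characters in sample names, unique names, sequence length a multiple of 3, and a two-column tab-delimited gene table).

```shell
#!/usr/bin/env bash
# Pre-flight checks for mutree input files (hypothetical helpers, not mutree code).
# Each function prints one message per problem and returns non-zero if any is found.

check_alignment() {
    awk '
        BEGIN { bad = 0 }
        # Odd-numbered lines: ">" followed by a unique, valid sample name.
        NR % 2 == 1 {
            name = substr($0, 2)
            if (substr($0, 1, 1) != ">" || name == "") {
                printf "line %d: expected a header line (>name)\n", NR; bad = 1
            } else if (name ~ /[ \t\r:,()]/ || index(name, "[") || index(name, "]")) {
                printf "line %d: forbidden character in sample name\n", NR; bad = 1
            } else if (name in seen) {
                printf "line %d: duplicated sample name %s\n", NR, name; bad = 1
            }
            seen[name] = 1
            next
        }
        # Even-numbered lines: one whole sequence, length a multiple of 3.
        substr($0, 1, 1) == ">" {
            printf "line %d: expected a sequence, found a header\n", NR; bad = 1
            next
        }
        length($0) == 0 || length($0) % 3 != 0 {
            printf "line %d: sequence length %d is not a positive multiple of 3\n", NR, length($0)
            bad = 1
        }
        END {
            if (NR % 2 == 1) { print "the last header has no sequence line"; bad = 1 }
            exit bad
        }
    ' "$1"
}

check_gene_table() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        # Every line needs a gene symbol and an integer start position.
        NF != 2 || $1 == "" || $2 !~ /^[0-9]+$/ {
            printf "line %d: expected gene symbol <TAB> CDS start position\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Made-up files to exercise both functions.
tmpdir=$(mktemp -d)
printf '>ref\nATGAAATAG\n>sample_1\nATGAAGTAG\n' > "$tmpdir/good.fna"
printf '>ref\nATGAAATAG\n>sample 1\nATGAAGTA\n'  > "$tmpdir/bad.fna"
printf 'GENE1\t1\nGENE2\t7\n'                    > "$tmpdir/genes.txt"

check_alignment "$tmpdir/good.fna"   && echo "good.fna: OK"
check_alignment "$tmpdir/bad.fna"    || echo "bad.fna: problems found"
check_gene_table "$tmpdir/genes.txt" && echo "genes.txt: OK"
rm -rf "$tmpdir"
```

A file with Windows line endings fails these checks, because the trailing carriage return counts as part of the sample name or of the sequence. Once both checks pass, the files can be given to mutree through the -i and -g options described above.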

mutree also accepts other optional input:

  • Number of RAxML threads (-t option). This allows using the multi-threaded version of RAxML to substantially speed up the tree inference and the ancestral sequence reconstruction. The value must be a positive integer no higher than the number of available processors. The default value is 1.

  • Custom RAxML options for tree inference (-r option). This allows customizing the RAxML routine, which by default uses rapid bootstrapping followed by a maximum likelihood search (see pipeline description below). Custom options must be specified as a single string within quotes, and must include all the options required for running RAxML, except for the options -s, -n, -w and -T, which cannot be used.

  • Custom RAxML options for ancestral sequence reconstruction (ASR) (-a option). This allows customizing the ASR settings, which by default consist of a GTR substitution model plus a Gamma model of rate heterogeneity (see pipeline description below). Custom options must be specified as a single string within quotes, and must include all the options required for running RAxML, except for the options -f, -s, -n, -w and -T, which cannot be used.

  • Perform tree inference and rooting only (-f option). If this option is specified, only the first three steps of the pipeline will be run. Thus, in this case, it is not necessary to provide a gene table via -g, and there is also no need for the input alignment (-i) to be composed of coding sequences (unless the rest of the pipeline is to be run aft
