RCandy
RCandy: an R package for visualising homologous recombination events in bacterial genomes
RCandy plots a phylogenetic tree in the context of strain metadata and recombination events identified by Gubbins (Croucher et al. 2015, Nucleic Acids Research. PMID: 25414349) and BRATNextGen (Marttinen et al. 2012, Nucleic Acids Research. PMID: 22064866).
Installation
You can install RCandy from GitHub with devtools:
install.packages("devtools")
devtools::install_github("ChrispinChaguza/RCandy", build_vignettes = FALSE)
Note that R version 3.6 or later is required to install the package.
- Recommended version of R:
  - R (>= 3.6)
- Required dependencies:
  - ape
  - dplyr
  - graphics
  - grDevices
  - magrittr
  - phytools
  - shape
  - stats
  - data.table
  - stringr
  - tibble
  - tidyr
  - utils
  - viridis
- Other optional packages (may be required to build vignettes):
  - knitr
  - rmarkdown
  - markdown
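As a quick pre-flight check, the snippet below is a minimal sketch that uses only base R to report whether the running R version meets the requirement (taken here as R >= 3.6, as listed above) and which of the required dependencies are not yet installed. The variable names (`r.version.ok`, `required.pkgs`, `missing.pkgs`) are illustrative and not part of RCandy.

```r
# Check that the running R version satisfies the stated requirement (R >= 3.6)
r.version.ok <- getRversion() >= "3.6.0"

# Required dependencies, copied from the list above
required.pkgs <- c("ape", "dplyr", "graphics", "grDevices", "magrittr", "phytools",
                   "shape", "stats", "data.table", "stringr", "tibble", "tidyr",
                   "utils", "viridis")

# Compare against the installed packages; nothing is loaded or installed here
installed.pkgs <- rownames(installed.packages())
missing.pkgs <- setdiff(required.pkgs, installed.pkgs)

if (!r.version.ok) {
  message("R >= 3.6 is required; found ", as.character(getRversion()))
}
if (length(missing.pkgs) > 0) {
  message("Packages not yet installed: ", paste(missing.pkgs, collapse = ", "))
}
```

`devtools::install_github()` normally installs missing dependencies itself, so this check is mainly useful for diagnosing a failed installation.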
Run the code below to build the vignette.
library(RCandy)
library(ape)
library(tidyr)
Load sample data
In this example, we will load sample data for Streptococcus pneumoniae sequence type (ST) 320. These data were generated using genomes described in Gladstone RA et al. EBioMedicine. 2019 May;43:338-346. doi: 10.1016/j.ebiom.2019.04.021. Epub 2019 Apr 16. PMID: 31003929; PMCID: PMC6557916.
Note that the metadata file is optional.
tree.file <- system.file("extdata", "ST320.final_tree.tre", package = "RCandy", mustWork = TRUE)
metadata.file <- system.file("extdata", "ST320.tsv", package = "RCandy", mustWork = TRUE)
gubbins.gff <- system.file("extdata", "ST320.recombination_predictions.gff", package = "RCandy",
mustWork = TRUE)
ref.genome.gff <- system.file("extdata", "Hungary19A-6.gff", package = "RCandy",
mustWork = TRUE)
Running RCandy
The simplest way to run RCandy is to plot the phylogenetic tree alongside the taxon metadata. Here we have selected the Source and Country columns in the metadata file. We highly recommend that the first column of the metadata file contains the taxon names, matching the tip names in the phylogenetic tree. We also specify additional options to ladderize the tree and root it at the midpoint.
By default, the columns in the metadata file are assumed to be separated by tabs ("\t"). To work with a file of comma-separated values, pass the argument taxon.metadata.delimiter = ",".
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"))
<img src="inst/vignette-supp/unnamed-chunk-3-1.png" width="100%" />
If the first column in the metadata file does not contain the taxon names, then the column that does contain them should be specified explicitly using the taxon.id.column option.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID")
<img src="inst/vignette-supp/unnamed-chunk-4-1.png" width="100%" />
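Before plotting, it can help to confirm that the identifiers in the metadata file match the tip names in the tree. The following is a minimal base R sketch with a made-up comma-separated metadata table and made-up tip labels; the object names (`toy.metadata`, `tip.labels`) and all values are hypothetical. It is not how RCandy performs the matching internally.

```r
# Hypothetical metadata with the taxon identifiers in a column named "ID"
toy.metadata <- data.frame(
  ID = c("taxonA", "taxonB", "taxonC"),
  Source = c("source1", "source2", "source1"),
  Country = c("country1", "country2", "country1"),
  stringsAsFactors = FALSE
)

# Write the table as comma-separated values to a temporary file
toy.file <- tempfile(fileext = ".csv")
write.table(toy.metadata, toy.file, sep = ",", row.names = FALSE, quote = FALSE)

# Read it back using the same delimiter it was written with
metadata <- read.delim(toy.file, sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical tip labels; for a real tree they could be taken from
# ape::read.tree(tree.file)$tip.label
tip.labels <- c("taxonA", "taxonB", "taxonC", "taxonD")

# Tips lacking a metadata row, and metadata rows lacking a tip
tips.without.metadata <- setdiff(tip.labels, metadata$ID)
metadata.without.tips <- setdiff(metadata$ID, tip.labels)

print(tips.without.metadata)
print(metadata.without.tips)
```

Here one tip (`taxonD`) has no metadata row, which is the kind of mismatch worth resolving before calling RCandyVis.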
Next we load the tree, metadata file, reference genome and recombination events generated by Gubbins. If the recombination events were generated by BRATNextGen, the option recom.input.type = "BRATNextGen" should be specified. By default, output from Gubbins is assumed (recom.input.type = "Gubbins").
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff)
<img src="inst/vignette-supp/unnamed-chunk-5-1.png" width="100%" />
We can specify recombination events in a specific region of the genome and show the gene labels in the reference genome annotation file. The gene labels are turned off by default.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE)
<img src="inst/vignette-supp/unnamed-chunk-6-1.png" width="100%" />
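To see which recombination events fall inside a genomic window before plotting, the predictions can be filtered by coordinate. The sketch below is a standalone base R illustration using made-up GFF records (nine tab-separated columns, with start and end in columns 4 and 5). It keeps any event that overlaps the window; whether RCandy treats partially overlapping events the same way is an assumption, and the record contents are hypothetical.

```r
# Hypothetical GFF records; coordinates and attributes are made up
gff.lines <- c(
  "##gff-version 3",
  "ref\ttoy\trecombination\t10000\t25000\t.\t.\t.\tid=event1",
  "ref\ttoy\trecombination\t28000\t35000\t.\t.\t.\tid=event2",
  "ref\ttoy\trecombination\t40000\t52000\t.\t.\t.\tid=event3",
  "ref\ttoy\trecombination\t61000\t70000\t.\t.\t.\tid=event4"
)
toy.gff <- tempfile(fileext = ".gff")
writeLines(gff.lines, toy.gff)

# Read the nine standard GFF columns, skipping "#" header lines
gff <- read.delim(toy.gff, header = FALSE, comment.char = "#",
                  stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

# Same window as in the RCandyVis call above
genome.start <- 30000
genome.end <- 60000

# Two intervals overlap when each one starts before the other ends
in.window <- gff$start <= genome.end & gff$end >= genome.start
events.in.window <- gff[in.window, ]

print(events.in.window$attributes)
```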
We could also colour the tips of the phylogenetic tree by a column in the metadata file. Here we will colour the tips using Country.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country")
<img src="inst/vignette-supp/unnamed-chunk-7-1.png" width="100%" />
Another option, although sometimes very slow, is to map a character onto the nodes of the phylogenetic tree by discrete ancestral character reconstruction with the ace function in ape. Below we use the presence/absence pattern of the mefA gene as the discrete trait. Warning: ancestral character reconstruction may take a while.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country",
trait.for.ancestral.reconstr = "mefA")
#> Tips will be coloured by trait.for.ancestral.reconstr option and color.tree.tips.by.column will be ignored
<img src="inst/vignette-supp/unnamed-chunk-8-1.png" width="100%" />
Notice that although we specified that the tips should be coloured by Country, the discrete trait used for ancestral reconstruction overrides this option. When the trait for ancestral reconstruction contains only one value, no reconstruction is performed.
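For readers unfamiliar with discrete ancestral reconstruction, the sketch below calls `ace` from ape directly on a made-up five-taxon tree with a made-up presence/absence trait. The tree, trait values and the choice of the equal-rates model (`model = "ER"`) are assumptions made for illustration; this is not RCandy's internal code, and the model RCandy uses is not stated here.

```r
library(ape)

# Hypothetical rooted, bifurcating tree with branch lengths
toy.tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

# Hypothetical presence/absence trait, named by tip label
toy.trait <- c(A = "present", B = "present", C = "absent",
               D = "absent", E = "absent")

if (length(unique(toy.trait)) < 2) {
  # A trait with a single state leaves nothing to reconstruct
  message("Trait has a single value; skipping reconstruction")
} else {
  # Maximum-likelihood reconstruction of a discrete character
  fit <- ace(toy.trait, toy.tree, type = "discrete", model = "ER")
  # One row per internal node, one column per trait state
  print(round(fit$lik.anc, 3))
}
```

Each row of `fit$lik.anc` gives the scaled likelihood of each state at one internal node, which is the kind of information that can be drawn on the tree nodes.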
We can also customise the recombination diagram slightly by showing the border and the genome tracks, using the show.rec.plot.border and show.rec.plot.tracks options respectively.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country",
show.rec.plot.border = TRUE, show.rec.plot.tracks = TRUE)
<img src="inst/vignette-supp/unnamed-chunk-9-1.png" width="100%" />
We could also turn off the background of the recombination diagram using the rec.plot.bg.transparency option.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country",
show.rec.plot.border = TRUE, show.rec.plot.tracks = TRUE, rec.plot.bg.transparency = 0.15)
<img src="inst/vignette-supp/unnamed-chunk-10-1.png" width="100%" />
What if we need to see the names of the specific taxa in the phylogenetic tree in which recombination events occurred? We can set the show.tip.label option to display the tip labels.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country",
rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, show.tip.label = TRUE)
<img src="inst/vignette-supp/unnamed-chunk-11-1.png" width="100%" />
We could also change the viridis colour palette used to represent the metadata columns using the color.pallette option. There are five options, namely "viridis", "inferno", "magma", "cividis" and "plasma". Below we use "viridis" instead of the default ("inferno").
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country",
rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, color.pallette = "viridis")
<img src="inst/vignette-supp/unnamed-chunk-12-1.png" width="100%" />
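To compare the five palettes before choosing one, their colours can be generated directly with the viridis package. This is a minimal sketch; the number of colours (five) and the object names are arbitrary choices, and how RCandy maps these colours onto metadata values internally is not shown here.

```r
library(viridis)

# The five palette names accepted by the color.pallette option
palette.names <- c("viridis", "inferno", "magma", "cividis", "plasma")

# Five colours from each palette, returned as hexadecimal colour strings
palettes <- lapply(palette.names, function(p) viridis(5, option = p))
names(palettes) <- palette.names

# Print the colour codes for a side-by-side comparison
print(palettes)
```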
What if we want to change the angle of the gene labels? This can be done using the gene.label.angle option.
RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE,
taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"),
taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff,
genome.start = 30000, genome.end = 60000, show.gen
