trackplot - Fast and easy visualisation of bigWig files in R

Introduction

trackplot.R is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.

Installation

trackplot.R is a standalone R script and requires no installation. Just source it and you're good to go!

source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")

# OR

download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R') 

# OR If you prefer to have it as package

remotes::install_github(repo = "poisonalien/trackplot")

Features

Why trackplot?

It's extremely fast since most of the heavy lifting is done by bwtool. >15X faster than deeptools for equivalent profileplots and heatmaps
Lightweight and minimal dependency
- data.table and bwtool are the only requirements. Similar R packages GViz and karyoploteR has over 150 dependencies.
- Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
Supports GTF and standard UCSC gene formats as well.
Customization: Each plot can be customized for color, scale, height, width, etc.
Tracks can be summarized per condition (by mean, median, max, min)
PCA and, optional differential peak analysis with limma when using uniformly processed, normalized bigWig files.

Dependencies

data.table R package - which itself has no dependency.
bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; /usr/local/bin) or a directory under the PATH.

For macOS: Please download the pre-build binary from here. Make it executable with chmod +x bwtool. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, allow it in the security settings.
For centOS or debian: Follow these compilation instructions.

Citation

If you find the script useful consider citing trackplot and bwtool

Mayakonda A, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: 38476298

Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365

Usage

Simple usage - Make a table of all the bigWig files to be analysed with read_coldata() and pass it to the downstream functions.

flowchart TD
    a[bigWig file list] -->A{read_coldata}
    A --> B{track_extract}
    B --> B1[track_plot]
    A --> C{profile_extract}
    C --> C1[profile_summarize]
    C --> C3[profile_heatmap]
    C1 --> C2[profile_plot]
    A --> D{extract_summary}
    D --> D1[pca_plot]
    D --> D2[diffpeak]
    D2 --> D3[volcano_plot]

#Path to bigWig files
bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw", 
            "H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")

#Make a table of bigWigs along with ref genome build
bigWigs = read_coldata(bws = bigWigs, build = "hg19")

trackplots

track_extract() and track_plot() are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize can summarize tracks by condition.

Step-1: Extract signal from bigWig files

#Region to plot
oct4_loci = "chr6:31125776-31144789"

#Extract bigWig signal for a loci of interest
t = track_extract(colData = bigWigs, loci = oct4_loci)

#Or you can also specifiy a gene name instead of a loci 
# - loci and gene models will be automatically extracted from UCSC genome browser
t = track_extract(colData = bigWigs, gene = "POU5F1")

Step-2: Plot

Basic plot

track_plot(summary_list = t)

Add cytoband and change colors for each track

track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE)

Collapse all tracks into a single track

Use track_overlay = TRUE to overlay all tracks into a single line track

track_plot(summary_list = t, col = track_cols, show_ideogram = FALSE, track_overlay = TRUE)

Heighilight regions of interest (any bed files would do)

oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files 
track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks)

Add some chromHMM tracks to the bottom

chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an example file.

Note that the color code for each of the 15 states are as described here. In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument chromHMM_cols

chromHMM_peaks = "H1_chromHMM.bed"

track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)

Add some chromHMM tracks from UCSC

UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data. In this case, use the argument ucscChromHMM with any values from TableName column of the below table.

                    TableName    cell                      Description              Tissue Karyotype
1: wgEncodeBroadHmmGm12878HMM GM12878     B-lymphocyte, lymphoblastoid               blood    normal
2:  wgEncodeBroadHmmH1hescHMM H1-hESC             embryonic stem cells embryonic stem cell    normal
3:   wgEncodeBroadHmmHepg2HMM   HepG2         hepatocellular carcinoma               liver    cancer
4:   wgEncodeBroadHmmHepg2HMM    HMEC         mammary epithelial cells              breast    normal
5:    wgEncodeBroadHmmHsmmHMM    HSMM        skeletal muscle myoblasts              muscle    normal
6:   wgEncodeBroadHmmHuvecHMM   HUVEC umbilical vein endothelial cells        blood vessel    normal

track_plot(summary_list = t, 
          col = track_cols, 
          show_ideogram = TRUE, 
          peaks = oct4_nanog_peaks, 
          ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))

Re-organize tracks

By default tracks are organized from top to bottom as c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument layout_ord. Furthermore, bigWig tracks themselves can be ordered with the argument bw_ord which accepts the names of the bigWig tracks as input and plots them in the given order.

#Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
track_plot(
  summary_list = t,
  col = track_cols,
  show_ideogram = TRUE,
  genename = c("POU5F1", "TCF19"),
  peaks = oct4_nanog_peaks,
  peaks_track_names = c("NANOG", "OCT4"),
  groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0, 
  bw_ord = c("NANOG", "OCT4"), 
  layout_ord = c("h", "g", "c")
)

narrowPeaks and broadPeaks

All of the above plots can also be generated with narrowPeak or broadPeak files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per UCSC convention

narrowPeak is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.

narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed", 
                "H1_k4me

Trackplot

Install / Use

README