Trackplot
Generate IGV style locus tracks from bigWig files in R
Install / Use
/learn @PoisonAlien/TrackplotREADME
trackplot - Fast and easy visualisation of bigWig files in R
<!-- badges: start --> <!-- badges: end -->Introduction
trackplot.R is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.
Installation
trackplot.R is a standalone R script and requires no installation. Just source it and you're good to go!
source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")
# OR
download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
source('trackplot.R')
# OR If you prefer to have it as package
remotes::install_github(repo = "poisonalien/trackplot")
Features
Why trackplot?
- It's extremely fast since most of the heavy lifting is done by bwtool. >15X faster than deeptools for equivalent
profileplotsandheatmaps - Lightweight and minimal dependency
- data.table and bwtool are the only requirements. Similar R packages GViz and karyoploteR has over 150 dependencies.
- Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
- Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
- Supports GTF and standard UCSC gene formats as well.
- Customization: Each plot can be customized for color, scale, height, width, etc.
- Tracks can be summarized per condition (by mean, median, max, min)
- PCA and, optional differential peak analysis with
limmawhen using uniformly processed, normalized bigWig files.
Dependencies
- data.table R package - which itself has no dependency.
- bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g;
/usr/local/bin) or a directory under the PATH.
-
For macOS: Please download the pre-build binary from here. Make it executable with
chmod +x bwtool. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, allow it in the security settings. -
For centOS or debian: Follow these compilation instructions.
Citation
If you find the script useful consider citing trackplot and bwtool
Mayakonda A, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: 38476298
Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: 24489365
Usage
Simple usage - Make a table of all the bigWig files to be analysed with read_coldata() and pass it to the downstream functions.
flowchart TD
a[bigWig file list] -->A{read_coldata}
A --> B{track_extract}
B --> B1[track_plot]
A --> C{profile_extract}
C --> C1[profile_summarize]
C --> C3[profile_heatmap]
C1 --> C2[profile_plot]
A --> D{extract_summary}
D --> D1[pca_plot]
D --> D2[diffpeak]
D2 --> D3[volcano_plot]
#Path to bigWig files
bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw",
"H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")
#Make a table of bigWigs along with ref genome build
bigWigs = read_coldata(bws = bigWigs, build = "hg19")
trackplots
track_extract() and track_plot() are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, track_summarize can summarize tracks by condition.
Step-1: Extract signal from bigWig files
#Region to plot
oct4_loci = "chr6:31125776-31144789"
#Extract bigWig signal for a loci of interest
t = track_extract(colData = bigWigs, loci = oct4_loci)
#Or you can also specifiy a gene name instead of a loci
# - loci and gene models will be automatically extracted from UCSC genome browser
t = track_extract(colData = bigWigs, gene = "POU5F1")
Step-2: Plot
Basic plot
track_plot(summary_list = t)
Add cytoband and change colors for each track
track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE)
Collapse all tracks into a single track
Use track_overlay = TRUE to overlay all tracks into a single line track
track_plot(summary_list = t, col = track_cols, show_ideogram = FALSE, track_overlay = TRUE)
Heighilight regions of interest (any bed files would do)
oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks)
Add some chromHMM tracks to the bottom
chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an example file.
Note that the color code for each of the 15 states are as described here.
In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument chromHMM_cols
chromHMM_peaks = "H1_chromHMM.bed"
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)
Add some chromHMM tracks from UCSC
UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data.
In this case, use the argument ucscChromHMM with any values from TableName column of the below table.
TableName cell Description Tissue Karyotype
1: wgEncodeBroadHmmGm12878HMM GM12878 B-lymphocyte, lymphoblastoid blood normal
2: wgEncodeBroadHmmH1hescHMM H1-hESC embryonic stem cells embryonic stem cell normal
3: wgEncodeBroadHmmHepg2HMM HepG2 hepatocellular carcinoma liver cancer
4: wgEncodeBroadHmmHepg2HMM HMEC mammary epithelial cells breast normal
5: wgEncodeBroadHmmHsmmHMM HSMM skeletal muscle myoblasts muscle normal
6: wgEncodeBroadHmmHuvecHMM HUVEC umbilical vein endothelial cells blood vessel normal
track_plot(summary_list = t,
col = track_cols,
show_ideogram = TRUE,
peaks = oct4_nanog_peaks,
ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))
Re-organize tracks
By default tracks are organized from top to bottom as c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument layout_ord. Furthermore, bigWig tracks themselves can be ordered with the argument bw_ord which accepts the names of the bigWig tracks as input and plots them in the given order.
#Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
track_plot(
summary_list = t,
col = track_cols,
show_ideogram = TRUE,
genename = c("POU5F1", "TCF19"),
peaks = oct4_nanog_peaks,
peaks_track_names = c("NANOG", "OCT4"),
groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0,
bw_ord = c("NANOG", "OCT4"),
layout_ord = c("h", "g", "c")
)
narrowPeaks and broadPeaks
All of the above plots can also be generated with narrowPeak or broadPeak files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per UCSC convention
narrowPeak is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.
narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed",
"H1_k4me
