ggcoverage - Visualize and annotate omics coverage with ggplot2
<img src = "man/figures/ggcoverage.png" align = "right" width = "200"/>
Introduction
The goal of ggcoverage is to visualize coverage tracks from genomics,
transcriptomics or proteomics data. It contains functions to load data
from BAM, BigWig, BedGraph, txt, or xlsx files, create genome/protein
coverage plots, and add various annotations including base and amino
acid composition, GC content, copy number variation (CNV), genes,
transcripts, ideograms, peak highlights, HiC contact maps, contact links
and protein features. It is based on and integrates well with ggplot2.
It contains three main parts:
- Load the data:
  ggcoverage can load BAM, BigWig (.bw), BedGraph, and txt/xlsx files from various omics data, including WGS, RNA-seq, ChIP-seq, ATAC-seq, and proteomics.
- Create omics coverage plot
- Add annotations:
  ggcoverage supports the following annotations:
  - base and amino acid annotation: Visualize genome coverage at single-nucleotide level with bases and amino acids.
  - GC annotation: Visualize genome coverage with GC content
  - CNV annotation: Visualize genome coverage with copy number variation (CNV)
  - gene annotation: Visualize genome coverage across genes
  - transcript annotation: Visualize genome coverage across different transcripts
  - ideogram annotation: Visualize the displayed region on the whole chromosome
  - peak annotation: Visualize genome coverage with identified peaks
  - contact map annotation: Visualize genome coverage with Hi-C contact map
  - link annotation: Visualize genome coverage with contact links
  - protein feature annotation: Visualize protein coverage with features
Installation
ggcoverage is an R package distributed through the CRAN
repository. To install the package, start
R and enter one of the following commands:
# install via CRAN
install.packages("ggcoverage")
# OR install via GitHub
install.packages("remotes")
remotes::install_github("showteeth/ggcoverage")
In general, installing from the GitHub repository is recommended, as it is updated more regularly.
Once ggcoverage is installed, it can be loaded like every other
package:
library("ggcoverage")
Manual
ggcoverage provides two
vignettes:
- detailed manual: step-by-step usage
- customize the plot: customize the plot and add additional layers
RNA-seq data
Load the data
The RNA-seq data used here are from the experiment “Transcription profiling by high
throughput sequencing of HNRNPC knockdown and control HeLa
cells”.
We use four samples as examples: ERR127307_chr14,
ERR127306_chr14, ERR127303_chr14 and ERR127302_chr14. All BAM
files were converted to BigWig files with
deeptools.
Load metadata:
# load metadata
meta_file <-
system.file("extdata", "RNA-seq", "meta_info.csv", package = "ggcoverage")
sample_meta <- read.csv(meta_file)
sample_meta
#> SampleName Type Group
#> 1 ERR127302_chr14 KO_rep1 KO
#> 2 ERR127303_chr14 KO_rep2 KO
#> 3 ERR127306_chr14 WT_rep1 WT
#> 4 ERR127307_chr14 WT_rep2 WT
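If the metadata are not stored in a CSV file, an equivalent data frame can be built directly in R. The sketch below simply reproduces the table above. Judging from the example files (not from the package documentation), `SampleName` appears to identify each track file, so we assume it has to match the track file names.

```r
# Build the same sample metadata as meta_info.csv, directly in R
sample_meta <- data.frame(
  SampleName = c("ERR127302_chr14", "ERR127303_chr14",
                 "ERR127306_chr14", "ERR127307_chr14"),
  Type = c("KO_rep1", "KO_rep2", "WT_rep1", "WT_rep2"),  # one label per track
  Group = c("KO", "KO", "WT", "WT")                      # condition shared by replicates
)
sample_meta
```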
Load track files:
# track folder
track_folder <- system.file("extdata", "RNA-seq", package = "ggcoverage")
# load bigwig file
track_df <- LoadTrackFile(
track.folder = track_folder,
format = "bw",
region = "chr14:21,677,306-21,737,601",
extend = 2000,
meta.info = sample_meta
)
# check data
head(track_df)
#> seqnames start end width strand score Type Group
#> 1 chr14 21675306 21675950 645 * 0 KO_rep1 KO
#> 2 chr14 21675951 21676000 50 * 1 KO_rep1 KO
#> 3 chr14 21676001 21676100 100 * 2 KO_rep1 KO
#> 4 chr14 21676101 21676150 50 * 1 KO_rep1 KO
#> 5 chr14 21676151 21677100 950 * 0 KO_rep1 KO
#> 6 chr14 21677101 21677200 100 * 2 KO_rep1 KO
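The first row of `track_df` starts at 21,675,306, which is the region start (21,677,306) minus `extend = 2000`. A minimal base-R sketch of that arithmetic follows. `parse_region()` is a hypothetical helper, not a ggcoverage function, and widening the end by the same amount is our assumption.

```r
# Hypothetical helper (not a ggcoverage function): parse "chr:start-end",
# drop thousands separators and widen the window by `extend` bp.
parse_region <- function(region, extend = 0) {
  parts <- strsplit(gsub(",", "", region, fixed = TRUE), "[:-]")[[1]]
  list(
    seqnames = parts[1],
    start = as.numeric(parts[2]) - extend,
    end = as.numeric(parts[3]) + extend   # assumption: end widened by the same amount
  )
}

win <- parse_region("chr14:21,677,306-21,737,601", extend = 2000)
win$start
#> [1] 21675306
```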
Prepare mark region:
# create mark region
mark_region <- data.frame(
start = c(21678900, 21732001, 21737590),
end = c(21679900, 21732400, 21737650),
label = c("M1", "M2", "M3")
)
# check data
mark_region
#> start end label
#> 1 21678900 21679900 M1
#> 2 21732001 21732400 M2
#> 3 21737590 21737650 M3
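Before plotting, it can help to confirm that every mark region lies inside the loaded window. The sketch below repeats the `mark_region` values under a separate name, assumes 1-based inclusive coordinates, and assumes (as above) that `extend` widens both ends of the region.

```r
# Window loaded above; the end bound assumes extend is added on both sides
win_start <- 21677306 - 2000
win_end   <- 21737601 + 2000

marks <- data.frame(
  start = c(21678900, 21732001, 21737590),
  end   = c(21679900, 21732400, 21737650),
  label = c("M1", "M2", "M3")
)
# Width in bp, counting both endpoints (1-based, inclusive coordinates)
marks$width <- marks$end - marks$start + 1
# TRUE when the whole mark lies inside the loaded window
marks$inside <- marks$start >= win_start & marks$end <= win_end
marks
```

Under these assumptions, M3 ends after the region end (21,737,601) and falls inside the window only because of `extend`.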
Load GTF
To add gene annotation, the gtf file should contain gene_type and gene_name attributes in column 9; to add transcript annotation, the gtf file should contain a transcript_name attribute in column 9.
gtf_file <-
system.file("extdata", "used_hg19.gtf", package = "ggcoverage")
gtf_gr <- rtracklayer::import.gff(con = gtf_file, format = "gtf")
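To check whether a GTF line meets these requirements, one option is to parse its attribute column (column 9). The helper below is a hypothetical base-R sketch, and the attribute string is a made-up example rather than a line from `used_hg19.gtf`. After import, the attributes should also appear as metadata columns of `gtf_gr` (for example via `names(S4Vectors::mcols(gtf_gr))`), which is a quicker check on a whole file.

```r
# Hypothetical helper: return the attribute keys of a GTF column-9 string,
# e.g. 'gene_name "X"; gene_type "Y";' -> c("gene_name", "gene_type")
gtf_attr_keys <- function(attr) {
  fields <- trimws(strsplit(attr, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]   # drop empty pieces
  sub(" .*$", "", fields)            # the key is the text before the first space
}

# Made-up example attribute string
attr9 <- 'gene_id "g1"; transcript_id "t1"; gene_type "protein_coding"; gene_name "GENE1"; transcript_name "GENE1-201";'
keys <- gtf_attr_keys(attr9)

# Which annotation types this line would support
has_gene_attrs <- all(c("gene_type", "gene_name") %in% keys)
has_transcript_attrs <- "transcript_name" %in% keys
```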
Basic coverage
The basic coverage plot has two types:
- facet: Create a subplot for every track (specified by facet.key). This is the default.
- joint: Visualize all tracks in a single plot.
joint view
Create a line plot for each sample (facet.key = "Type") and color
each sample differently (group.key = "Type"):
basic_coverage <- ggcoverage(
data = track_df,
plot.type = "joint",
facet.key = "Type",
group.key = "Type",
mark.region = mark_region,
range.position = "out"
)
basic_coverage
<img src="man/figures/README-basic_coverage_joint-1.png" width="100%" style="display: block; margin: auto;" />
Create a group-average line plot (samples are indicated by
facet.key = "Type", groups by group.key = "Group"):
basic_coverage <- ggcoverage(
data = track_df,
plot.type = "joint",
facet.key = "Type",
group.key = "Group",
joint.avg = TRUE,
mark.region = mark_region,
range.position = "out"
)
basic_coverage
<img src="man/figures/README-basic_coverage_joint_avg-1.png" width="100%" style="display: block; margin: auto;" />
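Conceptually, `joint.avg = TRUE` summarizes the replicates of each `Group` into a single line. The sketch below illustrates that idea with base R's `aggregate()` on made-up coverage values measured at shared positions. It shows a per-position mean and is not necessarily how ggcoverage computes the average internally (real tracks, for instance, may have bins that differ between samples).

```r
# Made-up coverage for four samples at three shared positions
toy_cov <- data.frame(
  pos   = rep(c(100, 200, 300), times = 4),
  score = c(1, 2, 3,  3, 4, 5,  0, 1, 1,  2, 1, 3),
  Type  = rep(c("KO_rep1", "KO_rep2", "WT_rep1", "WT_rep2"), each = 3),
  Group = rep(c("KO", "WT"), each = 6)
)

# Mean score per group at each position: one line per group instead of per sample
group_avg <- aggregate(score ~ Group + pos, data = toy_cov, FUN = mean)
group_avg
```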
Facet view
basic_coverage <- ggcoverage(
data = track_df,
plot.type = "facet",
mark.region = mark_region,
range.position = "out"
)
basic_coverage
<img src="man/figures/README-basic_coverage-1.png" width="100%" style="display: block; margin: auto;" />
Custom Y-axis style
Place the Y-axis scale labels inside or outside the plot region with
range.position:
basic_coverage <- ggcoverage(
data = track_df,
plot.type = "facet",
mark.region = mark_region,
range.position = "in"
)
basic_coverage
<img src="man/figures/README-basic_coverage_2-1.png" width="100%" style="display: block; margin: auto;" />
Use a shared (fixed) or free Y-axis scale across facets with facet.y.scale:
basic_coverage <- ggcoverage(
data = track_df,
plot.type = "facet",
mark.region = mark_region,
range.position = "in",
facet.y.scale = "fixed"
)
basic_coverage
<img src="man/figures/README-basic_coverage_3-1.png" width="100%" style="display: block; margin: auto;" />
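The difference between the two settings can be illustrated by the y-axis upper limit each facet would get. In this base-R sketch with made-up scores, a fixed scale uses one maximum across all tracks, while a free scale uses each track's own maximum. We assume a free scale is what ggcoverage uses when `facet.y.scale` is not set to "fixed".

```r
# Made-up scores for three tracks
toy_scores <- data.frame(
  Type  = rep(c("KO_rep1", "KO_rep2", "WT_rep1"), each = 3),
  score = c(2, 5, 1,  3, 9, 4,  1, 2, 2)
)
# "fixed": every facet shares one y-axis limit, the global maximum
fixed_max <- max(toy_scores$score)
# free: each facet gets the maximum of its own track
free_max <- tapply(toy_scores$score, toy_scores$Type, max)
fixed_max
#> [1] 9
```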
Add gene annotation
- by default, genes (transcripts), exons and UTRs are drawn with different line widths,
  which can be adjusted using the gene.size, exon.size and utr.size parameters
- the frequency of intermittent arrows (light color) can be adjusted using the
  arrow.num and arrow.gap parameters
- genomic features are colored by strand by default, which can be changed using the color.by parameter
basic_coverage +
geom_gene(gtf.gr = gtf_gr)
<img src="man/figures/README-gene_coverage-1.png" width="100%" style="display: block; margin: auto;" />
Add transcript annotation
In “loose” style (default style; each transcript occupies one line):
basic_coverage +
geom_transcript(gtf.gr = gtf_gr, label.vjust = 1.5)
<img src="man/figures/README-transcript_coverage-1.png" width="100%" style="display: block; margin: auto;" />
In “tight” style (attempts to place non-overlapping transcripts on the same line):
basic_coverage +
geom_transcript(
gtf.gr = gtf_gr,
overlap.style = "tight",
label.vjust = 1.5
)
<img src="man/figures/README-transcript_coverage_tight-1.png" width="100%" style="display: block; margin: auto;" />
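The “tight” layout is an instance of interval packing. A common greedy approach, sketched below in base R, sorts transcripts by start and assigns each one to the first row whose last transcript ends before it begins. This is a generic illustration of the technique with made-up coordinates; ggcoverage's own implementation may differ (for example, in how much spacing it keeps between neighbors).

```r
# Greedy row assignment for intervals (generic sketch, not ggcoverage's code).
# Returns the row index (1, 2, ...) for each interval.
pack_rows <- function(start, end, gap = 0) {
  row <- integer(length(start))
  row_end <- numeric(0)  # right-most end currently occupying each row
  for (i in order(start)) {
    free <- which(row_end + gap < start[i])  # rows this interval fits into
    r <- if (length(free) > 0) free[1] else length(row_end) + 1L
    row[i] <- r
    row_end[r] <- end[i]
  }
  row
}

# Four made-up transcripts; "loose" style would use one row each
tx_start <- c(100, 150, 400, 420)
tx_end   <- c(300, 350, 500, 600)
rows <- pack_rows(tx_start, tx_end)
rows
#> [1] 1 2 1 2
```

With these four toy transcripts, the loose style would need four rows, while the greedy packing fits them into two.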
Add ideogram
The ideogram is an overview plot showing the position of the displayed
region on the chromosome. Ideogram plotting is implemented by the ggbio
package, which needs to be installed separately (it is only
‘Suggested’ by ggcoverage).
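Because ggbio is only suggested, a script can check for it before adding an ideogram. A minimal sketch using base R's `requireNamespace()`. ggbio is distributed through Bioconductor, so the message points to `BiocManager::install()`.

```r
# TRUE if a package can be loaded, without attaching it
has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

if (!has_pkg("ggbio")) {
  message('ggbio is not installed; install it with BiocManager::install("ggbio")')
}
```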
library(ggbio)
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: ggplot2
#> Registere
