CNView

Visualization, quantitation, and annotation of CNVs from population-scale whole-genome sequencing data.

Contact: Ryan Collins (rcollins@chgr.mgh.harvard.edu)

[Figure: CNView example CNV visualization]

General Information

CNView Summary

CNView is a low-profile visualization tool for read depth in batches of next-generation sequencing libraries and, more specifically, for visually inspecting sites of copy-number variation (CNV). It also implements a framework for estimating CNV probabilities and annotating CNV intervals.

We are open to tailoring applications of CNView for specific needs or customizing plots. CNView is also under active development. Contact us at rcollins@chgr.mgh.harvard.edu if you have any questions or requests. Please post any issues to GitHub.

Accessing Reference Libraries

Like many depth-based CNV tools, CNView does not work on individual libraries. Instead, CNView jointly models multiple libraries, normalizing both within and across libraries to reduce systematic coverage biases. Thus, while CNView will technically run on as few as two libraries, CNVs generally become clearly resolvable with at least 20 samples modelled jointly.
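To make the idea of normalizing "within and across libraries" concrete, here is a minimal sketch in Python. It is not CNView's actual implementation: the median-based scaling, the function name `normalize`, and the toy data are all assumptions chosen for illustration. Each library is first scaled by its own median (within-library), then each bin is scaled by the cohort median for that bin (across-library).

```python
from statistics import median

def normalize(cov):
    """Two-step normalization sketch (an assumption, not CNView's own code).

    cov maps sample ID -> list of binned coverage values; every sample
    must cover the same bins in the same order, and medians must be nonzero.
    """
    # Within-library step: divide by the library's own median so that
    # libraries sequenced to different depths share a common scale.
    within = {}
    for sample, values in cov.items():
        m = median(values)
        within[sample] = [v / m for v in values]

    # Across-library step: divide each bin by the cohort median for that
    # bin, which dampens biases shared by all libraries at that locus.
    n_bins = len(next(iter(within.values())))
    bin_medians = [median(within[s][i] for s in within) for i in range(n_bins)]
    return {
        s: [v / b for v, b in zip(values, bin_medians)]
        for s, values in within.items()
    }

# Toy cohort: library "A" has half the expected coverage in bins 2-3.
toy = {
    "A": [100, 100, 50, 50, 100, 100],
    "B": [60, 60, 60, 60, 60, 60],
    "C": [200, 200, 200, 200, 200, 200],
}
normalized = normalize(toy)
print(normalized["A"])
```

After normalization, the three libraries sit on the same scale despite their different depths, and the dip in library "A" stands out against the cohort. With only three samples the cohort median is fragile, which is one way to see why larger batches resolve CNVs more clearly.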

If you do not have access to an appropriate cohort of normative reference samples, thousands of standard Illumina WGS libraries are available from the 1000 Genomes Project, and many hundreds of other libraries are available through other public repositories like the European Nucleotide Archive (ENA) or the Sequence Read Archive (SRA).

Citing CNView

If you use CNView, please cite our preprint (Collins et al., 2016).

Code Documentation

CNView.R

Performs joint normalization of binned coverage values across a batch of WGS libraries and facilitates visualization. Also interfaces with UCSC Genome Browser to underlay several annotation tracks.

Usage: CNView.R [options] chr start end samples.list covmatrix.bed outfile


Options:
	-c INTEGER, --compression=INTEGER
		compression scalar for rebinning, if desired [default 'NULL']

	-i CHARACTER, --highlight=CHARACTER
		tab-delimited list of coordinate pairs for intervals to highlight and color as third column; NULL disables highlighting [default NA]

	-w INTEGER, --window=INTEGER
		distance to append to both sides of input interval for viewing [default 61.8% of plot interval]

	--ymin=INTEGER
		minimum value for y axis [default NULL]

	--ymax=INTEGER
		maximum value for y axis [default NULL]

	-n INTEGER, --normDist=INTEGER
		distance outside region to use for normalization (both sides) [default 5000000]

	-s INTEGER, --subsample=INTEGER
		truncate coverage matrix to [s] samples; useful for very large cohorts [default 200]

	--gcex=INTEGER
		scalar applied to all fonts and legend [default 1]

	--names=CHARACTER
		list of custom names to be applied to each plot panel (e.g. 'mother', 'father', 'child', rather than actual sample IDs) [default NA]

	-t CHARACTER, --title=CHARACTER
		custom title for plot [default NULL]

	-p, --probs
		add CNV probabilities below each highlighted interval [default FALSE]

	-u, --noUCSC
		disable UCSC track plotting [default FALSE]

	-G, --nogenesymbols
		disable gene symbol printing below gene bodies in UCSC tracks [default FALSE]

	--tabix
		use tabix to index into coverage matrix [default FALSE]

	--noUnix
		disable use of unix coreutils [default FALSE]

	-q, --quiet
		disable verbose output [default FALSE]

	-l, --nolegend
		disable legend on plot [default TRUE]

	-h, --help
		Show this help message and exit

Usage Notes:

Several base-level options (such as returnData or plot) are not wrapped by the Rscript implementation. To access those features, import the R function directly and run from the R command prompt.
By default, CNView will randomly subsample an input matrix to include only 200 libraries (including those specified to plot). This is a useful parameter to reduce computational requirements (both memory and runtime) for large cohorts, but can be modulated with the -s/--subsample option.
Use the --tabix option to improve speed when reading from very large coverage matrices. However, your input matrix must be bgzipped and tabix-indexed; see the Samtools documentation for details.
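The subsampling behaviour described in the usage notes can be sketched as follows. This is an illustrative Python sketch, not CNView's code: the function name `subsample_ids`, its parameters, and the seeding are hypothetical. The one property taken from the notes is that the libraries requested for plotting are always retained while the remainder are drawn at random up to the limit.

```python
import random

def subsample_ids(all_ids, plot_ids, s=200, seed=None):
    """Pick at most s sample IDs, always keeping the ones to be plotted.

    Hypothetical helper for illustration; s=200 mirrors the documented
    default of -s/--subsample.
    """
    keep = list(plot_ids)
    missing = [i for i in keep if i not in all_ids]
    if missing:
        raise ValueError(f"sample IDs not in matrix: {missing}")
    if len(all_ids) <= s:
        # Cohort already small enough: nothing to drop.
        return list(all_ids)
    kept = set(keep)
    pool = [i for i in all_ids if i not in kept]
    rng = random.Random(seed)
    # Fill the remaining slots with randomly chosen reference libraries.
    extra = rng.sample(pool, max(0, s - len(keep)))
    return keep + extra

cohort = [f"S{i}" for i in range(1000)]
chosen = subsample_ids(cohort, ["S7"], s=200, seed=1)
print(len(chosen), "S7" in chosen)
```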

Example Usage

Getting Started

Required Input Data
The input data for CNView is a tab-delimited bed-style matrix of binned coverage values, which can be generated by bedtools coverage followed by bedtools unionbedg (bedtools documentation). Alternatively, you can use binCov.py from the CNView sister repository, WGD. Generally, 100bp-1kb sequential bins provide a reasonable tradeoff between resolution and modeling speed on a standard 8GB two-core laptop.
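If a matrix was generated at a finer bin size than you want to model, neighbouring bins can be merged before plotting. The Python sketch below shows one way to do that. It assumes that merged coverage is the sum of the constituent bins and that bins never merge across chromosomes. Both are assumptions for illustration, and the sketch does not claim to reproduce what the -c/--compression option does internally. The names `rebin` and `collapse` are hypothetical.

```python
def collapse(group):
    """Merge a run of adjacent bins into one (chrom, start, end, sums) row."""
    chrom, start = group[0][0], group[0][1]
    end = group[-1][2]
    # Sum coverage column-wise, i.e. per sample (summing is an assumption).
    sums = [sum(col) for col in zip(*(row[3] for row in group))]
    return (chrom, start, end, sums)

def rebin(rows, factor):
    """Merge every `factor` consecutive bins on the same chromosome.

    rows: list of (chrom, start, end, [coverage per sample]),
    sorted by chromosome and position.
    """
    merged, group = [], []
    for row in rows:
        # Close the current group when it is full or the chromosome changes.
        if group and (row[0] != group[0][0] or len(group) == factor):
            merged.append(collapse(group))
            group = []
        group.append(row)
    if group:
        merged.append(collapse(group))
    return merged

rows = [
    ("1", 0, 100, [89, 56, 217]),
    ("1", 100, 200, [98, 60, 230]),
    ("1", 200, 300, [102, 59, 202]),
    ("Y", 59373200, 59373300, [79, 48, 207]),
]
rebinned = rebin(rows, 2)
for row in rebinned:
    print(row)
```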

An example coverage matrix at 100bp binsize for human libraries aligned to reference genome hg19 would look something like this:

Chr  Start     End       SampleA  SampleB  SampleC  ...  SampleZ
1    0         100       89       56       217      ...  141
1    100       200       98       60       230      ...  132
1    200       300       102      59       202      ...  142
...  ...       ...       ...      ...      ...      ...  ...
Y    59373200  59373300  79       48       207      ...  133
Y    59373300  59373400  89       51       196      ...  138
Y    59373400  59373500  93       68       198      ...  129
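Before running CNView it can be worth confirming that a matrix has the layout shown above and that the sample you intend to plot appears in its header. The Python sketch below does that check. It assumes a single header row with three coordinate columns followed by one column per sample, as in the example, and the function name `sample_column` is hypothetical.

```python
import csv
import io

def sample_column(handle, sample_id):
    """Return the zero-based column index of sample_id in a coverage matrix.

    Assumes a tab-delimited file whose first row is a header of the form
    chr, start, end, then one column per sample (as in the example above).
    """
    header = next(csv.reader(handle, delimiter="\t"))
    if len(header) < 4:
        raise ValueError("expected chr, start, end and at least one sample column")
    samples = header[3:]
    if sample_id not in samples:
        # Matching is exact and case-sensitive.
        raise KeyError(f"{sample_id!r} is not a column in the coverage matrix")
    return samples.index(sample_id) + 3

example = (
    "Chr\tStart\tEnd\tSampleA\tSampleB\tSampleC\n"
    "1\t0\t100\t89\t56\t217\n"
    "1\t100\t200\t98\t60\t230\n"
)
print(sample_column(io.StringIO(example), "SampleB"))
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open("cov_matrix.bed", newline="")`.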

CNView has been tested on libraries ranging from 1X to >300X coverage simultaneously and appears to perform relatively consistently irrespective of the ranges of coverage between individual libraries in the same batch.

Example A

Canonical Deletion Plotted in a Single Sample
The basic use-case for CNView is to visualize a predefined CNV locus, which can be predicted from whole-genome sequencing data with many different algorithms, such as cn.MOPS, CNVnator, or GenomeSTRiP. Once a putative CNV locus is defined, visualization can be performed right out of the box with CNView by invoking CNView.R with all default parameters.
[Figure: Canonical Deletion Plotted in a Single Sample]
Example code to generate the above plot:

bash$ ./CNView.R 2 178714141 178760307 SFARI_d12529p1 \
                 ~/cov_matrix.bed \
                 ./ExamplePlots/CNView.ExamplePlotA.pdf \
                 --title "Example Plot A: Canonical Deletion, Single Sample" \
                 --probs

This example visualizes a 46kb deletion of two exons from PDE11A. The first three positional arguments are the coordinates of the deletion (the highlighted region); the remaining arguments are as follows:

SFARI_d12529p1 is the ID of the sample being plotted, which has to exactly match one of the names of the columns in the coverage matrix.
~/cov_matrix.bed is the path to the input coverage matrix, like the example provided above.
./ExamplePlots/CNView.ExamplePlotA.pdf is the path to the desired output file (always a PDF).
--title overrides the default title with the subsequently supplied string in quotes.
--probs prints the FDR-corrected probability of the highlighted window being deleted or duplicated. The q-value is calculated by evaluating the t-score at each bin overlapping the highlighted window, combining those p-values with Fisher's Method, then correcting for false discovery rate (FDR) with the Benjamini-Hochberg procedure.
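The two statistical steps named in the --probs description, Fisher's Method and the Benjamini-Hochberg procedure, can be sketched in a few lines of Python. This is a generic illustration of those two textbook techniques, not CNView's code: the function names are hypothetical, the input p-values are made up, and how CNView derives per-bin p-values or which set of tests it corrects across is not shown here. Fisher's Method also assumes the combined p-values are independent.

```python
import math

def fisher_combine(pvalues):
    """Combine p-values (each in (0, 1]) with Fisher's Method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no external library is needed.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Made-up per-bin p-values for one highlighted window.
window_p = fisher_combine([0.04, 0.01, 0.03])
# Made-up combined p-values for four windows tested together.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(window_p, qvals)
```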

Running the above code will also print some runtime diagnostics to stdout, which can alternatively be silenced with -q/--quiet:

+-------------------+
| CNView Visualizer |
|     (c) 2016      |
+-------------------+
Sample ID file 'SFARI_d12529p1' not found, assuming single sample ID provided
Attempting to connect to UC
