ReconCNV

visualize CNV data from targeted capture based sequencing data

License: GPL v3

reconCNV

Performing copy number analysis from targeted capture high-throughput sequencing has been a challenging task. This involves binning the targeted region, calculating the log ratio of the read depths between the sample and the reference, and then stitching together thousands of these data points into numerous segments (especially in the context of cancer) to derive the copy number state of genomic regions. Recently, several tools have been developed to adequately detect both somatic as well as germline CNVs. However, review and interpretation of these variants in a clinical as well as research setting is a daunting task. This can involve frequent switches back and forth from a static image to numerous tabular files resulting in an exasperated reviewer.
reconCNV has been developed to overcome this challenge by providing an interactive dashboard for hunting copy number variations (CNVs) in high-throughput sequencing data. The tool has been tested on targeted gene panels (including exome data). Python 3's visualization and data manipulation modules, namely Bokeh and pandas, are used to create these dynamic visualizations. reconCNV can be readily applied to the output of most CNV calling algorithms with simple modifications to the configuration file.
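
To make the log-ratio step described above concrete, here is a minimal sketch of how a per-bin log2 ratio could be computed from read depths. It is not taken from reconCNV or any specific CNV caller; the function name, the pseudocount, and the depth values are illustrative assumptions, and real callers typically add normalization steps that are omitted here.

```python
import math

def log2_ratios(sample_depths, reference_depths, pseudocount=0.01):
    """Return log2(sample / reference) for each bin.

    The pseudocount is an illustrative guard against zero depth;
    it is an assumption of this sketch, not a documented default.
    """
    if len(sample_depths) != len(reference_depths):
        raise ValueError("sample and reference must cover the same bins")
    return [
        math.log2((s + pseudocount) / (r + pseudocount))
        for s, r in zip(sample_depths, reference_depths)
    ]

# Hypothetical mean read depths for three bins.
ratios = log2_ratios([100.0, 150.0, 50.0], [100.0, 100.0, 100.0])
```

A bin with equal depth in both samples gets a ratio of 0, a gain gives a positive value, and a loss gives a negative value. These are the data points that a segmentation step later stitches into segments.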

Installation

The easiest way to get started with reconCNV is via conda. Using conda ensures that you are running Python 3.6 (the version reconCNV was written in) and that all dependencies are installed within a virtual environment. This avoids dependency conflicts with existing programs. If conda is not available on your system, a minimal version can be installed using Miniconda.

  1. Clone the reconCNV repository

    git clone https://github.com/rghu/reconCNV.git
    

    To use the Docker container see instructions under Usage ...

  2. Create your virtual environment. In the example below, we create a virtual environment called "reconCNV".

    conda env create -f environment.yml
    
  3. Activate your virtual environment.

    conda activate reconCNV
    
  4. You are now ready to use reconCNV!

  5. Once you are done using the virtual environment you can exit it.

    conda deactivate
    

Usage

usage: reconCNV.py [-h] --ratio-file RATIO_FILE --genome-file GENOME_FILE
                   --config-file CONFIG_FILE --out-dir OUT_DIR --out-file
                   OUT_FILE [--seg-file SEG_FILE] [--gene-file GENE_FILE]
                   [--seg-blacklist SEG_BLACKLIST]
                   [--annotation-file ANNOT_FILE] [--vcf-file VCF_FILE]
                   [--recenter RECENTER] [--vcf-filt-file]
                   [--vcf-blacklist BED_BLACKLIST] [--purity PURITY]
                   [--ploidy PLOIDY] [--gender GENDER] [--verbose] [--version]

Visualize CNV data from short read sequencing data.

optional arguments:
  -h, --help            show this help message and exit
  --ratio-file RATIO_FILE, -r RATIO_FILE
                        File which contains the log2(FC) of bins between the
                        tumor sample and another normal sample. [Required]
  --genome-file GENOME_FILE, -x GENOME_FILE
                        File which contains chromosome length and cumulative
                        genomic length. [Required]
  --config-file CONFIG_FILE, -c CONFIG_FILE
                        File which contains plot options and column name
                        customizations. [Required]
  --out-dir OUT_DIR, -d OUT_DIR
                        Directory to place output files. [Required]
  --out-file OUT_FILE, -o OUT_FILE
                        Output file name (file will be placed in output dir -
                        enter only filename). [Required]
  --seg-file SEG_FILE, -s SEG_FILE
                        File which contains the segmentation of log2(FC) bin
                        values between the tumor sample and normal sample.
  --gene-file GENE_FILE, -g GENE_FILE
                        File which contains gene calling information.
  --seg-blacklist SEG_BLACKLIST, -t SEG_BLACKLIST
                        BED file of problematic copy number regions to
                        highlight.
  --annotation-file ANNOT_FILE, -a ANNOT_FILE
                        File which contains gene/exon information.
  --vcf-file VCF_FILE, -v VCF_FILE
                        VCF containing variants to plot VAF.
  --recenter RECENTER, -y RECENTER
                        Recenter to provided log2(FC).
  --vcf-filt-file, -f   Flag to output filtered variants used for plotting
                        VAFs. (applicable only if providing VCF)
  --vcf-blacklist BED_BLACKLIST, -b BED_BLACKLIST
                        File containing variants to NOT plot VAF. (applicable
                        only if providing VCF)
  --purity PURITY, -p PURITY
                        Purity of the sample.
  --ploidy PLOIDY, -l PLOIDY
                        Ploidy of the sample.
  --gender GENDER, -z GENDER
                        Ploidy of the sample.
  --verbose, -j         Verbose logging output
  --version             show program's version number and exit

reconCNV has been tested on macOS and Linux environments.

reconCNV is also available via Docker by executing the following commands:

  1. Docker pull command

    docker pull raghuc1990/reconcnv
    
  2. Prepend the following command to the reconCNV command-line options

    docker run --rm -it -v <directory_to_mount>:/opt/reconCNV/ raghuc1990/reconcnv:latest \
                        <reconCNV_options>
    

    Example:

    docker run --rm -it -v `pwd`:/opt/reconCNV/ raghuc1990/reconcnv:latest \
                        -r data/sample_data/HT-29.cnr \
                        -x data/hg19_genome_length.txt \
                        -d .  -o HT-29.html \
                        -c config.json
    

Quickstart with an example

At a minimum, reconCNV needs the ratio file and the genome file to generate a plot (the configuration file, output directory, and output file name are also required arguments). First, make sure the values of the keys within the "column_names" field in the JSON configuration file match the column names in the headers of the ratio and genome files.

 python3 reconCNV.py -r <ratio_file> \
                     -x <genome_file> \
                     -d <output_directory> \
                     -o <output_file> \
                     -c <config_file>
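
The header check described above can be automated before running reconCNV. The sketch below is an assumption-laden illustration: the layout of "column_names" shown here is hypothetical (consult the config.json shipped with reconCNV for the real keys), the header line is illustrative, and `missing_columns` is a helper invented for this example.

```python
import json

def missing_columns(expected_names, header_line, sep="\t"):
    """Return the expected column names that are absent from a header line."""
    present = set(header_line.rstrip("\n").split(sep))
    return [name for name in expected_names if name not in present]

# Hypothetical config fragment; the real structure of "column_names"
# in config.json may be nested differently.
config = json.loads(
    '{"column_names": {"chromosome": "chromosome", "log2FC": "log2"}}'
)
expected = list(config["column_names"].values())

# Illustrative tab-separated header of a ratio file.
header = "chromosome\tstart\tend\tgene\tdepth\tlog2\tweight\n"
missing = missing_columns(expected, header)
```

An empty `missing` list means every configured name was found in the header; any names it contains would need to be corrected in the configuration file first.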

In this example we will use copy number analysis performed using CNVkit. Illumina sequencing of the HT-29 colon cancer cell line was performed using a 124-gene hybridization-based capture panel. The data/sample_data directory contains the input files required for this example. See the Input section below for details.

ratio file: data/sample_data/HT-29.cnr - contains coordinates and log2(FC) of bins.
segmentation file: data/sample_data/HT-29.cns - contains coordinates and log2(FC) of copy number segments.
gene file: data/sample_data/HT-29.genemetrics.cns - contains coordinates and gene-level CNV log2(FC).
VCF file: data/sample_data/HT-29.vcf - contains information on genotyped SNP loci.
annotation file: data/hg19_COSMIC_genes_model.txt
genome file: data/hg19_genome_length.txt - contains chromosome length and cumulative genomic length of chromosomes
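
As an illustration of what "cumulative genomic length" means for plotting, the sketch below derives a running offset per chromosome and uses it to place a position on a single genome-wide axis. Whether the real genome file stores the running total before or after each chromosome is not stated here, so treating it as the offset before the chromosome is an assumption, as are the function names.

```python
def cumulative_offsets(chrom_lengths):
    """Map each chromosome to the summed length of all chromosomes before it.

    chrom_lengths: list of (name, length) pairs in plotting order.
    """
    offsets = {}
    running_total = 0
    for name, length in chrom_lengths:
        offsets[name] = running_total
        running_total += length
    return offsets

def genome_position(chrom, pos, offsets):
    """Convert a per-chromosome position to a genome-wide x coordinate."""
    return offsets[chrom] + pos

# hg19 lengths of the first three chromosomes.
offsets = cumulative_offsets([
    ("chr1", 249250621),
    ("chr2", 243199373),
    ("chr3", 198022430),
])
x = genome_position("chr2", 1000, offsets)
```

With such offsets, bins from every chromosome can be laid end to end on one x-axis, which is the idea behind a genome-coordinate view.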

Create output "results" directory

mkdir results

Run the command below to generate an HTML file with the visualization. It can be opened in any modern web browser, preferably Google Chrome.

python3 reconCNV.py -r data/sample_data/HT-29.cnr \
                    -x data/hg19_genome_length.txt \
                    -d results/ -o HT-29.html \
                    -c config.json

In this example we have generated two plots representing the CNV data for HT-29 cell line using genome coordinates as well as bin indices (sequential lineup of bins). Various tools (top left corner in the image below) can be used to interact with the data.

Screenshot of quickstart example

Tools
Pan: used for dragging the plot.
Box Select: used to perform a rectangular selection on the x-axis highlighting a genomic region. The selection simultaneously occurs on the "Bin Data" table as well.
Box Zoom: used to perform a rectangular zoom to a region of the plot.
Wheel Zoom: used to zoom in and out based on the current location of the mouse in the x-axis.
Tap Tool: click on any plot feature to open UCSC genome browser with those genome coordinates.
Reset Plot: return the plot to its original view.
Save View: export a PNG file of the current view.
Zoom In: click the tool to zoom in toward the center of the plot.
Zoom Out: click the tool to zoom out from the center of the plot.
Crosshair: enable/disable display of crosshairs.
Hover: enable/disable display of annotation when hovering over plot features.
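
For readers curious how a link like the one opened by the Tap Tool can be formed, here is a small sketch that builds a UCSC Genome Browser URL from genome coordinates. The exact URL that reconCNV generates is not shown in this document, so the `ucsc_url` helper, the `hg19` default, and the coordinates are illustrative assumptions.

```python
from urllib.parse import urlencode

def ucsc_url(chrom, start, end, genome="hg19"):
    """Build a UCSC Genome Browser link for a genomic interval."""
    query = urlencode({"db": genome, "position": f"{chrom}:{start}-{end}"})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"

# Illustrative coordinates on chromosome 8.
url = ucsc_url("chr8", 128748315, 128753680)
```

`urlencode` percent-encodes the colon in the position, which the browser decodes when the link is opened.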

GIF tour of quickstart output

Now provide the segmentation file data/sample_data/HT-29.cns to add the copy number segments to the output HTML file.

python3 reconCNV.py -r data/sample_data/HT-29.cnr \
                    -x data/hg19_genome_length.txt \
                    -d results/ \
                    -o HT-29.html \
                    -c config.json \
                    -s data/sample_data/HT-29.cns
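
When generating reports for many samples, the commands above can be assembled from Python. The sketch below only builds the argument list using flags from the usage text; `build_command` is a hypothetical helper, not part of reconCNV, and the `python3` interpreter name is an assumption about your environment.

```python
def build_command(ratio, genome, out_dir, out_file, config,
                  seg=None, gene=None, vcf=None):
    """Assemble a reconCNV.py argument list; optional inputs are added only if given."""
    cmd = [
        "python3", "reconCNV.py",
        "-r", ratio,
        "-x", genome,
        "-d", out_dir,
        "-o", out_file,
        "-c", config,
    ]
    # Optional inputs map to the -s, -g and -v flags from the usage text.
    for flag, value in (("-s", seg), ("-g", gene), ("-v", vcf)):
        if value is not None:
            cmd += [flag, value]
    return cmd

cmd = build_command(
    ratio="data/sample_data/HT-29.cnr",
    genome="data/hg19_genome_length.txt",
    out_dir="results/",
    out_file="HT-29.html",
    config="config.json",
    seg="data/sample_data/HT-29.cns",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting problems with file paths.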

Next, when we provide the gene file `data/samp
