
QC3

QC3 is a quality control tool for DNA sequencing data that covers raw data, alignment, and variant calling.


<a name="Introduction"/> # Introduction #

High-throughput sequencing is the most effective way to screen for non-specific germline variants, somatic mutations, and structural variants. Some of the most popular paradigms in DNA sequencing are whole genome sequencing, exome sequencing, and target panel sequencing. While vastly informative, sequencing data poses significant bioinformatics challenges in areas such as data storage, computation time, and variant detection accuracy. One of the easily overlooked challenges associated with sequencing is quality control. Quality control (QC) for DNA sequencing can be categorized into three stages: raw data, alignment, and variant calling. QC on raw sequencing data has received more attention than QC on alignment and variant calling. There are many QC tools aimed at raw data, such as FastQC, FastQ Screen, FastX-Toolkit, NGS QC Toolkit, PRINSEQ, and QC-Chain. However, few tools have been developed for conducting quality control on alignment and variant calling.

We present QC3, a quality control tool designed for DNA sequencing data that covers all three of the aforementioned stages. QC3 provides both graphic and tabulated reports of quality control results. It also offers several unique features, such as separation of bad and good reads (based on Illumina’s filter) and detection of batch effects and cross-contamination. QC3 takes three types of input data: FASTQ, Binary Alignment Map (BAM), and Variant Call Format (VCF), corresponding respectively to the three stages of raw data, alignment, and variant detection. QC3 is written in Perl and R and is freely available for public use. It can be downloaded from the QC3 website on GitHub.

<a name="Download"/> # Download #

You can download QC3 directly from GitHub with the following commands (if git is already installed on your computer).

#The source codes of QC3 software will be downloaded to your current directory
git clone https://github.com/slzhao/QC3.git

Alternatively, you can download the zip file of QC3 from GitHub.

#The zip file of QC3 software will be downloaded to your current directory
wget https://github.com/slzhao/QC3/archive/master.zip
#A directory named QC3-master will be generated and the source codes will be extracted there
unzip master
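
The two download routes above can be folded into one small check. The following is a minimal sketch, not part of QC3: the function name `qc3_download_cmd` is hypothetical, and it only prints the command that fits the tools found on the machine instead of running it.

```shell
# Hypothetical helper (not shipped with QC3): print the download command that
# matches the tools available on this machine. Nothing is downloaded here.
qc3_download_cmd() {
    if command -v git >/dev/null 2>&1; then
        # git is installed: clone the repository into the current directory
        echo "git clone https://github.com/slzhao/QC3.git"
    elif command -v wget >/dev/null 2>&1; then
        # no git, but wget is installed: fetch and unpack the zip archive
        echo "wget https://github.com/slzhao/QC3/archive/master.zip && unzip master.zip"
    else
        # neither tool is available
        return 1
    fi
}

qc3_download_cmd || echo "install git or wget first"
```

Printing the command instead of executing it keeps the sketch free of side effects; copy the printed line into your terminal to perform the actual download.
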
<a name="Change"/> # Change log #

<a name="R133"> ## Release version 1.33 on October 19, 2015

Release version 1.33

* Bam QC supports bam files with more than one alignment for one query;

<a name="R132"> ## Release version 1.32 on June 26, 2015

Release version 1.32

* Fix bug in Bam QC;

<a name="R131"> ## Release version 1.31 on May 01, 2015

Release version 1.31

* Changes in methods of Vcf QC;
* More robust for different formats of vcf files;

<a name="R130"> ## Release version 1.30 on April 26, 2015

Release version 1.30

* Vcf QC supports more different formats of vcf files;
* Improvement of Vcf QC report;

<a name="R126"> ## Release version 1.26 on April 23, 2015

Release version 1.26

* More robust for different formats of vcf files;
* New parameter -up for Vcf QC: only use PASS variants in vcf;

<a name="R125"> ## Release version 1.25 on April 16, 2015

Release version 1.25

* More robust for different formats of vcf files;

<a name="R124"> ## Release version 1.24 on April 15, 2015

Release version 1.24

* More robust for different formats of vcf files;

<a name="R123"> ## Release version 1.23 on October 20, 2014

Release version 1.23

* A table with counts for different SNPs was added in vcf QC;
* The SNPs on X, Y chromosome and Mitochondrion can be selected to keep or not;
* More robust for different formats of vcf files;

<a name="R122"> ## Release version 1.22 on July 30, 2014

Release version 1.22

* Fix a bug when only one fastq or bam file is provided to fastq QC or Bam QC;

<a name="R121"> ## Release version 1.21 on February 19, 2014

Release version 1.21

* Vcf QC now supports vcf files generated by latest version of GATK;

<a name="R120"> ## Release version 1.20 on January 24, 2014

Release version 1.20

* Bam QC provided a statistic for alignment score generated by aligner (AS) if it was available in bam files;
* Vcf QC provided statistics for reads on chromosome Y and heterozygous, non-reference homozygous SNPs on chromosome X, which can be used for sex check;
* The codes were improved to fix some issues for paper revision;
* Documents were improved, an example from TCGA data was provided;

<a name="R115"> ## Release version 1.15 on December 25, 2013

Release version 1.15

* Input file for fastq QC and bam QC now supports label for each file;
* Vcf QC now supports vcf files generated by latest version of GATK;
* Vcf QC now supports vcf files with only two samples;
* Documents were improved;

<a name="R110"> ## Release version 1.10 on September 24, 2013

Release version 1.10

* A configure file was provided so that the parameters of QC3 could be easily modified;
* Some codes were updated to improve the performance;
* Documents were improved;

<a name="R100"> ## Release version 1.00 on August 09, 2013

Release version 1.00

* Documents were improved;
* The codes in bam QC were improved;

<a name="RC13"> ## Release candidate (RC) version 1.3 on August 06, 2013

Release candidate version 1.3 for test

* The ANNOVAR annotation function in vcf QC was changed;
* The codes in bam QC and fastq QC were improved;

<a name="RC12"> ## Release candidate (RC) version 1.2 on July 31, 2013

Release candidate version 1.2 for test

* The algorithm in bam QC was improved and the memory usage was highly decreased;

<a name="RC11"> ## Release candidate (RC) version 1.1 on July 29, 2013

Release candidate version 1.1 for test

* Documents were improved;
* Some bugs were fixed;
* Example files were provided;

<a name="RC10"> ## Release candidate (RC) version 1.0 on July 26, 2013

Release candidate version 1.0 for test

<a name="Prerequisites"/> # Prerequisites #

<a name="irpp"/> ## Install Perl and required Perl packages ##

Perl is a highly capable, widely used, feature-rich programming language. It can be downloaded from the Perl website.

If Perl is already installed on your computer, no other Perl modules are needed to run QC3 in most cases. You can run the following commands to make sure all the required modules are installed.

#Go to the folder where your QC3 software is.
#And test whether all the required modules have been installed.
bash test.modules

A successful run produces output like this:

ok   File::Basename
ok   FindBin
ok   Getopt::Long
ok   HTML::Template
ok   source::bamSummary
ok   source::fastqSummary
ok   source::makeReport
ok   source::vcfSummary
ok   threads
ok   threads::shared

Otherwise, if for example the HTML::Template package is missing, the output may look like this:

ok   File::Basename
ok   FindBin
ok   Getopt::Long
fail HTML::Template
ok   source::bamSummary
ok   source::fastqSummary
ok   source::makeReport
ok   source::vcfSummary
ok   threads
ok   threads::shared

Then you need to install the missing packages from CPAN. A script is also provided to make package installation more convenient.

#if HTML::Template was missing
bash install.modules HTML::Template
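
When several modules are missing, the names can be pulled out of the report automatically. The following is a minimal sketch, assuming the report has exactly the `ok`/`fail` plus module name layout shown above and is written to standard output; the function name `missing_modules` is hypothetical and not part of QC3.

```shell
# Hypothetical helper (not shipped with QC3): read an "ok/fail <module>" report
# on stdin and print only the names of the modules marked "fail".
missing_modules() {
    awk '$1 == "fail" { print $2 }'
}

# Sample report, shortened from the failing example above.
report='ok   File::Basename
ok   FindBin
ok   Getopt::Long
fail HTML::Template
ok   threads'

# Prints: HTML::Template
printf '%s\n' "$report" | missing_modules

# In a real QC3 checkout the same filter could feed install.modules
# (assumption: test.modules writes its report to stdout):
#   for m in $(bash test.modules | missing_modules); do
#       bash install.modules "$m"
#   done
```
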
<a name="irs"/> ## Install required software ##

R

R is a free software environment for statistical computing and graphics. It can be downloaded from the R website.

After you install R and add the directory containing the R binary to your PATH, the software can find and use R automatically. Alternatively, you can modify the config.txt file in the software directory to tell the program where R is on your computer. Here is the line you need to modify.

#where the R bin file is
RBin="R"
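
To find out what to put on that line, you can ask the shell where a tool lives. The following is a minimal sketch, not part of QC3: the function name `config_line` is hypothetical, and the `key="value"` layout is assumed from the single `RBin="R"` line shown above.

```shell
# Hypothetical helper (not shipped with QC3): print a config.txt-style line
# for a tool, using the full path the shell resolves for it on PATH.
config_line() {
    key=$1    # config key, e.g. RBin (the key shown in the example above)
    tool=$2   # executable name to look up, e.g. R
    if path=$(command -v "$tool" 2>/dev/null); then
        printf '%s="%s"\n' "$key" "$path"
    else
        echo "$tool not found on PATH; set $key to its full path by hand" >&2
        return 1
    fi
}

# Prints an RBin line with the resolved path if R is installed,
# otherwise a hint on stderr.
config_line RBin R || true
```

If the lookup succeeds, R is already on your PATH and the default `RBin="R"` should work unchanged; the printed line is mainly useful when you want to pin one specific installation.
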

samtools

SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. It can be downloaded from the SAMtools website.

After you install SAMtools and add the directory containing the SAMtools binary to your PATH, the software can find and use SAMtools automatically. Or you can modify the config.txt file in the software
