cfDNApipe

Introduction
Section 1: Installation Tutorial
Section 2: cfDNApipe Highlights
Section 3: A Quick Tutorial for Analysis WGBS data
- Section 3.1: Set Global Reference Configures
- Section 3.2: Execute build-in WGBS Analysis Pipeline
Section 4: Perform Case-Control Analysis for WGBS data
Section 5: How to Build Customized Pipeline using cfDNApipe
Section 6: A Basic Quality Control: Fragment Length Distribution
Section 7: Nucleosome Positioning
Section 8: Inferring Tissue-Of-Origin based on deconvolution
Section 9: Additional Function: WGS SNV/InDel Analysis
Section 10: Additional Function: Virus Detection
Section 11: Other Functions
Section 12: How to use cfDNApipe results in Bioconductor/R
FAQ

Introduction

cfDNApipe (cell-free DNA Pipeline) is an integrated pipeline for analyzing cell-free DNA WGBS/WGS data. It contains many cfDNA quality control and statistical algorithms. We also collected some useful cell-free DNA references and provide them here. Users can access the cfDNApipe documentation here.

The whole pipeline is built on the processing graph principle. Users can run the preset pipelines for WGBS/WGS data or build their own analysis pipeline starting from any intermediate data, such as BAM files. The main functions are shown in the following picture.

<center> <img style="border-radius: 0.3125em; box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);" src="./pics/pipeline.png"> <div style="color:orange; border-bottom: 1px solid #d9d9d9; display: inline-block; color: #999; padding: 2px;">cfDNApipe Functions</div> </center>

Section 1: Installation Tutorial

Section 1.1: System requirement

Popular WGBS/WGS analysis tools, such as FastQC and Bowtie2, are released for Unix/Linux systems and written in different programming languages, so rewriting all of them in one language would be very difficult. Fortunately, conda/bioconda collects many widely used Python modules and bioinformatics software, so we can install all the dependencies through conda/bioconda and arrange pipelines using Python.

We recommend using conda/Anaconda and creating a virtual environment to manage all the dependencies. If you have not installed conda before, please follow this tutorial to install conda first.

After installation, you can create a new virtual environment for cfDNA analysis. Virtual environment management means that you can install all the dependencies in this virtual environment and delete them easily by removing this virtual environment.

Section 1.2: Create environment and Install Dependencies

We tested our pipeline using different versions of software and provide an environment yml file for users. Users can download this file and create the environment in one command line.

First, please download the yml file.

wget https://xwanglabthu.github.io/cfDNApipe/environment.yml

Then, run the following command. The environment will be created and all the dependencies as well as the latest cfDNApipe will be installed.

# clean unused packages before installation
conda clean -y --all

# install environment
conda env create -n cfDNApipe -f environment.yml

Note: The environment name can be changed by replacing "-n cfDNApipe" with "-n environment_name".

Note: If errors about an unavailable or invalid channel occur, please check whether the .condarc file in your home directory (~) has been modified. A modified .condarc file may cause wrong-channel errors. In this case, just rename or back up your .condarc file; once the installation has finished, the file can be restored. Of course, you can delete the .condarc file if necessary.
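The rename-and-restore step in the note above can also be scripted. The following is a minimal sketch, not part of cfDNApipe: the helper names `backup_condarc` and `restore_condarc` and the `.condarc.bak` suffix are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path
from typing import Optional


def backup_condarc(home: Path) -> Optional[Path]:
    """Move home/.condarc to home/.condarc.bak and return the backup path.

    Returns None when there is no .condarc file to move.
    """
    condarc = home / ".condarc"
    if not condarc.is_file():
        return None
    backup = home / ".condarc.bak"  # hypothetical backup name
    if backup.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(f"{backup} already exists")
    shutil.move(str(condarc), str(backup))
    return backup


def restore_condarc(home: Path) -> Optional[Path]:
    """Move home/.condarc.bak back to home/.condarc and return that path.

    Returns None when there is no backup to restore.
    """
    backup = home / ".condarc.bak"
    if not backup.is_file():
        return None
    condarc = home / ".condarc"
    shutil.move(str(backup), str(condarc))
    return condarc
```

Call `backup_condarc(Path.home())` before running `conda env create`, and `restore_condarc(Path.home())` once the installation has finished.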

Section 1.3: Activate Environment and Use cfDNApipe

Once the environment is created, users can enter it using the following command.

conda activate cfDNApipe

Now, just open Python and process **cell-free DNA WGBS/WGS paired/single-end** data. For a more detailed explanation of each function and its parameters, please see the cfDNApipe documentation.
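Before starting an analysis, it can help to confirm that the activated environment actually exposes the package. The sketch below uses only the Python standard library; it assumes the importable module name is `cfDNApipe`, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package is importable under the name "cfDNApipe".
    if is_installed("cfDNApipe"):
        print("cfDNApipe is available in this environment.")
    else:
        print("cfDNApipe was not found; is the conda environment activated?")
```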

Section 2: cfDNApipe Highlights

cfDNApipe is a highly integrated cfDNA WGS/WGBS data processing pipeline with many useful built-in mechanisms. Here, we introduce some of its important features.

Section 2.1: Dataflow Graph for WGS and WGBS Data Processing

cfDNApipe is organized by a built-in dataflow with a strictly defined up- and down-stream data interface. The following figure shows how WGS and WGBS data are processed.

<center> <img style="border-radius: 0.3125em; box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);" src="./pics/cfDNApipe_flowchart.png"> <div style="color:orange; border-bottom: 1px solid #d9d9d9; display: inline-block; color: #999; padding: 2px;">cfDNApipe Flowchart Overview</div> </center>

For detailed data flow diagrams, please see the cfDNApipe documentation, which gives the full up- and down-stream relationships for every step.
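The idea of a dataflow graph with fixed up- and down-stream interfaces can be illustrated with a small dependency table. This is a conceptual sketch only: the step names are placeholders rather than cfDNApipe class names, and it relies on the standard-library `graphlib` module (Python 3.9 or later), which may be newer than the Python in the conda environment.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of upstream steps it consumes.
# These step names are illustrative placeholders, not cfDNApipe classes.
PIPELINE = {
    "quality_control": {"raw_fastq"},
    "adapter_removal": {"raw_fastq"},
    "alignment": {"adapter_removal"},
    "sort_bam": {"alignment"},
    "remove_duplicates": {"sort_bam"},
    "fragment_length": {"remove_duplicates"},
}


def execution_order(graph):
    """Return all steps in an order that respects upstream dependencies."""
    return list(TopologicalSorter(graph).static_order())


def downstream_of(graph, start):
    """Return `start` plus every step that depends on it, in run order."""
    order = execution_order(graph)
    reached = {start}
    # The order is topological, so one pass is enough: a step is reached
    # as soon as any of its upstream steps has been reached.
    for step in order:
        if graph.get(step, set()) & reached:
            reached.add(step)
    return [step for step in order if step in reached]
```

For example, `downstream_of(PIPELINE, "sort_bam")` lists only the steps that still need to run when the analysis starts from an intermediate file rather than from raw reads.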

Section 2.2: Reference Auto Download and Building

For any HTS data analysis, the initial step is to set reference files such as genome sequence and annotation files. cfDNApipe can download references and build reference indexes automatically. If the reference and index files already exist, cfDNApipe will use these files instead of downloading or rebuilding them.
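The "reuse if present, otherwise build" behaviour described above follows a common pattern, sketched below. This is not cfDNApipe's implementation: `ensure_reference` and its `build` callback are hypothetical names used only to show the control flow.

```python
from pathlib import Path
from typing import Callable


def ensure_reference(path: Path, build: Callable[[Path], None]) -> bool:
    """Make sure `path` exists, calling `build` only when it is missing.

    Returns True if `build` was called, False if an existing file was reused.
    """
    if path.exists():
        # The reference is already in place, so nothing is downloaded or rebuilt.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    build(path)
    if not path.exists():
        raise RuntimeError(f"building {path} did not produce the file")
    return True
```

The `build` callback stands in for whatever download or indexing command produces the file; a second call with the same path returns `False` without running it again.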

What reference files does cfDNApipe need?

For analyzing WGS data (taking hg19 as an example):
- Genome sequence file and indexes: hg19.fa, hg19.chrom.sizes, hg19.dict, hg19.fa.fai
- Bowtie2-related files: hg19.1.bt2 ~ hg19.4.bt2, hg19.rev.1.bt2 ~ hg19.rev.2.bt2
- Other reference files, such as the blacklist file and cytoBand file; we provide them here.

For analyzing WGBS data (taking hg19 as an example):
- Genome sequence file and indexes: hg19.fa, hg19.chrom.sizes, hg19.dict, hg19.fa.fai
- Bismark-related files: the Bisulfite_Genome folder with CT_conversion and GA_conversion
- Other reference files, such as the CpG island file and cytoBand file; we provide them here.
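To see which of the hg19 files listed above are already present in a reference folder, a short check like the following can be used. It is a sketch under stated assumptions: the helper `missing_references` is hypothetical, the Bisulfite_Genome folder is assumed to sit directly inside the reference folder, and the "other" files (blacklist, CpG island, cytoBand) are left out because their file names are not given here.

```python
from pathlib import Path

# File names taken from the hg19 lists above.
COMMON = ["hg19.fa", "hg19.chrom.sizes", "hg19.dict", "hg19.fa.fai"]
# hg19.1.bt2 ~ hg19.4.bt2 and hg19.rev.1.bt2 ~ hg19.rev.2.bt2, expanded.
WGS_ONLY = [f"hg19.{i}.bt2" for i in range(1, 5)] + [
    f"hg19.rev.{i}.bt2" for i in (1, 2)
]
# Assumption: Bisulfite_Genome is located directly inside the reference folder.
WGBS_ONLY = ["Bisulfite_Genome/CT_conversion", "Bisulfite_Genome/GA_conversion"]


def missing_references(refdir, data):
    """Return the expected reference paths that do not exist under `refdir`.

    `data` must be "WGS" or "WGBS".
    """
    if data == "WGS":
        expected = COMMON + WGS_ONLY
    elif data == "WGBS":
        expected = COMMON + WGBS_ONLY
    else:
        raise ValueError(f"unknown data type: {data!r}")
    root = Path(refdir)
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every listed file was found; any names returned are the ones that would still have to be downloaded or built.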

Here, we introduce the global reference configuration functions in cfDNApipe, which download and build reference files automatically.

cfDNApipe contains two global reference configuration functions, pipeConfigure and pipeConfigure2. pipeConfigure is for single-group data analysis (without a control group); pipeConfigure2 is for case-control analysis. Either function checks the reference files, such as the Bowtie2 and Bismark references. If they are not detected, the references are downloaded and built. This step is necessary, and once the references are in place they are reused in later runs.

