cLoops2: Full Stack Analysis Tool for Chromatin Interactions

</div> <p align="center"> <img align="center" src="https://github.com/YaqiangCao/cLoops2/blob/master/pngs/FlowChart.png"> </p>

Introduction

Welcome to cLoops2! This is a comprehensive analysis tool for 3D genomic interaction data, building upon our previous work cLoops.

cLoops2 has evolved from simple loop-calling based on assumption-free clustering to a full suite of analysis tools for 3D genomic interaction data. It has been specifically optimized for data types such as Hi-TrAC/Trac-looping/ChIA-PET/HiChIP, where interactions are enriched over the genome through experimental steps. cLoops2 also supports Hi-C-like data. The improvements from cLoops to cLoops2 are designed to address challenges in achieving higher resolutions with next-generation genome architecture mapping technologies.

cLoops2 is designed with reference to bedtools and Samtools for command-line style programming. If you have experience with these tools, you will find cLoops2 easy and efficient to use, and you can seamlessly combine commands and integrate them as steps in your processing pipeline.

Please refer to our Hi-TrAC method manuscript (bioRxiv, official version), Hi-TrAC domain-centric analysis manuscript (bioRxiv, official version), and cLoops2 manuscript (bioRxiv, official version) for what cLoops2 can do and show.

Citation
Install
Basic Usage and Quick Guide
cLoops2 Main Functions
Input, Intermediate, Output Files

Citation

If you use cLoops2 in your research (the idea, the algorithm, the analysis scripts or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:

Official version on NAR: Yaqiang Cao et al. "cLoops2: a full-stack comprehensive analytical tool for chromatin interactions"
or
Preprint bioRxiv: Yaqiang Cao et al. "cLoops2: a full-stack comprehensive analytical tool for chromatin interactions"

Install

1. Easy way through pip for stable version

Python3 is required.

pip install cLoops2

2. Install from source with test data for latest version

cLoops2 is written purely in Python3 (cLoops was written in Python2). If you are familiar with conda, cLoops2 can be installed easily with the following Linux shell commands (also tested on the Windows 10 Ubuntu subsystem and macOS).

# for most updated code, or download the release version 
git clone --depth=1 https://github.com/YaqiangCao/cLoops2
cd cLoops2
conda env create --name cLoops2 --file cLoops2_env.yaml
conda activate cLoops2 
python3 setup.py install

The required Python3 third-party packages are listed below; all of them can be installed through conda. If you prefer to install cLoops2 the old-school way with python setup.py install, please install these third-party dependencies first.

tqdm
numpy 
scipy 
pandas
scikit-learn
seaborn
pyBigWig
matplotlib
joblib
networkx

After installation, whenever you want to run cLoops2, just activate the environment with conda: conda activate cLoops2. Happy peak/loop-calling and have fun exploring all the other kinds of analyses.

Basic Usage and Quick Guide

Example data background introduction

Example data for testing is available at cLoops2/example/data. The BEDPE files were from Hi-TrAC experiments mapped to hg38 for chromosome 21 in GM12878 and K562 cell lines, two biological replicates for each cell line. Only intra-chromosomal PETs were kept. Raw FASTQ reads were processed by tracPre2.py.

For other kinds of 3D genomic interaction data such as ChIA-PET, Hi-C, and HiChIP, cLoops2 can also start with provided BEDPE files.
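For readers preparing their own BEDPE input, the following is a minimal Python sketch of reading such a file and keeping only intra-chromosomal PETs, as was done for the example data. It assumes the standard bedtools BEDPE column order (chrom1, start1, end1, chrom2, start2, end2, then optional columns). The names `PET`, `parse_bedpe_line`, and `read_cis_pets` are illustrative and are not part of cLoops2.

```python
import gzip
from typing import Iterator, NamedTuple


class PET(NamedTuple):
    """One paired-end tag: the two mapped ends of a read pair."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int


def parse_bedpe_line(line: str) -> PET:
    """Parse the six leading BEDPE columns; extra columns are ignored."""
    f = line.rstrip("\n").split("\t")
    return PET(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]))


def is_intra_chromosomal(pet: PET) -> bool:
    """True when both ends map to the same chromosome (a cis PET)."""
    return pet.chrom_a == pet.chrom_b


def read_cis_pets(path: str) -> Iterator[PET]:
    """Yield intra-chromosomal PETs from a plain or gzipped BEDPE file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            pet = parse_bedpe_line(line)
            if is_intra_chromosomal(pet):
                yield pet


# Hypothetical record, for illustration only.
example = "chr21\t100\t150\tchr21\t5000\t5050\tread1\t.\t+\t-\n"
pet = parse_bedpe_line(example)
print(pet, is_intra_chromosomal(pet))
```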

The following example command lines are also recorded in cLoops2/example/test_run/run.sh, which can be used to test the main programs of cLoops2 after installation.

Routine analysis step 1: get basic statistics of PETs from input BEDPE file

cLoops2 qc -f ../data/GM_Trac1_hg38_chr21_partaa.bedpe.gz,../data/GM_Trac1_hg38_chr21_partab.bedpe.gz -o test -p 2

Please note that in cLoops2, multiple files or directories can be given as input by separating them with commas; do not leave blanks between the names. Most cLoops2 analysis modules can be run in parallel with the -p option, and most of them generate a cLoops2.log file that records the program parameters and important messages for later review.
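When cLoops2 is called from a pipeline script, the comma-separated input rule is easy to get wrong. The sketch below builds the qc command from a list of files and rejects names that contain commas or blanks. It uses only the -f, -o, and -p options shown above; `build_qc_command` is a hypothetical helper, not something shipped with cLoops2.

```python
import shlex
from typing import List, Sequence


def build_qc_command(files: Sequence[str], out_prefix: str, threads: int = 1) -> List[str]:
    """Return the argument list for `cLoops2 qc` with comma-joined inputs."""
    for name in files:
        # A comma or blank inside a name would break the comma-separated list.
        if "," in name or any(ch.isspace() for ch in name):
            raise ValueError(f"unsupported character in input name: {name!r}")
    return ["cLoops2", "qc", "-f", ",".join(files), "-o", out_prefix, "-p", str(threads)]


cmd = build_qc_command(
    [
        "../data/GM_Trac1_hg38_chr21_partaa.bedpe.gz",
        "../data/GM_Trac1_hg38_chr21_partab.bedpe.gz",
    ],
    out_prefix="test",
    threads=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where cLoops2 is installed.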

The main output is a .txt file with annotated statistics for each sample, as follows.

| Sample | TotalPETs | UniquePETs | Redundancy | IntraChromosomalPETs(cis) | cisRatio | InterChromosomalPETs(trans) | transRatio | meanDistance | closePETs(distance<=1kb) | closeRatio | middlePETs(1kb<distance<=10kb) | middleRatio | distalPETs(distance>10kb) | distalRatio |
|------------------|-----------|------------|-------------|---------------------------|-------------|-----------------------------|-------------|--------------|--------------------------|-------------|--------------------------------|-------------|---------------------------|-------------|
| GM_HiTrac_bio1 | 906506 | 901589 | 0.005424123 | 655640 | 0.727204968 | 245949 | 0.272795032 | 522978.0929 | 138201 | 0.210787932 | 274800 | 0.419132451 | 242639 | 0.370079617 |
| GM_HiTrac_bio2 | 665759 | 662197 | 0.005350284 | 506058 | 0.76421065 | 156139 | 0.23578935 | 501104.2879 | 104360 | 0.206221421 | 216640 | 0.428093223 | 185058 | 0.365685356 |
| K562_HiTrac_bio1 | 596886 | 591215 | 0.009500977 | 474746 | 0.8030006 | 116469 | 0.1969994 | 314420.4126 | 115360 | 0.242993095 | 226568 | 0.477240461 | 132818 | 0.279766444 |
| K562_HiTrac_bio2 | 413818 | 410415 | 0.008223422 | 326743 | 0.796128309 | 83672 | 0.203871691 | 327855.2136 | 68571 | 0.209862185 | 162132 | 0.496206499 | 96040 | 0.293931316 |
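The ratio columns appear to follow directly from the count columns. The relationships in the sketch below are inferred from the numbers in the table, not taken from the cLoops2 source code: redundancy is computed against TotalPETs, the cis/trans ratios against UniquePETs, and the close/middle/distal ratios against the cis PETs. `derive_qc_ratios` is an illustrative name.

```python
def derive_qc_ratios(total, unique, cis, close, middle, distal):
    """Recompute the ratio columns of the qc table from its count columns.

    Assumed relationships (inferred from the example table):
    duplicates are measured against all PETs, cis/trans against unique PETs,
    and the distance classes against cis PETs only.
    """
    trans = unique - cis
    return {
        "Redundancy": 1 - unique / total,
        "cisRatio": cis / unique,
        "transRatio": trans / unique,
        "closeRatio": close / cis,
        "middleRatio": middle / cis,
        "distalRatio": distal / cis,
    }


# Counts from the GM_HiTrac_bio1 row of the table.
ratios = derive_qc_ratios(
    total=906506, unique=901589, cis=655640,
    close=138201, middle=274800, distal=242639,
)
for name, value in ratios.items():
    print(f"{name}\t{value:.9f}")
```

A quick sanity check when reading such a table: the close, middle, and distal counts should sum to the cis count, and the cis and trans counts to UniquePETs.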

Routine analysis step 2: pre-process BEDPE file(s) into cLoops2 data

#get a separate directory for each GM12878 replicate, only target chromosome chr21
cLoops2 pre -f ../data/GM_HiTrac_bio1.bedpe.gz -o gm_bio1 -c chr21
cLoops2 pre -f ../data/GM_HiTrac_bio2.bedpe.gz -o gm_bio2 -c chr21 
#get the combined data for GM12878
cLoops2 pre -f ../data/GM_HiTrac_bio1.bedpe.gz,../data/GM_HiTrac_bio2.bedpe.gz -o gm -c chr21
#get a separate directory for each K562 replicate first
cLoops2 pre -f ../data/K562_HiTrac_bio1.bedpe.gz -o k562_bio1 -c chr21
cLoops2 pre -f ../data/K562_HiTrac_bio2.bedpe.gz -o k562_bio2 -c chr21
#then combine the data, keeping only 1 PET per position (the default, same as cLoops2 pre)
cLoops2 combine -ds k562_bio1,k562_bio2 -o k562 -keep 1

The output directory contains one .json file for the basic PET statistics and .ixy files, which are used to call peaks, loops, or any analysis implemented in cLoops2.
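To illustrate what "only keep 1 PET for the same position" means for the -keep option used above, here is a small Python sketch. It assumes a PET is identified by its chromosome and the coordinates of its two ends. The actual cLoops2 implementation may define "same position" differently, and `limit_duplicates` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed representation of a PET: (chromosome, coordinate of end 1, coordinate of end 2).
Coord = Tuple[str, int, int]


def limit_duplicates(pets: Iterable[Coord], keep: int = 1) -> List[Coord]:
    """Keep at most `keep` PETs that share identical coordinates, preserving order."""
    seen = Counter()
    kept = []
    for pet in pets:
        if seen[pet] < keep:
            kept.append(pet)
        seen[pet] += 1
    return kept


# Hypothetical PETs pooled from two replicates; the first two are identical.
pooled = [("chr21", 100, 5000), ("chr21", 100, 5000), ("chr21", 300, 9000)]
print(limit_duplicates(pooled, keep=1))
```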

For data backup or sharing, the directory can be saved as a .tar.gz file with the tar command.

If you move the directory or change the files in it, please run cLoops2 update to refresh the information in petMeta.json, because all .ixy files are recorded there as absolute paths.
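If the backup is scripted in Python rather than done with the tar command, the standard-library tarfile module does the same job. The sketch below is one way to do it; the helper names and the throwaway demo directory are placeholders, not part of cLoops2.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_directory(data_dir: str, archive_path: str) -> None:
    """Pack data_dir into a gzip-compressed tar archive."""
    source = Path(data_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory name, not the full local path.
        tar.add(str(source), arcname=source.name)


def restore_directory(archive_path: str, target_dir: str) -> None:
    """Unpack an archive created by archive_directory (trusted archives only)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)


# Demo on a throwaway directory; "gm" and its empty petMeta.json are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "gm"
    data_dir.mkdir()
    (data_dir / "petMeta.json").write_text("{}")
    archive = Path(tmp) / "gm.tar.gz"
    archive_directory(str(data_dir), str(archive))
    restore_directory(str(archive), str(Path(tmp) / "restored"))
    restored_ok = (Path(tmp) / "restored" / "gm" / "petMeta.json").exists()
print(restored_ok)
```

After unpacking an archive in a new location, the absolute paths stored in petMeta.json no longer match, so cLoops2 update should be run on the restored directory as described above.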

Routine analysis step 3: estimate reasonable contact matrix resolution

cLoops2 estRes -d gm -o gm -bs 5000,1000,200 -p 10
cLoops2 estRes -d k562 -o k562 -bs 5000,1000,200 -p 10

This step is not needed for peak-calling 1D data such as ChIP-seq or ChIC-seq.

We prefer to use the highest resolution with >=50% PETs (solid lines
