CLoops
Accurate and flexible loops calling tool for 3D genomic data.
cLoops: loop-calling for ChIA-PET, Hi-C, HiChIP and Trac-looping

Introduction
Chromosome conformation capture (3C)-derived high-throughput sequencing methods such as ChIA-PET, HiChIP and Hi-C provide a genome-wide view of chromatin organization. Fine-scale loops formed by interactions between regulatory elements spanning hundreds of kilobases can be detected from these data. Here we introduce cLoops ('see loops'), a common loop-calling tool for ChIA-PET, HiChIP and high-resolution Hi-C data. Paired-end tags (PETs) are first classified into self-ligation and inter-ligation clusters using an optimized unsupervised clustering algorithm. The significance of each inter-ligation cluster is then estimated against a permuted local background.
If you find cLoops useful, please give us a star on GitHub and cite our paper:
Official version: Yaqiang Cao, Zhaoxiong Chen, Xingwei Chen, Daosheng Ai, Guoyu Chen, Joseph McDermott, Yi Huang, Guo Xiaoxiao, Jing-Dong J Han, Accurate loop calling for 3D genomic data with cLoops, Bioinformatics, btz651, https://doi.org/10.1093/bioinformatics/btz651
Preprint bioRxiv: Yaqiang Cao, Xingwei Chen, Daosheng Ai, Zhaoxiong Chen, Guoyu Chen, Joseph McDermott, Yi Huang, Jing-Dong J. Han (2018) "Accurate loop calling for 3D genomic data with cLoops" bioRxiv 465849; doi: https://doi.org/10.1101/465849
You can also find the cLoops wiki in Chinese here
Please kindly refer to cLoops2 for more analytical modules.
Install
If you are familiar with conda, cLoops can be installed easily with the following commands, which clone the repository, create the environment and install the package:
git clone https://github.com/YaqiangCao/cLoops
cd cLoops
conda env create --name cLoops --file cLoops_env.yaml
conda activate cLoops
python setup.py install
After that, just run conda activate cLoops each time to enter the cLoops environment.
Or, if you prefer the old-school way, install from scratch. scipy, numpy, seaborn, pandas and joblib are required. joblib version 0.11 is required to avoid parallel computing bugs that occur with newer versions; install it with pip2.7 install --user joblib==0.11. If you have problems installing scipy, please refer to Anaconda or SAGE.
wget https://github.com/YaqiangCao/cLoops/archive/0.93.tar.gz
tar xvzf 0.93.tar.gz
cd cLoops-0.93
python setup.py install
To test whether cLoops is successfully installed:
cd examples
sh run.sh
Please refer to here to install cLoops to a customized path.
Usage
Run cLoops -h to see all options. The key parameters are eps and minPts: minPts defines the minimum number of PETs required for a candidate loop, and eps defines the distance within which two PETs are considered neighbors. In practice, to tune the parameters, take the PETs from the smallest chromosome (excluding chrY and chrM) and run a series of eps and minPts values; the clustering results from all rounds are combined, which helps you determine your parameters.
Since version 0.8, cLoops has a parameter --mode (-m) that selects pre-set parameters for different types of data. -m 0 accepts user settings; -m 1 equals -eps 500,1000,2000 -minPts 5, for ChIA-PET data with sharp peaks; -m 2 equals -eps 1000,2000,5000 -minPts 5, for ChIA-PET data with broad peaks; -m 3 equals -eps 5000,7500,10000 -minPts 20,30,40,50 -hic, for deeply sequenced Hi-C data (~200 million cis PETs); -m 4 equals -eps 2500,5000,7500,10000 -minPts 20,30 -hic, for HiChIP data with ~100 million cis PETs. For HiChIP data with ~30-40 million cis PETs, we suggest -eps 2500,5000,7500,10000 -minPts 10,15,20 -hic. You can always add more eps values and smaller minPts values to get more candidate loops and possibly more significant loops, but the run will take longer.
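The subsetting step for parameter tuning can be sketched as below. This is only an illustration under the assumption that the input is a tab-separated BEDPE file as described in the Input section; the helper names subset_chromosome and open_bedpe are hypothetical and are not part of cLoops.

```python
import gzip


def open_bedpe(path):
    """Open a plain or gzip-compressed BEDPE file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def subset_chromosome(lines, chrom):
    """Yield BEDPE lines whose two ends both map to `chrom`.

    Hypothetical helper for illustration; not part of cLoops.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is chrom1 and column 4 is chrom2 (1-based).
        if len(fields) >= 4 and fields[0] == chrom and fields[3] == chrom:
            yield line
```

The filtered lines can then be written to a small BEDPE file and used for the trial runs with different eps and minPts values.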
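To make the mode presets easier to compare, the sketch below transcribes them into a small lookup table and expands a mode number into the explicit options the text says it equals. The names MODE_PRESETS and expand_mode are hypothetical; this is not how cLoops implements --mode internally, only a restatement of the values listed above.

```python
# Presets transcribed from the text above; mode 0 (user settings) is omitted.
MODE_PRESETS = {
    1: {"eps": [500, 1000, 2000], "minPts": [5], "hic": False},
    2: {"eps": [1000, 2000, 5000], "minPts": [5], "hic": False},
    3: {"eps": [5000, 7500, 10000], "minPts": [20, 30, 40, 50], "hic": True},
    4: {"eps": [2500, 5000, 7500, 10000], "minPts": [20, 30], "hic": True},
}


def expand_mode(mode):
    """Return the explicit option list that -m MODE is described as equal to."""
    preset = MODE_PRESETS[mode]
    args = [
        "-eps", ",".join(str(e) for e in preset["eps"]),
        "-minPts", ",".join(str(m) for m in preset["minPts"]),
    ]
    if preset["hic"]:
        args.append("-hic")
    return args
```

For example, " ".join(expand_mode(4)) gives the option string listed for ~100 million cis PETs HiChIP data.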
Input
Mapped PETs in BEDPE format; gzip-compressed files are also accepted. The following columns are required: chrom1 (1st), start1 (2nd), end1 (3rd), chrom2 (4th), start2 (5th), end2 (6th), strand1 (9th), strand2 (10th). For the name and score columns, "." is accepted. Columns are separated by "\t". For example:
chr1 9945 10095 chr1 248946216 248946366 . . + +
chr1 10034 10184 chr1 180987 181137 . . + -
chr1 10286 10436 chr1 181103 181253 . . + -
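A quick way to check that a file matches this layout is to parse a few lines before running cLoops. The sketch below is a minimal example and not part of cLoops; the PET tuple and the parse_bedpe_line function are hypothetical names, and the code assumes real input lines are tab-separated as stated above.

```python
from collections import namedtuple

# Only the columns cLoops requires; name/score (columns 7 and 8) are skipped.
PET = namedtuple(
    "PET", "chrom1 start1 end1 chrom2 start2 end2 strand1 strand2"
)


def parse_bedpe_line(line):
    """Parse one tab-separated BEDPE line into a PET tuple."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(
            "expected at least 10 tab-separated columns, got %d" % len(fields)
        )
    return PET(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        fields[8], fields[9],
    )
```

A ValueError here usually means the file is space-separated or is missing the strand columns.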
Output
The main output is a .loop file and one or more PDF files plotting the distance distributions of self-ligation and inter-ligation PETs. The columns of the .loop file are explained below:
column | name | explanation
------ | ---- | ------------
0th | loopId | ID for a loop, like chr1-chr1-1
1st | ES | Enrichment score for the loop, calculated as the observed PETs number divided by the mean PETs number of nearby permuted regions
2nd | FDR | False discovery rate for the loop, calculated from the number of permuted regions that have more observed PETs than the loop region
3rd | binomal_p-value | Binomial test p-value for the loop
4th | distance | Distance (bp) between the centers of the anchors for the loop
5th | hypergeometric_p-value | Hypergeometric test p-value for the loop
6th | iva | Genomic coordinates for the left anchor, for example, chr13:50943050-50973634
7th | ivb | Genomic coordinates for the right anchor
8th | poisson_p-value | Poisson test p-value for the loop
9th | ra | Observed PETs number for the left anchor
10th | rab | Observed PETs number linking the left and right anchors
11th | rb | Observed PETs number for the right anchor
12th | poisson_p-value_corrected | Bonferroni-corrected Poisson p-value according to the number of loops for each chromosome
13th | binomal_p-value_corrected | Bonferroni-corrected binomial p-value according to the number of loops for each chromosome
14th | hypergeometric_p-value_corrected | Bonferroni-corrected hypergeometric p-value according to the number of loops for each chromosome
15th | significant | 1 or 0; 1 means we consider the loop significant compared to permuted regions. You can ignore this and customize your cutoffs using the values above, for example by visualizing a small chromosome in Juicebox or washU.
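If you prefer your own cutoffs over the significant column, a filter along the following lines may help. It is a minimal sketch that assumes the .loop file is tab-separated with a header row using the column names in the table above; the function name filter_loops and the default cutoff values are placeholders for illustration, not recommendations from cLoops.

```python
import csv
import io


def filter_loops(handle, min_es=2.0, max_fdr=0.05, max_poisson_p=1e-5):
    """Return loopIds passing user-chosen cutoffs (placeholder defaults)."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        if (float(row["ES"]) >= min_es
                and float(row["FDR"]) <= max_fdr
                and float(row["poisson_p-value"]) <= max_poisson_p):
            kept.append(row["loopId"])
    return kept


# Tiny in-memory demo with made-up values; real files have all 16 columns.
sample = (
    "loopId\tES\tFDR\tpoisson_p-value\n"
    "chr21-chr21-1\t5.0\t0.01\t1e-8\n"
    "chr21-chr21-2\t1.2\t0.2\t0.1\n"
)
demo_kept = filter_loops(io.StringIO(sample))
```

With a real file, pass an open file handle instead of the in-memory sample and adjust the cutoffs after inspecting a small chromosome in a browser.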
Examples
The source data, results and log files for all of the following examples can be found in the examples directory.
1. ChIA-PET data
We provide test data from GM12878 CTCF ChIA-PET (GSM1872886), restricted to chromosome 21 and mapped to hg38. Run the following commands; if cLoops is installed successfully, you will get the result. eps is estimated automatically and the default minPts is 5. The -w option generates loops for visualization in the washU browser, and the -j option generates loops for visualization in Juicebox.
wget https://github.com/YaqiangCao/cLoops/raw/master/examples/GSM1872886_GM12878_CTCF_ChIA-PET_chr21_hg38.bedpe.gz
cLoops -f GSM1872886_GM12878_CTCF_ChIA-PET_chr21_hg38.bedpe.gz -o chiapet -w -j -s -m 1 -plot
For ChIA-PET data with sharp peaks, like the CTCF data here, you will get inter-ligation and self-ligation PET distance distributions like the following, with the two kinds of PETs well separated by the auto-estimated eps:

If your data do not look like this with the auto-estimated eps, which can happen for ChIA-PET data with broad peaks (like H3K27ac), please use a small chromosome (chr21 in human, chr19 in mouse) to run a series of eps values, then choose the smallest one that generates well-separated distance distributions to run cLoops, or just use the whole series.
We recommend washU for visualizing the loops. The script jd2washU converts the cLoops temp files to a washU long-range track; bedtools, bgzip and tabix must be available in the command environment.
jd2washU -d chiapet -o chiapet
Together with other ChIP-seq data, you can get the following plot:

2. HiChIP data
We provide test data from two biological replicates of GM12878 cohesin HiChIP, restricted to chromosome 21 and mapped to hg38. Run the following commands to call merged loops. The -s option keeps the working directory and temp files, which can be used by the scripts deLoops, jd2washU (BEDTOOLS needed), jd2juice (Juicer needed), jd2fingerprint and jd2saturation. The -hic option applies cutoffs designed for Hi-C-like data, see above.
wget https://github.com/YaqiangCao/cLoops_su
