CLoops

Accurate and flexible loops calling tool for 3D genomic data.


cLoops: loop-calling for ChIA-PET, Hi-C, HiChIP and Trac-looping

Introduction

Chromosome conformation capture (3C)-derived high-throughput sequencing methods such as ChIA-PET, HiChIP and Hi-C provide a genome-wide view of chromatin organization. Fine-scale loops formed by interactions of regulatory elements spanning hundreds of kilobases can be detected from these data. Here we introduce cLoops ('see loops'), a common loop-calling tool for ChIA-PET, HiChIP and high-resolution Hi-C data. Paired-end tags (PETs) are first classified into self-ligation and inter-ligation clusters using an optimized unsupervised clustering algorithm. The significance of the inter-ligation clusters is then estimated using a permuted local background.

If you find cLoops useful, please give us a star on GitHub and cite our paper:

Official version: Yaqiang Cao, Zhaoxiong Chen, Xingwei Chen, Daosheng Ai, Guoyu Chen, Joseph McDermott, Yi Huang, Guo Xiaoxiao, Jing-Dong J Han, Accurate loop calling for 3D genomic data with cLoops, Bioinformatics, btz651, https://doi.org/10.1093/bioinformatics/btz651

Preprint bioRxiv: Yaqiang Cao, Xingwei Chen, Daosheng Ai, Zhaoxiong Chen, Guoyu Chen, Joseph McDermott, Yi Huang, Jing-Dong J. Han (2018) "Accurate loop calling for 3D genomic data with cLoops" bioRxiv 465849; doi: https://doi.org/10.1101/465849

You can also find the cLoops wiki in Chinese here

Please kindly refer to cLoops2 for more analytical modules.


Install

If you are familiar with conda, cLoops can be installed easily with the following commands, which clone the repository, create the environment and install the package.

git clone https://github.com/YaqiangCao/cLoops
cd cLoops
conda env create --name cLoops --file cLoops_env.yaml
conda activate cLoops 
python setup.py install

After that, just run conda activate cLoops each time to enter the cLoops environment.

Alternatively, if you prefer the old-school way, install from scratch. scipy, numpy, seaborn, pandas and joblib are required. joblib version 0.11 is required to avoid parallel computing bugs caused by newer versions; install it with pip2.7 install --user joblib==0.11. If you have problems installing scipy, please refer to Anaconda or SAGE.

wget https://github.com/YaqiangCao/cLoops/archive/0.93.tar.gz
tar xvzf 0.93.tar.gz
cd cLoops-0.93
python setup.py install    

To test whether cLoops is successfully installed:

cd examples
sh run.sh

Please refer to here to install cLoops to a customized path.


Usage

Run cLoops -h to see all options. The key parameters are eps and minPts: minPts defines the minimum number of PETs required for a candidate loop, and eps defines the distance required for two PETs to be neighbors. In practice, to tune the parameters, take the PETs from the smallest chromosome (excluding chrY and chrM) and run a series of eps and minPts values; the clustering results from all rounds are combined to help determine your parameters.
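To try parameters on a single small chromosome, the PETs on that chromosome first have to be pulled out of the full BEDPE file. The following is a minimal standalone Python 3 sketch of that step, not part of cLoops itself: the function name `pets_on_chrom` is hypothetical, the chr21 rows are made-up illustration data, and the input is assumed to be tab-separated BEDPE with chrom1 in the 1st column and chrom2 in the 4th.

```python
def pets_on_chrom(lines, chrom):
    """Yield BEDPE lines whose two ends both map to `chrom`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip empty or malformed lines
        if fields[0] == chrom and fields[3] == chrom:
            yield line


# Illustration data only: the chr21 rows are invented for this example.
example = [
    "chr1\t9945\t10095\tchr1\t248946216\t248946366\t.\t.\t+\t+\n",
    "chr21\t100\t250\tchr21\t9000\t9150\t.\t.\t+\t-\n",
    "chr21\t100\t250\tchr1\t9000\t9150\t.\t.\t+\t-\n",
]
subset = list(pets_on_chrom(example, "chr21"))
print(len(subset))  # prints 1: only the second row has both ends on chr21
```

For a real file, the lines can come from `open(path)` or, for gzip-compressed input, from `gzip.open(path, "rt")`, and the kept lines can be written to a new BEDPE file that is then passed to cLoops.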

Since version 0.8, cLoops has a parameter --mode (-m), which selects pre-set parameters for different types of data. -m 0 accepts user settings; -m 1 equals -eps 500,1000,2000 -minPts 5, for sharp-peak-like ChIA-PET data; -m 2 equals -eps 1000,2000,5000 -minPts 5, for broad-peak-like ChIA-PET data; -m 3 equals -eps 5000,7500,10000 -minPts 20,30,40,50 -hic, for deeply sequenced Hi-C data (~200 million cis PETs); -m 4 equals -eps 2500,5000,7500,10000 -minPts 20,30 -hic, for HiChIP data with ~100 million cis PETs. For HiChIP data with ~30-40 million cis PETs, we suggest -eps 2500,5000,7500,10000 -minPts 10,15,20 -hic. You can always add more eps values and smaller minPts values to get more candidate loops and possibly more significant loops, but the run takes longer.
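The presets can be written down as a small lookup table, which is handy when scripting runs with -m 0 and explicit values. This is only a sketch in standalone Python 3 that mirrors the numbers listed above; `MODE_PRESETS` and `preset_to_args` are hypothetical names and are not taken from the cLoops source code.

```python
# Preset values copied from the mode description; mode 0 (user settings) has no preset.
MODE_PRESETS = {
    1: {"eps": [500, 1000, 2000], "minPts": [5], "hic": False},
    2: {"eps": [1000, 2000, 5000], "minPts": [5], "hic": False},
    3: {"eps": [5000, 7500, 10000], "minPts": [20, 30, 40, 50], "hic": True},
    4: {"eps": [2500, 5000, 7500, 10000], "minPts": [20, 30], "hic": True},
}


def preset_to_args(mode):
    """Return the explicit command-line arguments equivalent to a preset mode."""
    preset = MODE_PRESETS[mode]
    args = [
        "-eps", ",".join(str(v) for v in preset["eps"]),
        "-minPts", ",".join(str(v) for v in preset["minPts"]),
    ]
    if preset["hic"]:
        args.append("-hic")
    return args


print(" ".join(preset_to_args(4)))  # prints: -eps 2500,5000,7500,10000 -minPts 20,30 -hic
```

Starting from a preset and then adding eps values or lowering minPts makes it easy to see exactly how a custom run differs from the default for that data type.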


Input

Mapped PETs in BEDPE format; gzip-compressed files are also accepted. The following columns are necessary: chrom1 (1st), start1 (2nd), end1 (3rd), chrom2 (4th), start2 (5th), end2 (6th), strand1 (9th), strand2 (10th). For the name and score columns, "." is accepted. Columns are separated by "\t". For example:

chr1	9945	10095	chr1	248946216	248946366	.	.	+	+
chr1	10034	10184	chr1	180987	181137	.	.	+	-
chr1	10286	10436	chr1	181103	181253	.	.	+	-
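Before a long run it can be worth checking that an input file really follows this layout. The sketch below, in standalone Python 3, parses one BEDPE line into the eight required fields; it is an illustration under the column positions stated above, not the parser cLoops uses, and the names `PET` and `parse_bedpe_line` are hypothetical.

```python
from collections import namedtuple

PET = namedtuple("PET", "chrom1 start1 end1 chrom2 start2 end2 strand1 strand2")


def parse_bedpe_line(line):
    """Parse one tab-separated BEDPE line, keeping only the required columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(fields))
    return PET(
        fields[0], int(fields[1]), int(fields[2]),  # chrom1, start1, end1
        fields[3], int(fields[4]), int(fields[5]),  # chrom2, start2, end2
        fields[8], fields[9],                       # strand1, strand2 (columns 9 and 10)
    )


pet = parse_bedpe_line("chr1\t10034\t10184\tchr1\t180987\t181137\t.\t.\t+\t-")
print(pet.chrom1, pet.start1, pet.end2, pet.strand2)  # prints: chr1 10034 181137 -
```

A line with too few columns or a non-numeric coordinate raises `ValueError`, which makes a malformed file fail early instead of partway through clustering.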

Output

The main output is a loop file and one or more PDF files plotting the distance distributions of self-ligation and inter-ligation PETs. For the .loop file, the columns and their explanations are as follows:

column | name | explanation
------ | ---- | -----------
0th | loopId | ID for a loop, like chr1-chr1-1
1st | ES | Enrichment score for the loop, calculated as the observed PET number divided by the mean PET number of nearby permuted regions
2nd | FDR | False discovery rate for the loop, calculated as the number of permuted regions that have more observed PETs than the region
3rd | binomal_p-value | binomial test p-value for the loop
4th | distance | distance (bp) between the centers of the anchors for the loop
5th | hypergeometric_p-value | hypergeometric test p-value for the loop
6th | iva | genomic coordinates for the left anchor, for example, chr13:50943050-50973634
7th | ivb | genomic coordinates for the right anchor
8th | poisson_p-value | Poisson test p-value for the loop
9th | ra | observed PET number for the left anchor
10th | rab | observed PET number linking the left and right anchors
11th | rb | observed PET number for the right anchor
12th | poisson_p-value_corrected | Bonferroni-corrected Poisson p-value according to the number of loops for each chromosome
13th | binomal_p-value_corrected | Bonferroni-corrected binomial p-value according to the number of loops for each chromosome
14th | hypergeometric_p-value_corrected | Bonferroni-corrected hypergeometric p-value according to the number of loops for each chromosome
15th | significant | 1 or 0; 1 means we think the loop is significant compared to permuted regions. You can ignore this and customize your cutoffs using the above values by visualizing a small chromosome in Juicebox or washU.
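If you prefer your own cutoffs over the significant column, the loop file can be filtered with a few lines of Python. The sketch below is standalone Python 3 and rests on assumptions: that the .loop file is tab-separated with a header row using the column names listed above, and that ES, FDR and poisson_p-value are the values you want to threshold. The function name `filter_loops` and the default cutoffs are hypothetical illustrations, not recommended values.

```python
import csv


def filter_loops(lines, min_es=2.0, max_fdr=0.05, max_poisson_p=1e-5):
    """Return loopIds passing user-chosen cutoffs (cutoff defaults are illustrative)."""
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        if (float(row["ES"]) >= min_es
                and float(row["FDR"]) <= max_fdr
                and float(row["poisson_p-value"]) <= max_poisson_p):
            kept.append(row["loopId"])
    return kept


# Made-up rows with only the columns used here, for illustration.
example = [
    "loopId\tES\tFDR\tpoisson_p-value\n",
    "chr21-chr21-1\t5.0\t0.01\t1e-8\n",
    "chr21-chr21-2\t1.2\t0.2\t0.01\n",
]
print(filter_loops(example))  # prints ['chr21-chr21-1']
```

For a real file, pass an open file handle, for example `filter_loops(open("chiapet.loop"))`, where the file name is also an assumption about how the output is named.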


Examples

The source data, results and log files for all of the following examples can be found in the examples directory.

1. ChIA-PET data

We provide test data from GM12878 CTCF ChIA-PET (GSM1872886), restricted to chromosome 21 and mapped to hg38. Run the following command and you will get the result if cLoops is successfully installed. The eps is auto-estimated and the default minPts is 5; the -w option generates loops for visualization in the washU browser, and the -j option generates loops for visualization in Juicebox.

wget https://github.com/YaqiangCao/cLoops/raw/master/examples/GSM1872886_GM12878_CTCF_ChIA-PET_chr21_hg38.bedpe.gz
cLoops -f GSM1872886_GM12878_CTCF_ChIA-PET_chr21_hg38.bedpe.gz -o chiapet -w -j -s -m 1 -plot

For ChIA-PET data with sharp peaks, like the CTCF data here, you will get inter-ligation and self-ligation PET distance distributions like the following, with the two kinds of PETs well separated using the auto-estimated eps:

If your experimental data doesn't look like this with the auto-estimated eps, which can happen for some ChIA-PET data with broad peaks (like H3K27ac), please use a small chromosome (chr21 in human and chr19 in mouse) to run a series of eps values, then choose the smallest one that generates a well-separated distance distribution to run cLoops, or just use the whole series.

We recommend washU for visualizing the loops. The jd2washU script converts the cLoops temp files to a washU long-range track; bedtools, bgzip and tabix must be available in the command environment.

jd2washU -d chiapet -o chiapet       

With other ChIP-seq data, you can get the following plot:

2. HiChIP data

We provide test data from two biological replicates of GM12878 cohesin HiChIP, restricted to chromosome 21 and mapped to hg38. Run the following command to call merged loops. The -s option keeps the working directory and temp files, which can be used by the scripts deLoops, jd2washU (BEDTOOLS needed), jd2juice (Juicer needed), jd2fingerprint and jd2saturation. The -hic option means using cutoffs designed for Hi-C-like data, see above.

wget https://github.com/YaqiangCao/cLoops_su