CALDER2
CALDER is a Hi-C analysis tool that can: (1) compute chromatin domains from whole-chromosome contacts; (2) derive their non-linear hierarchical organization and obtain sub-compartments; (3) compute nested sub-domains within each chromatin domain from short-range contacts.
CALDER user manual
CALDER is a Hi-C analysis tool that can: (1) compute chromatin domains from whole-chromosome contacts; (2) derive their non-linear hierarchical organization and obtain sub-compartments; (3) compute nested sub-domains within each chromatin domain from short-range contacts. CALDER is currently implemented in R.
- Overview of the CALDER method:

- Calder connects chromatin 3D organization to genomic function:

(A note on the performance of Calder vs PC-based approach)
- PC1 (sometimes PC2) of the correlation matrix has typically been used to define A/B compartments. We found that Calder is more robust than the PC-based approach at identifying meaningful compartments, particularly in the presence of complex chromosomal structural variations (figure on the left) or loose interaction between the p and q arms (figure on the right).

Multiple new features were added in version 2.0
- Support for hg19, hg38, mm9, mm10 and other genomes
- Support input in .hic format generated by Juicer tools (https://github.com/aidenlab/juicer)
- Optimized bin_size selection for more reliable compartment identification
- Aggregated all chromosome output into a single file for easier visualization in IGV
- Added output in tabular .txt format at bin level for easier downstream analysis
Below we introduce two main updates:
(1) Optimized bin_size selection
For reasons such as low data quality or large-scale structural variation, compartments can be called unreliably at one bin_size (equivalent to resolution in the literature) but properly at another. We added an optimized bin_size selection strategy to call reliable compartments. The strategy is based on an observation from our large-scale compartment analysis (https://www.nature.com/articles/s41467-021-22666-3): although compartments can change between conditions, their overall correlation cor(compartment_rank_1, compartment_rank_2) remains high (> 0.4).
The strategy: given a bin_size specified by the user, we call compartments at a set of extended bin_sizes and choose the smallest bin_size such that no larger bin_size increases the compartment correlation with a reference compartment by more than 0.05. For example, if the correlation is 0.2 at bin_size=10000 and 0.6 at bin_size=50000, the latter is considered more reliable; if the correlation is 0.5 at bin_size=10000 and 0.52 at bin_size=50000, we choose the former because it has higher resolution.
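The selection rule can be sketched in a few lines of R. This is a minimal illustration and not CALDER's internal code: the function name `select_bin_size`, its arguments, and the assumption that the correlation with the reference has already been computed for each bin_size are all hypothetical.

```r
## Hypothetical helper (not part of the CALDER API): pick the smallest bin_size
## such that no larger bin_size improves the correlation with the reference
## compartments by more than min_gain.
select_bin_size <- function(bin_sizes, cors, min_gain = 0.05) {
  stopifnot(length(bin_sizes) == length(cors))
  ord <- order(bin_sizes)          # evaluate from highest to lowest resolution
  bin_sizes <- bin_sizes[ord]
  cors <- cors[ord]
  for (i in seq_along(bin_sizes)) {
    larger <- cors[seq_along(cors) > i]   # correlations at all larger bin_sizes
    ## all() of an empty vector is TRUE, so the largest bin_size is the fallback
    if (all(larger - cors[i] <= min_gain)) return(bin_sizes[i])
  }
}

## The two cases described in the text
choice_a <- select_bin_size(c(10E3, 50E3), c(0.2, 0.6))
choice_b <- select_bin_size(c(10E3, 50E3), c(0.5, 0.52))
```

For the two cases from the text, the sketch picks 50000 in the first and 10000 in the second.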
bin_size is extended as follows, so that the contact matrix at every larger bin_size can be aggregated directly from the input contact matrices:
if(bin_size==5E3) bin_sizes = c(5E3, 10E3, 50E3, 100E3)
if(bin_size==10E3) bin_sizes = c(10E3, 50E3, 100E3)
if(bin_size==20E3) bin_sizes = c(20E3, 40E3, 100E3)
if(bin_size==25E3) bin_sizes = c(25E3, 50E3, 100E3)
if(bin_size==40E3) bin_sizes = c(40E3, 80E3)
if(bin_size==50E3) bin_sizes = c(50E3, 100E3)
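Every extended bin_size above is an integer multiple of the input bin_size, which is what makes direct aggregation possible. The sketch below shows one way such an aggregation could be done on a three-column dump table in base R. It is an illustration under stated assumptions and not CALDER's own routine: the function name `aggregate_contacts` and the column names are hypothetical, and summing is only appropriate for raw (unnormalized) counts stored once per bin pair.

```r
## Hypothetical helper (not part of the CALDER API): aggregate a three-column
## contact table (pos_1, pos_2, count) to a coarser bin size by summing counts.
aggregate_contacts <- function(tab, bin_size_new) {
  names(tab) <- c("pos_1", "pos_2", "count")
  ## Map each bin start onto the start of the coarser bin that contains it
  tab$pos_1 <- floor(tab$pos_1 / bin_size_new) * bin_size_new
  tab$pos_2 <- floor(tab$pos_2 / bin_size_new) * bin_size_new
  ## Sum all fine-resolution entries that fall into the same coarse bin pair
  aggregate(count ~ pos_1 + pos_2, data = tab, FUN = sum)
}

## Toy table at 10 kb resolution, aggregated to 20 kb
toy <- data.frame(pos_1 = c(0, 0, 10000, 20000),
                  pos_2 = c(0, 10000, 10000, 20000),
                  count = c(1, 2, 3, 4))
toy_20kb <- aggregate_contacts(toy, 20000)
```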
Note that this strategy is currently only available for the hg19, hg38, mm9 and mm10 genomes, for which we generated high-quality reference compartments using Hi-C data from: GSE63525 for hg19, https://data.4dnucleome.org/files-processed/4DNFI1UEG1HD for hg38, GSM3959427 for mm9, http://hicfiles.s3.amazonaws.com/external/bonev/CN_mapq30.hic for mm10.
(2) Support for other genomes
Although CALDER was mainly tested on human and mouse datasets, it can be applied to datasets from other genomes. One additional piece of information is required in that case: a feature_track that is presumably positively correlated with the compartment score (that is, with higher values in the A than in the B compartment). This information is used to determine the A/B direction correctly. Suggested tracks include gene density and H3K27ac, H3K4me1, H3K4me2, H3K4me3 or H3K36me3 signals (or a negative transform of H3K9me3). Note that this information does not alter the hierarchical compartment/TAD structure, and can come from any external study with a matched genome. An example of feature_track is given in the Usage section.
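The idea of using a feature track to fix the A/B direction can be illustrated with a small sketch. This is not how CALDER implements it internally; it only shows the principle. The function name `orient_compartment_score` is hypothetical, and the sketch assumes the feature signal has already been averaged over the same bins as the compartment score.

```r
## Hypothetical helper (not part of the CALDER API): flip the sign of a
## compartment score when it is negatively correlated with a feature that is
## expected to be higher in the A compartment.
orient_compartment_score <- function(comp_score, feature_score) {
  r <- cor(comp_score, feature_score, use = "complete.obs")
  if (!is.na(r) && r < 0) -comp_score else comp_score
}

## Toy example: the score is anti-correlated with the feature, so it is flipped
comp    <- c(-1, -0.5, 0.5, 1)
feature <- c(5, 4, 1, 0)
oriented <- orient_compartment_score(comp, feature)
```

Only the sign changes; the relative ordering of bins, and therefore the hierarchy, is untouched, which matches the note above that the feature track does not alter the compartment structure.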
Installation
Installing from conda
The easiest way to get the package is to install from Bioconda:
conda install --channel bioconda r-calder2
Installing from source
Make sure all dependencies have been installed:
- R.utils (>= 2.9.0)
- doParallel (>= 1.0.15)
- ape (>= 5.3)
- dendextend (>= 1.12.0)
- fitdistrplus (>= 1.0.14)
- igraph (>= 1.2.4.1)
- Matrix (>= 1.2.17)
- rARPACK (>= 0.11.0)
- factoextra (>= 1.0.5)
- data.table (>= 1.12.2)
- fields (>= 9.8.3)
- GenomicRanges (>= 1.36.0)
- ggplot2 (>= 3.3.5)
- strawr (>= 0.0.9)
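One way to check this list from within R is sketched below. The helper name `check_deps` is hypothetical and not part of CALDER; it only relies on the base functions `requireNamespace` and `utils::packageVersion`.

```r
## Minimum versions taken from the dependency list above
required <- c(R.utils = "2.9.0", doParallel = "1.0.15", ape = "5.3",
              dendextend = "1.12.0", fitdistrplus = "1.0.14",
              igraph = "1.2.4.1", Matrix = "1.2.17", rARPACK = "0.11.0",
              factoextra = "1.0.5", data.table = "1.12.2", fields = "9.8.3",
              GenomicRanges = "1.36.0", ggplot2 = "3.3.5", strawr = "0.0.9")

## Hypothetical helper: return the names of packages that are missing or
## older than the requested minimum version.
check_deps <- function(required) {
  bad <- character(0)
  for (pkg in names(required)) {
    ok <- requireNamespace(pkg, quietly = TRUE) &&
      utils::packageVersion(pkg) >= required[[pkg]]
    if (!ok) bad <- c(bad, pkg)
  }
  bad
}

## Packages that still need to be installed or updated (empty if all is fine)
to_install <- check_deps(required)
```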
Clone its repository and install it from source:
On the command line:
git clone https://github.com/CSOgroup/CALDER2.git
cd CALDER2
Then, once inside of the R interpreter:
install.packages(".", repos = NULL, type="source") # install from the cloned source file
Install CALDER and dependencies automatically:
One can also install directly from GitHub, together with the dependencies, as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GenomicRanges")
install.packages("remotes")
remotes::install_github("CSOgroup/CALDER2")
Please contact yliueagle@googlemail.com for any questions about installation.
Use as a Docker container
We provide a Docker image complete with all dependencies to run CALDER workflows.
# Pull the Docker image from Docker Hub
docker pull lucananni93/calder2
# Run the image
docker run -it lucananni93/calder2
# Once inside the image we can run the command line Calder tool
calder [options]
# or we can just enter R
R
# and load Calder
library(CALDER)
Usage
CALDER contains three modules: (1) compute chromatin domains; (2) derive their hierarchical organization and obtain sub-compartments; (3) compute nested sub-domains within each compartment domain.
Input data format
CALDER works on contact matrices compatible with those generated by Juicer tools (https://github.com/aidenlab/juicer): either a .hic file, or a three-column dump table retrieved with the juicer dump (or straw) command (https://github.com/aidenlab/juicer/wiki/Data-Extraction):
16050000 16050000 10106.306
16050000 16060000 2259.247
16060000 16060000 7748.551
16050000 16070000 1251.3663
16060000 16070000 4456.1245
16070000 16070000 4211.7393
16050000 16080000 522.0705
16060000 16080000 983.1761
16070000 16080000 1996.749
...
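To make the dump layout concrete, the sketch below turns the rows shown above into a small symmetric matrix. Each row is (bin start 1, bin start 2, contact value), and each bin pair is listed once. The function name `dump_to_matrix` is hypothetical and only meant for inspecting small regions; it is not part of CALDER and builds a dense matrix.

```r
## Hypothetical helper (not part of the CALDER API): convert a three-column
## dump table into a dense symmetric contact matrix.
dump_to_matrix <- function(tab, bin_size) {
  i <- round(tab[[1]] / bin_size)
  j <- round(tab[[2]] / bin_size)
  offset <- min(i, j)              # shift so the first bin gets index 1
  i <- i - offset + 1
  j <- j - offset + 1
  n <- max(i, j)
  mat <- matrix(0, n, n)
  mat[cbind(i, j)] <- tab[[3]]     # fill the listed triangle
  mat[cbind(j, i)] <- tab[[3]]     # mirror it to the other triangle
  mat
}

## The nine rows shown above, at 10 kb resolution
tab <- data.frame(
  pos_1 = c(16050000, 16050000, 16060000, 16050000, 16060000,
            16070000, 16050000, 16060000, 16070000),
  pos_2 = c(16050000, 16060000, 16060000, 16070000, 16070000,
            16070000, 16080000, 16080000, 16080000),
  value = c(10106.306, 2259.247, 7748.551, 1251.3663, 4456.1245,
            4211.7393, 522.0705, 983.1761, 1996.749))
mat <- dump_to_matrix(tab, 10E3)
```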
feature_track should be a data.frame or data.table with 4 columns (chr, start, end, score). It can be generated directly from conventional formats such as bed or wig, as in this example:
library(rtracklayer)
feature_track = import('ENCFF934YOE.bigWig') ## from ENCODE https://www.encodeproject.org/files/ENCFF934YOE/@@download/ENCFF934YOE.bigWig
feature_track = data.table::as.data.table(feature_track)[, c(1:3, 6)]
> feature_track
chr start end score
chr1 534179 534353 2.80512
chr1 534354 572399 0
chr1 572400 572574 2.80512
chr1 572575 628400 0
... ... ... ...
chrY 59031457 59032403 0
chrY 59032404 59032413 0.92023
chrY 59032414 59032415 0.96625
chrY 59032416 59032456 0.92023
chrY 59032457 59032578 0.78875
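When no signal file is at hand, a gene density track (one of the tracks suggested earlier) can be built in the same four-column layout. The sketch below is a hypothetical illustration in base R: the function name `gene_density_track`, the input columns (`chr`, `start`) and the default bin size are assumptions, and bins that contain no gene are simply omitted.

```r
## Hypothetical helper (not part of the CALDER API): count gene start
## positions per fixed-size bin and return a (chr, start, end, score) table.
gene_density_track <- function(genes, bin_size = 1e5) {
  ## 1-based start of the bin that contains each gene start
  bin_start <- floor((genes$start - 1) / bin_size) * bin_size + 1
  counts <- aggregate(list(score = rep(1, nrow(genes))),
                      by = list(chr = genes$chr, start = bin_start),
                      FUN = sum)
  counts$end <- counts$start + bin_size - 1
  counts <- counts[order(counts$chr, counts$start),
                   c("chr", "start", "end", "score")]
  rownames(counts) <- NULL
  counts
}

## Toy gene table: three genes on chr1, one on chr2
genes <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                    start = c(100, 50000, 150000, 10))
toy_track <- gene_density_track(genes, bin_size = 1e5)
```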
Example usage (1): use contact matrix file in dump format as input
library(CALDER)
chrs = c(21:22)
## output directory (placeholder path; replace with any writable location)
save_dir = file.path(tempdir(), 'CALDER_out')
## demo contact matrices in dump format
contact_file_dump = as.list(system.file("extdata", sprintf("mat_chr%s_10kb_ob.txt.gz", chrs),
                                        package='CALDER'))
names(contact_file_dump) = chrs
## Run CALDER to compute compartments but not nested sub-domains
CALDER(contact_file_dump=contact_file_dump,
       chrs=chrs,
       bin_size=10E3,
       genome='hg19',
       save_dir=save_dir,
       save_intermediate_data=FALSE,
       n_cores=2,
       sub_domains=FALSE)
## Run CALDER to compute compartments and nested sub-domains (this will take more time)
CALDER(contact_file_dump=contact_file_dump,
       chrs=chrs,
       bin_size=10E3,
       genome='hg19',
       save_dir=save_dir,
       save_intermediate_data=TRUE,
       n_cores=2,
       sub_domains=TRUE)
Example (2): use contact matrices stored in an R list
chrs = c(21:22)
contact_file_dump = as.list(system.file("extdata", sprintf("mat_chr%s_10kb_ob.txt.gz", chrs),
                                        package='CALDER'))
names(contact_file_dump) = chrs
## read each dump file into a table and pass the list of tables directly
contact_tab_dump = lapply(contact_file_dump, data.table::fread)
CALDER(contact_tab_dump=contact_tab_dump,
       chrs=chrs,
       bin_size=10E3,
       genome='hg19',
       save_dir=save_dir,
       save_intermediate_data=FALSE,
       n_cores=2,
       sub_domains=FALSE)
Example (3): use .hic file as input
chrs = c(21:22)
hic_file = 'HMEC_combined_30.hic' ## can be downloaded from https://ftp.ncbi.nlm.nih.gov/geo/series/GSE63nnn/GSE63525/suppl/GSE63525_HMEC_combined_30.hic
CALDER(contact_file_hic=hic_file,
       chrs=chrs,
       bin_size=10E3,
       genome='hg19',
       save_dir=save_dir,
       save_intermediate_data=FALSE,
       n_cores=2,
       sub_domains=FALSE)
Example (4): run CALDER on other genomes
## prepare feature_track
library(rtracklayer)
feature_track_raw = import('ENCFF934YOE.bigWig') ## from ENCODE https://www.en
