dcHiC: Differential Compartment Analysis of Hi-C Datasets

dcHiC is a tool for differential compartment analysis of Hi-C datasets. It features many capabilities, including:

Optimized PCA calculations (faster + capable of analysis up to 5kb resolution)
Comprehensive identification of significant compartment changes between any number of cell lines (with replicates), including with pseudo-bulk single cell data
Beautiful standalone HTML files for visualization of results
Identification of differential loops anchored in significant differential compartments (using Fit-Hi-C)
And much more!

Paper

If you want to see examples of dcHiC in action or cite our tool, please see our paper in Nature Communications! Web-hosted visualization examples of the case scenarios in the paper are available here.

To see how to run dcHiC, read our docs and try our demo (below)! Information about data pre-processing and running single-cell data is available in the wiki.

Demo

This README contains the key information you will need to use this application. However, some users may find a demo helpful; ours includes a script to run package installation as well as detailed guides for different options of dcHiC. All of these resources are available in the demo directory, with relevant instructions inside!

Installation

The latest version of dcHiC runs predominantly from R (3+) and Python (3+). The necessary packages may be installed via conda or manually (those transitioning environments should have most, if not all, of the packages already installed). For the core application, the following packages are necessary:

Option 1: Conda

We recommend using Conda to install all dependencies in a virtual environment. The suggested path is using the appropriate <a href="https://docs.conda.io/en/latest/miniconda.html/">Miniconda</a> distribution.

If you face any issues, be sure your "conda" command specifically calls the executable under the miniconda distribution (e.g., ~/miniconda3/condabin/conda). If the "conda activate" command gives an error the first time you run it, run "conda init bash" once.

To install, go to the directory of your choice and run:

git clone https://github.com/ay-lab/dcHiC
cd dcHiC
conda env create -f ./packages/dchic.yml
conda activate dchic

Afterward, activate the environment and install some purpose-built processing functions with R CMD INSTALL functionsdchic_1.0.tar.gz (functions file under 'packages'). M1 Mac users may face some issues, as some Bioconductor packages have not yet been updated for native ARM64 support; we recommend using an x86-64 based OS for the cleanest experience.

Option 2: Manual Installation

To install the dependencies manually, ensure that you have the following packages installed:

Packages in R

Rcpp
optparse
bench
bigstatsr
bigreadr
robust
data.table
networkD3
depmixS4
rjson
limma (Bioconductor)
IHW (Bioconductor)
lpsymphony (Bioconductor, in case you face an error while installing the IHW package)
ggplot2
R.utils
hashmap (.tar.gz file under 'packages')

Packages in Python

igv-reports

Bedtools

dcHiC requires bedtools. Please install the program as directed—it should be accessible via $PATH.
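A quick way to confirm that bedtools is reachable through $PATH is sketched below. This is only an illustrative check, not part of dcHiC; the helper name find_tool is hypothetical.

```python
import shutil


def find_tool(name="bedtools"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("bedtools")
    if path is None:
        print("bedtools not found on PATH; install it before running dcHiC")
    else:
        print(f"bedtools found at {path}")
```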

Those who wish to perform differential loop analysis should also download the latest Python version of FitHiC, which requires a set of Python libraries: numpy, scipy, scikit-learn, sortedcontainers, and matplotlib. You may also need to install 'cooler' if you wish to use .cool files. See documentation on how to do so.

Afterward, activate the environment and install some purpose-built processing functions with R CMD INSTALL functionsdchic_1.0.tar.gz (functions file under 'packages').

To check which R packages are already installed

Rscript -e 'plist <- c("functionsdchic","hashmap","R.utils","Rcpp","RcppEigen","BH","optparse","bench","bigstatsr","bigreadr","robust","data.table","networkD3","depmixS4","rjson","limma","ggplot2","lpsymphony","IHW"); setdiff(plist,basename(find.package(plist)))'

If the output is character(0), you're all set; otherwise, install the packages listed in the output.

Input File

Create an input file for dcHiC with the format below. The matrix and bed columns are for input data (see next section), whereas the replicate_prefix and experiment_prefix columns describe the hierarchy of data.

Note: Do not use dashes ("-") or dots (".") in the replicate or experiment prefix names.

<mat>         <bed>         <replicate_prefix>      <experiment_prefix>

For instance, consider this sample file which describes two replicates for two Hi-C profiles:

matr1_e1.txt  matr1_e1.bed   exp1_R1_100kb                  exp1
matr2_e1.txt  matr2_e1.bed   exp1_R2_100kb                  exp1
matr1_e2.txt  matr1_e2.bed   exp2_R1_100kb                  exp2
matr2_e2.txt  matr2_e2.bed   exp2_R2_100kb                  exp2
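Before launching a run, it can help to check the input file against the rules above. The following is a minimal sketch, assuming the four columns are separated by whitespace (spaces or tabs); the function name validate_input_lines is hypothetical and not part of dcHiC.

```python
import re


def validate_input_lines(lines):
    """Check dcHiC-style input rows; return a list of problem messages."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 4:
            problems.append(f"line {n}: expected 4 columns, found {len(fields)}")
            continue
        _mat, _bed, replicate_prefix, experiment_prefix = fields
        for label, value in (("replicate", replicate_prefix),
                             ("experiment", experiment_prefix)):
            # Dashes and dots are not allowed in either prefix.
            if re.search(r"[-.]", value):
                problems.append(
                    f"line {n}: {label} prefix '{value}' contains '-' or '.'"
                )
    return problems


if __name__ == "__main__":
    sample = [
        "matr1_e1.txt  matr1_e1.bed   exp1_R1_100kb   exp1",
        "matr1_e2.txt  matr1_e2.bed   exp2-R1.100kb   exp2",
    ]
    for message in validate_input_lines(sample):
        print(message)
```

An empty result means every row has four columns and both prefixes are free of dashes and dots.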

Input Data

dcHiC accepts sparse matrices as its input (HiC-Pro style). If you have .cool or .hic files, see how to convert their format here.

To see the full list of options, run Rscript dchicf.r --help or view dchicdoc.txt here.

The matrix file should look like this:

<indexA> <indexB> <count>

1         1       300
1         2       30
1         3       10
2         2       200
2         3       20
3         3       200
 			....

... And the corresponding bed file like this:

<chr>	<start>	<end>	<index>

chr1	0	      40000	   1
chr1	40000	  80000	   2
chr1	80000	  120000	 3
 			....
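To make the relationship between the two files concrete, here is a small sketch that derives both from a dense contact matrix. It assumes, based only on the examples above, that indices are 1-based, that only the upper triangle (indexA <= indexB) is written, and that zero counts are omitted. The function names are hypothetical; real data would normally come from an existing Hi-C pipeline.

```python
def dense_to_sparse(matrix):
    """Return (indexA, indexB, count) triplets for non-zero upper-triangle cells."""
    triplets = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                triplets.append((i + 1, j + 1, row[j]))  # 1-based bin indices
    return triplets


def make_bins(chrom, chrom_length, bin_size, first_index=1):
    """Return (chr, start, end, index) rows for fixed-size bins on one chromosome."""
    rows = []
    index = first_index
    for start in range(0, chrom_length, bin_size):
        end = min(start + bin_size, chrom_length)  # last bin may be shorter
        rows.append((chrom, start, end, index))
        index += 1
    return rows


if __name__ == "__main__":
    dense = [
        [300, 30, 10],
        [30, 200, 20],
        [10, 20, 200],
    ]
    for a, b, count in dense_to_sparse(dense):
        print(a, b, count, sep="\t")
    for row in make_bins("chr1", 120000, 40000):
        print(*row, sep="\t")
```

With this toy input, the printed triplets and bins reproduce the matrix and bed rows shown above.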

Blacklisted Regions

Many high-throughput genomics studies "blacklist" problematic mapping regions (see the study <a href = "https://www.nature.com/articles/s41598-019-45839-z">here</a>). If you wish to blacklist regions from your data, you may do so by adding a fifth column to your bed file containing a 1 in each row that should be blacklisted and a 0 otherwise:

<chr>	<start>	<end>	<index>	<blacklisted>

chr1	0	      40000	 1	     0
chr1	40000	  80000	 2	     1
 			....
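The fifth column could be generated from a list of blacklisted intervals along these lines. This is a sketch under one assumption that the text does not state: a bin is flagged if it overlaps a blacklisted interval by at least one base (half-open coordinates). The function name and the example interval are hypothetical.

```python
def add_blacklist_column(bed_rows, blacklist):
    """Append 1 to bins overlapping any blacklist interval, else 0."""
    flagged = []
    for chrom, start, end, index in bed_rows:
        hit = any(
            chrom == b_chrom and start < b_end and b_start < end
            for b_chrom, b_start, b_end in blacklist
        )
        flagged.append((chrom, start, end, index, 1 if hit else 0))
    return flagged


if __name__ == "__main__":
    bins = [("chr1", 0, 40000, 1), ("chr1", 40000, 80000, 2)]
    blacklisted = [("chr1", 50000, 60000)]  # made-up interval for illustration
    for row in add_blacklist_column(bins, blacklisted):
        print(*row, sep="\t")
```

A stricter rule (for example, requiring a minimum overlap fraction) may suit some datasets better; the overlap test is the only part that would need to change.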

Run Options

To see the full list of run options with examples of run code for each one, run Rscript dchicf.r --help. The highest-level option is --pcatype, which selects the analysis step to perform. Each step requires its own additional input options.

| --pcatype option | Meaning |
| --------------------- | ----------------------- |
| cis | Find compartments on a cis interaction matrix |
| trans | Find compartments on a trans interaction matrix |
| select | Selection of best PC for downstream analysis [Must be after cis or trans step] |
| analyze | Perform differential analysis on selected PCs [Must be after select step] |
| subcomp | Optional: Assigning sub-compartments based on PC magnitude values using HMM segmentation |
| fithic | Run Fit-Hi-C to identify loops before running dloop (Optional) |
| dloop | Find differential loops anchored in at least one of the differential compartments across the samples (Optional) |
| viz | Generate IGV visualization HTML file. Must have performed other steps in order (optional ones not strictly necessary) before this one. |
| enrich | Perform gene enrichment analysis (GSEA) of genes in differential compartments/loops |

Here is a sample full run using the traditional cis matrix for compartment analysis:

Must - 
Rscript dchicf.r --file input.ES_NPC.txt --pcatype cis --dirovwt T --cthread 2 --pthread 4
Rscript dchicf.r --file input.ES_NPC.txt --pcatype select --dirovwt T --genome mm10
Rscript dchicf.r --file input.ES_NPC.txt --pcatype analyze --dirovwt T --diffdir ES_vs_NPC_100Kb
Rscript dchicf.r --file input.ES_NPC.txt --pcatype viz --diffdir ES_vs_NPC_100Kb --genome mm10

Optional - 
Rscript dchicf.r --file input.ES_NPC.txt --pcatype subcomp --dirovwt T --diffdir ES_vs_NPC_100Kb
Rscript dchicf.r --file input.ES_NPC.txt --pcatype fithic --dirovwt T --diffdir ES_vs_NPC_100Kb --fithicpath "/path/to/fithic.py" --pythonpath "/path/to/python"
Rscript dchicf.r --file input.ES_NPC.txt --pcatype dloop --dirovwt T --diffdir ES_vs_NPC_100Kb
Rscript dchicf.r --file input.ES_NPC.txt --pcatype viz --diffdir ES_vs_NPC_100Kb --genome mm10 
Rscript dchicf.r --file input.txt --pcatype enrich --genome mm10 --diffdir conditionA_vs_conditionB --exclA F --region both --pcgroup pcQnm --interaction intra --pcscore F --compare F
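Because the required steps must run in a fixed order, some users may prefer to generate the commands from one place. The sketch below only assembles and prints the four required commands, using the flags shown in the sample run above; the function name build_commands and its parameter names are hypothetical, and nothing is executed.

```python
import shlex


def build_commands(input_file, genome, diffdir, cthread=2, pthread=4):
    """Return the required dcHiC steps, in order, as argument lists."""
    base = ["Rscript", "dchicf.r", "--file", input_file]
    return [
        base + ["--pcatype", "cis", "--dirovwt", "T",
                "--cthread", str(cthread), "--pthread", str(pthread)],
        base + ["--pcatype", "select", "--dirovwt", "T", "--genome", genome],
        base + ["--pcatype", "analyze", "--dirovwt", "T", "--diffdir", diffdir],
        base + ["--pcatype", "viz", "--diffdir", diffdir, "--genome", genome],
    ]


if __name__ == "__main__":
    for command in build_commands("input.ES_NPC.txt", "mm10", "ES_vs_NPC_100Kb"):
        # Print only; to execute, pass each list to subprocess.run(command, check=True).
        print(shlex.join(command))
```

Keeping each command as an argument list avoids shell-quoting problems if a path contains spaces.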

Output

As output, dcHiC creates two types of directories. The first holds raw PCA results, in directories named after the third column of the input file. One of these is created for each input Hi-C profile. Inside each is an "intra_pca" or "inter_pca" directory, depending on whether compartments were calculated from intra- or inter-chromosomal interactions, and that directory holds the raw PC values for each chromosome.
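To locate the raw PCA results after a run, one option is to search the working directory for the two directory names mentioned above. This sketch assumes nothing about the parent directory names beyond what is described here; the function name find_pca_dirs is hypothetical.

```python
import os


def find_pca_dirs(root="."):
    """Return sorted paths of directories named intra_pca or inter_pca under root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name in ("intra_pca", "inter_pca"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_pca_dirs("."):
        print(path)
```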

The second overarching directory is called DifferentialResult, which contains directories for differential results (on any number of parameter settings). These directory names are set in the analyze step (--pcatype analyze, which performs differential calling), where users pass --diffdir to name the directory in which the analysis is done. Mul
