EpiVerse

Welcome to the official repository of EpiVerse, a three-stage pipeline for chromatin structure imputation. EpiVerse specializes in high-quality, cross-cell-type Hi-C imputation, enabling "epigenome-level" perturbational Hi-C experiments. It offers insight into how chromatin architecture changes across epigenetic conditions.

Visualization Browser

Explore imputed Hi-C data effortlessly with our dedicated browser, designed for 39 tissues. This intuitive tool simplifies data interpretation, enhancing your research experience.

Installation

EpiVerse comprises three core modules: Avocado, HiConformer, and MIRNet. Due to conda environment conflicts, we recommend installing EpiVerse across three separate conda environments.

Avocado Environment

Set up the Avocado environment using:

conda create -n Avocado
conda activate Avocado
pip install avocado-epigenome

HiConformer training and MIRNet training Environment

For HiConformer and MIRNet, execute these commands:

conda env create -f environments/HiConformer.yml

Data processing environment

We use fanc to convert file formats and process the data.

conda env create -f environments/fanc.yml

Avocado

We leverage the Avocado model for new cell type imputations, following its prescribed methodology.

Download the pretrained Avocado model weights: Use the ENCODE-Core hg38 version available at Zenodo.
Fit New Celltypes: Utilize Avocado's fitting function with your data (comprising available tracks) to fit new cell types. This is done using the model.fit_celltypes(data, n_epochs=50) command, where data represents your available epigenomic tracks.
Predict Missing Tracks: Run the prediction function for your specific cell type and the missing tracks using model.predict("{YOUR_CELLTYPE}", "{MISSING_TRACKS}").
Integrating Imputations with Available Tracks: Once you have predicted the missing tracks using Avocado, the next step is to integrate these imputed tracks with your existing data. This combined dataset will serve as the input for HiConformer.
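
The steps above can be sketched as one small helper. This is a minimal sketch, not code from this repository: `impute_and_merge` is a hypothetical name, and it assumes `data` is a dictionary keyed by `(celltype, assay)` pairs. Only `fit_celltypes` and `predict` come from the steps above; loading the pretrained model is left to the Avocado documentation.

```python
def impute_and_merge(model, data, celltype, missing_assays, n_epochs=50):
    """Fit a new cell type, predict its missing assays, and merge the results.

    model          -- a loaded Avocado model (any object with fit_celltypes/predict)
    data           -- dict mapping (celltype, assay) -> signal track (assumed layout)
    celltype       -- name of the cell type being added
    missing_assays -- assays to impute for that cell type
    """
    # Fit the new cell type on the tracks that are available.
    model.fit_celltypes(data, n_epochs=n_epochs)

    # Start from the measured tracks for this cell type.
    merged = {assay: track for (ct, assay), track in data.items() if ct == celltype}

    # Impute each missing assay; measured tracks are never overwritten.
    for assay in missing_assays:
        if assay not in merged:
            merged[assay] = model.predict(celltype, assay)

    # Measured plus imputed tracks, keyed by assay, ready to export for HiConformer.
    return merged
```

Keeping measured tracks in preference to imputed ones is a design choice of this sketch; adjust it if you want imputations to replace low-quality measurements.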

Note1: If your tissue is already covered by the Avocado imputations, we recommend downloading those imputations from ENCODE instead. See the HiConformer Data Preparation section.

Note2: For further details, please refer to the Avocado GitHub repository.

HiConformer

HiConformer, the core module of EpiVerse, maps imputed epigenetic signals from Avocado and one-hot encoded DNA sequences to Hi-C diagonals.
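
To make the two ends of this mapping concrete, here is a minimal sketch of one-hot encoding a DNA sequence and of reading one diagonal from a square contact matrix. It is illustrative only: the function names are hypothetical, the A/C/G/T channel order is an assumption, and HiConformer's own encoding and diagonal handling may differ.

```python
# Channel order is an assumption for illustration; HiConformer's order may differ.
CHANNELS = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(CHANNELS)}
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def diagonal(matrix, k):
    """Entries matrix[i][i + k] of a square matrix: bin pairs separated by k bins."""
    return [matrix[i][i + k] for i in range(len(matrix) - k)]


print(one_hot_encode("ACgN"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

contacts = [[9, 4, 1],
            [4, 8, 3],
            [1, 3, 7]]
print(diagonal(contacts, 1))
# [4, 3]
```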

Data Preparation

HiConformer requires DNA sequences, epigenetic signals, Hi-C, and ChromHMM data for training. For inference purposes only, you need DNA sequences, epigenetic signals, and pre-trained model weights. Below is a guide for preparing data for the IMR-90 cell line as an example.

Note1: To only perform inference with HiConformer, download the DNA sequence, epigenetic signals, and pre-trained model weights.

Note2: Run each script from inside the pipelines folder with the HiConformer environment active (cd into pipelines, then conda activate HiConformer).

HiConformer Pretrained Model Weights: Download the pretrained model weights and configuration from Zenodo.
DNA sequence: Retrieve the hg38 reference DNA sequence from UCSC using:
```
python HiConformer_ref_crawler.py
```
Epigenetic Signals: Refer to the Avocado section for generating custom imputed epigenetic signals. Example for pre-imputed IMR-90 MboI:
```
python Avocado_preimpute_crawler.py --avoname IMR-90 --savename IMR90_MboI
```

Hi-C data: Data is sourced from 3DIV. To download Hi-C data for IMR90-MboI, run:

python HiConformer_3DIV_crawler.py --threeDIVname IMR90_MboI --savename IMR90_MboI

We also use peaks called by HICCUPS to sample better diagonals during training. They are not required if you only run inference with HiConformer. The peak file should have the following format:

chr1	65585000	65590000	chr1	65700000	65705000	.	9.06	.	.	5.25	4.65e-05	0.0277	6.84	6.67e-06	0.00276
chr1	90240000	90245000	chr1	90665000	90670000	.	9.52	.	.	4.99	4.65e-05	0.0277	7.12	6.67e-06	0.00276
chr1	72650000	72655000	chr1	72700000	72705000	.	11.5	.	.	5.18	1.36e-05	0.0156	4.32	0.00012	0.0302
chr1	178855000	178860000	chr1	178930000	178935000	.	21.1	.	.	3.72	9.77e-07	0.00372	3.61	9.77e-07	0.00246
chr1	203055000	203060000	chr1	203370000	203375000	.	8.49	.	.	5.38	4.27e-05	0.0247	11.9	1.13e-06	0.0025
chr1	189495000	189500000	chr1	191160000	191165000	.	9.42	.	.	5.2	4.65e-05	0.0404	5.84	4.65e-05	0.0134
chr1	189500000	189505000	chr1	190455000	190460000	.	9.24	.	.	4.81	4.65e-05	0.0337	4.79	4.65e-05	0.0157
chr1	88725000	88730000	chr1	89660000	89665000	.	12.3	.	.	7.84	1.5e-08	0.000117	5.68	2.59e-06	0.00201
...
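
A minimal sketch of reading this file follows. It interprets only the first six columns as the two loop anchors (chromosome, start, end for each side) and leaves the remaining columns alone, since their meaning is not defined here. The function names are hypothetical, and converting anchor separation to a diagonal offset using the anchor width as the bin size (5 kb in the rows above) is an assumption about how the peaks relate to diagonals.

```python
from collections import Counter


def read_loop_anchors(lines):
    """Parse (chrom1, start1, end1, chrom2, start2, end2) from each usable line."""
    loops = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank lines or truncation markers such as "..."
        try:
            start1, end1 = int(fields[1]), int(fields[2])
            start2, end2 = int(fields[4]), int(fields[5])
        except ValueError:
            continue  # header or malformed row
        loops.append((fields[0], start1, end1, fields[3], start2, end2))
    return loops


def diagonal_offsets(loops):
    """Count intra-chromosomal loops by anchor separation, measured in bins."""
    counts = Counter()
    for chrom1, start1, end1, chrom2, start2, _ in loops:
        bin_size = end1 - start1
        if chrom1 != chrom2 or bin_size <= 0:
            continue
        counts[abs(start2 - start1) // bin_size] += 1
    return counts


sample = """\
chr1 65585000 65590000 chr1 65700000 65705000 . 9.06 . . 5.25 4.65e-05 0.0277 6.84 6.67e-06 0.00276
chr1 90240000 90245000 chr1 90665000 90670000 . 9.52 . . 4.99 4.65e-05 0.0277 7.12 6.67e-06 0.00276
...
"""
print(sorted(diagonal_offsets(read_loop_anchors(sample.splitlines())).items()))
# [(23, 1), (85, 1)]
```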

ChromHMM data: ChromHMM data is obtained and processed from Roadmap Epigenomics. To download ChromHMM data for IMR90-MboI:
```
python HiConformer_ChromHMM_crawler.py --roadmapEID E017 --savename IMR90_MboI
```

Custom Data preparation

Users can generate custom data inputs for HiConformer. This section outlines the necessary steps.

Note1: If you only intend to use HiConformer for inference, you need only the DNA sequence, epigenetic signals, and pretrained model weights.

Note2: Run each script from inside the pipelines folder with the HiConformer environment active (cd into pipelines, then conda activate HiConformer).

DNA Sequence: This step is identical to the one in the Data Preparation section. Obtain the hg38 reference from UCSC using:
```
python HiConformer_ref_crawler.py
```
Epigenetic signals: Refer to the Avocado section for generating custom imputed epigenetic signals. To use another tissue that Avocado already covers, look up your target tissue in the metadata inside the Avocado folder, then run:
```
python Avocado_preimpute_crawler.py --avoname {YOUR_TARGET_TISSUE} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
```
Hi-C data: Retrieve and generate data from 3DIV. After identifying your target tissue from the 3DIV available tissues, download the Hi-C data with:
```
python HiConformer_3DIV_crawler.py --threeDIVname {YOUR_TARGET_TISSUE} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
```
Training also requires peaks called with HICCUPS on this Hi-C data.
ChromHMM data: For ChromHMM data, first find your target tissue's EID (Epigenome ID) here. Then download the ChromHMM data using:
```
python HiConformer_ChromHMM_crawler.py --roadmapEID {YOUR_EID} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
```
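
The four download commands can also be assembled in one place so that the same save name is used throughout. The following is a minimal sketch under stated assumptions: `build_prep_commands` and `run_all` are hypothetical helpers, not part of this repository, and the script is assumed to be launched with the HiConformer environment active so that `sys.executable` points at that environment's interpreter. The script names and flags are the ones shown above.

```python
import subprocess
import sys


def build_prep_commands(avoname, threediv_name, roadmap_eid, savename):
    """Return one argument list per crawler script, sharing a single save name."""
    python = sys.executable  # interpreter of the currently active environment
    return [
        [python, "HiConformer_ref_crawler.py"],
        [python, "Avocado_preimpute_crawler.py",
         "--avoname", avoname, "--savename", savename],
        [python, "HiConformer_3DIV_crawler.py",
         "--threeDIVname", threediv_name, "--savename", savename],
        [python, "HiConformer_ChromHMM_crawler.py",
         "--roadmapEID", roadmap_eid, "--savename", savename],
    ]


def run_all(commands, cwd="pipelines"):
    """Run each command from the pipelines folder, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Values from the IMR-90 example; printed here rather than executed.
    for argv in build_prep_commands("IMR-90", "IMR90_MboI", "E017", "IMR90_MboI"):
        print(" ".join(argv))
```

Passing arguments as a list avoids shell quoting and spacing mistakes; call `run_all(...)` instead of printing once the values are correct for your tissue.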

Data Directory

Ensure your data is correctly organized after completing the preprocessing and downloading steps. The data directory should be structured as follows:

├── 3div
│   └── IMR90_MboI
│       ├── IMR90_MboI_chr10_sanity
│       ├── IMR90_MboI_chr11_sanity
│       ├── IMR90_MboI_chr12_sanity
│       ├── IMR90_MboI_chr13_sanity
│       ├── IMR90_MboI_chr14_sanity
│       ├── IMR90_MboI_chr15_sanity
│       ├── IMR90_MboI_chr16_sanity
│       ├── IMR90_MboI_chr17_sanity
│       ├── IMR90_MboI_chr18_sanity
│       ├── IMR90_MboI_chr19_sanity
│       ├── IMR90_MboI_chr1_sanity
│       ├── IMR90_MboI_chr20_sanity
│       ├── IMR90_MboI_chr21_sanity
│       ├── IMR90_MboI_chr22_sanity
│       ├── IMR90_MboI_chr2_sanity
│       ├── IMR90_MboI_chr3_sanity
│       ├── IMR90_MboI_chr4_sanity
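
A quick way to check the 3div part of this layout is sketched below. It is a minimal, hypothetical helper: the `{savename}_{chrom}_sanity` naming pattern is taken from the listing above, while the list of chromosomes to check is supplied by the caller rather than assumed.

```python
from pathlib import Path


def missing_3div_entries(data_root, savename, chromosomes):
    """Return the expected 3div paths that do not exist under data_root."""
    base = Path(data_root) / "3div" / savename
    expected = [base / f"{savename}_{chrom}_sanity" for chrom in chromosomes]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    # "data" is a placeholder root; pass the chromosomes you downloaded.
    for path in missing_3div_entries("data", "IMR90_MboI", ["chr1", "chr2", "chr3"]):
        print("missing:", path)
```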
