EpiVerse
EpiVerse is an innovative three-stage pipeline designed to impute chromatin structures across various cell types. By enabling cross cell-type Hi-C imputations, EpiVerse facilitates unique "epigenome-level" perturbational Hi-C experiments, offering valuable insights into chromatin architecture dynamics under diverse epigenetic conditions.
Welcome to the official repository of EpiVerse, a three-stage pipeline for chromatin structure imputation. EpiVerse specializes in high-quality, cross cell-type Hi-C imputations, facilitating unique "epigenome-level" perturbational Hi-C experiments. It offers valuable insights into chromatin architecture dynamics across various epigenetic conditions.

Visualization Browser
Explore imputed Hi-C data effortlessly with our dedicated browser, designed for 39 tissues. This intuitive tool simplifies data interpretation, enhancing your research experience.
Installation
EpiVerse comprises three core modules: Avocado, HiConformer, and MIRNet. Due to conda environment conflicts, we recommend installing EpiVerse across three separate conda environments.
Avocado Environment
Set up the Avocado environment using:
conda create -n Avocado
conda activate Avocado
pip install avocado-epigenome
HiConformer and MIRNet Training Environment
HiConformer and MIRNet share one environment. Create it with:
conda env create -f environments/HiConformer.yml
Data processing environment
We use fanc to convert file formats and process the data.
conda env create -f environments/fanc.yml
Avocado
We leverage the Avocado model for new cell type imputations, following its prescribed methodology.
- Download the pretrained Avocado model weights: Begin by downloading the Avocado pretrained model weights. Use the ENCODE-Core hg38 Version available at Zenodo
- Fit New Cell Types: Utilize Avocado's fitting function with your data (comprising available tracks) to fit new cell types. This is done using the model.fit_celltypes(data, n_epochs=50) command, where data represents your available epigenomic tracks.
- Predict Missing Tracks: Run the prediction function for your specific cell type and the missing tracks using model.predict("{YOUR_CELLTYPE}", "{MISSING_TRACKS}").
- Integrating Imputations with Available Tracks: Once you have predicted the missing tracks using Avocado, the next step is to integrate these imputed tracks with your existing data. This combined dataset will serve as the input for HiConformer.
Note1: If your tissue is already covered by the Avocado imputations, we recommend downloading the Avocado imputations from ENCODE instead. See the HiConformer Data Preparation section for details.
Note2: For further details, please refer to the Avocado GitHub repository.
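The integration step above can be sketched as follows. This is a minimal illustration, not part of EpiVerse or the Avocado API: the function name `merge_tracks`, the dictionary inputs, and the (assays x bins) output layout are all assumptions made for the example. It simply prefers a measured track when one exists and falls back to the imputed track otherwise.

```python
import numpy as np


def merge_tracks(observed, imputed, assay_order):
    """Stack one signal track per assay, preferring measured data over imputed.

    `observed` and `imputed` are hypothetical dicts mapping an assay name
    to a 1-D signal of equal length (one value per genomic bin).
    """
    rows, sources = [], []
    for assay in assay_order:
        if assay in observed:
            rows.append(np.asarray(observed[assay], dtype=np.float32))
            sources.append("observed")
        elif assay in imputed:
            rows.append(np.asarray(imputed[assay], dtype=np.float32))
            sources.append("imputed")
        else:
            raise KeyError(f"no observed or imputed track for assay {assay!r}")

    # All tracks must cover the same number of bins before stacking.
    lengths = {row.shape[0] for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"tracks differ in length: {sorted(lengths)}")
    return np.stack(rows), sources


# Toy data: H3K4me3 was measured, H3K27ac is only available as an imputation.
observed = {"H3K4me3": [0.1, 2.0, 0.3, 0.0]}
imputed = {"H3K27ac": [0.5, 1.5, 0.2, 0.1], "H3K4me3": [9.0, 9.0, 9.0, 9.0]}
signals, sources = merge_tracks(observed, imputed, ["H3K4me3", "H3K27ac"])
print(signals.shape, sources)
```

Keeping the `sources` list alongside the array makes it easy to check later which rows were measured and which were imputed.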
HiConformer
HiConformer, the core module of EpiVerse, maps imputed epigenetic signals from Avocado and one-hot encoded DNA sequences to Hi-C diagonals.
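To make the two data representations named above concrete, here is a small sketch of one-hot encoding a DNA sequence and of reading one diagonal from a contact matrix. The column order A, C, G, T, the all-zero encoding of N, and the helper names `one_hot_encode` and `contact_diagonal` are assumptions for illustration; HiConformer's own encoding and diagonal handling may differ.

```python
import numpy as np

BASES = "ACGT"  # assumed column order


def one_hot_encode(sequence):
    """Return a (len(sequence), 4) array; bases outside ACGT (e.g. N) stay all-zero."""
    encoded = np.zeros((len(sequence), len(BASES)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        column = BASES.find(base)
        if column >= 0:
            encoded[position, column] = 1.0
    return encoded


def contact_diagonal(matrix, offset):
    """Return the contacts between bins that lie `offset` bins apart."""
    return np.diagonal(matrix, offset=offset)


encoded = one_hot_encode("ACGTN")

# Toy 4x4 "contact matrix" whose entries are just 0..15.
toy = np.arange(16).reshape(4, 4)
main_diag = contact_diagonal(toy, 0)   # bins paired with themselves
first_off = contact_diagonal(toy, 1)   # bins one step apart
print(encoded.shape, main_diag.tolist(), first_off.tolist())
```

The k-th diagonal of a Hi-C matrix collects all bin pairs at the same genomic distance, which is why predicting diagonals is a natural way to model distance-dependent contact signal.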
Data Preparation
HiConformer requires DNA sequences, epigenetic signals, Hi-C, and ChromHMM data for training. For inference purposes only, you need DNA sequences, epigenetic signals, and pre-trained model weights. Below is a guide for preparing data for the IMR-90 cell line as an example.
Note1: To only perform inference with HiConformer, download the DNA sequence, epigenetic signals, and pre-trained model weights.
Note2: Change into the pipelines folder and run conda activate HiConformer before running each script.
- HiConformer Pretrained Model Weights: Download the pretrained model weights and configuration from Zenodo.
- DNA sequence: Retrieve the hg38 reference DNA sequence from UCSC using:
python HiConformer_ref_crawler.py
- Epigenetic Signals: Refer to the Avocado section for generating custom imputed epigenetic signals. Example for pre-imputed IMR-90 MboI:
python Avocado_preimpute_crawler.py --avoname IMR-90 --savename IMR90_MboI
- Hi-C data: Data is sourced from 3DIV. To download Hi-C data for IMR90-MboI, run:
python HiConformer_3DIV_crawler.py --threeDIVname IMR90_MboI --savename IMR90_MboI
We also use peaks called by HICCUPS to sample better diagonals during training. The peak file is not required if you only run inference with HiConformer. It should have the following format:
chr1 65585000 65590000 chr1 65700000 65705000 . 9.06 . . 5.25 4.65e-05 0.0277 6.84 6.67e-06 0.00276
chr1 90240000 90245000 chr1 90665000 90670000 . 9.52 . . 4.99 4.65e-05 0.0277 7.12 6.67e-06 0.00276
chr1 72650000 72655000 chr1 72700000 72705000 . 11.5 . . 5.18 1.36e-05 0.0156 4.32 0.00012 0.0302
chr1 178855000 178860000 chr1 178930000 178935000 . 21.1 . . 3.72 9.77e-07 0.00372 3.61 9.77e-07 0.00246
chr1 203055000 203060000 chr1 203370000 203375000 . 8.49 . . 5.38 4.27e-05 0.0247 11.9 1.13e-06 0.0025
chr1 189495000 189500000 chr1 191160000 191165000 . 9.42 . . 5.2 4.65e-05 0.0404 5.84 4.65e-05 0.0134
chr1 189500000 189505000 chr1 190455000 190460000 . 9.24 . . 4.81 4.65e-05 0.0337 4.79 4.65e-05 0.0157
chr1 88725000 88730000 chr1 89660000 89665000 . 12.3 . . 7.84 1.5e-08 0.000117 5.68 2.59e-06 0.00201
...
- ChromHMM data: ChromHMM data is obtained and processed from Roadmap Epigenomics. To download ChromHMM data for IMR90-MboI:
python HiConformer_ChromHMM_crawler.py --roadmapEID E017 --savename IMR90_MboI
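As a rough aid for checking a peak file in the whitespace-separated format shown above, the sketch below parses one row and derives how many bins separate the two anchors. It is not the loader EpiVerse uses. The README does not name the columns, so the field names (`chrom1`, `start1`, and so on), the treatment of the first six columns as two genomic intervals, and the idea that the bin distance corresponds to a Hi-C diagonal are assumptions; the remaining columns are kept as raw strings.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    extra: tuple  # remaining columns, left uninterpreted


def parse_peak_line(line):
    """Split one whitespace-separated peak row into two anchors plus extras."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    return Peak(
        fields[0], int(fields[1]), int(fields[2]),
        fields[3], int(fields[4]), int(fields[5]),
        tuple(fields[6:]),
    )


def diagonal_offset(peak):
    """Bin distance between the anchors, using the first anchor's width as bin size."""
    bin_size = peak.end1 - peak.start1
    return (peak.start2 - peak.start1) // bin_size


# First example row from the format listing above.
line = ("chr1 65585000 65590000 chr1 65700000 65705000 . 9.06 . . "
        "5.25 4.65e-05 0.0277 6.84 6.67e-06 0.00276")
peak = parse_peak_line(line)
print(peak.chrom1, peak.start1, peak.start2, diagonal_offset(peak))
```

For this row the anchors are 5 kb wide and start 115 kb apart, so the peak sits 23 bins off the main diagonal.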
Custom Data preparation
Users can generate custom data inputs for HiConformer. This section outlines the necessary steps.
Note1: If you only intend to use HiConformer for inference, you need only the DNA sequence, epigenetic signals, and pretrained model weights.
Note2: Change into the pipelines folder and run conda activate HiConformer before running each script.
- DNA Sequence: This step is identical to the one in the Data Preparation section. Obtain the hg38 reference from UCSC using:
python HiConformer_ref_crawler.py
- Epigenetic signals: Refer to the Avocado section for generating custom imputed epigenetic signals. To use other Avocado-available tissues, check the metadata inside the Avocado folder for your target tissue, then run:
python Avocado_preimpute_crawler.py --avoname {YOUR_TARGET_TISSUE} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
- Hi-C data: Retrieve and generate data from 3DIV. After identifying your target tissue among the tissues available in 3DIV, download the Hi-C data with:
python HiConformer_3DIV_crawler.py --threeDIVname {YOUR_TARGET_TISSUE} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
For training, you must also run HICCUPS to call peaks.
- ChromHMM data: First find your target tissue's Roadmap EID (Epigenome ID). Then download the ChromHMM data using:
python HiConformer_ChromHMM_crawler.py --roadmapEID {YOUR_EID} --savename {YOUR_DEFINED_TARGET_TISSUE_NAME}
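When preparing several tissues, it can help to generate the four commands above from one place so the save name stays consistent. The helper below is a hypothetical convenience, not something shipped with EpiVerse. It only assembles argument lists from the script names and flags shown in this section and does not run anything.

```python
import shlex


def build_crawler_commands(avocado_tissue, threediv_tissue, roadmap_eid, save_name):
    """Return the data-preparation commands as argument lists, in README order."""
    return [
        ["python", "HiConformer_ref_crawler.py"],
        ["python", "Avocado_preimpute_crawler.py",
         "--avoname", avocado_tissue, "--savename", save_name],
        ["python", "HiConformer_3DIV_crawler.py",
         "--threeDIVname", threediv_tissue, "--savename", save_name],
        ["python", "HiConformer_ChromHMM_crawler.py",
         "--roadmapEID", roadmap_eid, "--savename", save_name],
    ]


# Values taken from the IMR-90 example in the Data Preparation section.
commands = build_crawler_commands("IMR-90", "IMR90_MboI", "E017", "IMR90_MboI")
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, cwd=...)` with `cwd` pointing at the pipelines folder, provided the HiConformer conda environment is active in the calling shell.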
Data Directory
Ensure your data is correctly organized after completing the preprocessing and downloading steps. The data directory should be structured as follows:
├── 3div
│   └── IMR90_MboI
│       ├── IMR90_MboI_chr10_sanity
│       ├── IMR90_MboI_chr11_sanity
│       ├── IMR90_MboI_chr12_sanity
│       ├── IMR90_MboI_chr13_sanity
│       ├── IMR90_MboI_chr14_sanity
│       ├── IMR90_MboI_chr15_sanity
│       ├── IMR90_MboI_chr16_sanity
│       ├── IMR90_MboI_chr17_sanity
│       ├── IMR90_MboI_chr18_sanity
│       ├── IMR90_MboI_chr19_sanity
│       ├── IMR90_MboI_chr1_sanity
│       ├── IMR90_MboI_chr20_sanity
│       ├── IMR90_MboI_chr21_sanity
│       ├── IMR90_MboI_chr22_sanity
│       ├── IMR90_MboI_chr2_sanity
│       ├── IMR90_MboI_chr3_sanity
│       ├── IMR90_MboI_chr4_sanity
