<h2 align="center"> Information-theoretic navigation of multi-tissue functional genomic annotations </h2>

Epilogos is an approach for analyzing, visualizing, and navigating multi-biosample functional genomic annotations, with an emphasis on chromatin state maps generated with e.g. ChromHMM or Segway.

The software provided in this repository implements the methods underlying Epilogos using Python 3.9. We provide a proof-of-principle dataset based on chromatin state calls from the EpiMap dataset (<a href="https://www.nature.com/articles/s41586-020-03145-z">Boix et al., Nature 2021</a>).

<p align="center"> Created by: Wouter Meuleman, Jacob Quon, Alex Reynolds, and Eric Rynes </p>

<div align="center"><a name="menu"></a> <h3> <a href="#installation">Installation</a> • <a href="#prerequisites">Prerequisites</a> • <a href="#running-epilogos">Running Epilogos</a> • <a href="#slurm-examples">SLURM Examples</a> • <a href="#non-slurm-examples">Non-SLURM Examples</a> • <a href="#command-line-options">Command Line Options</a> • <a href="#plot-region">Plot Region</a> • <a href="#pairwise-epilogos">Pairwise Epilogos</a> • <a href="#similarity-search">Similarity Search</a> </h3> </div>

<br>

Installation

Although not required, it is good practice to create a virtual environment in which specific versions of Python and its libraries are installed. This can be done with conda, for example:

$ conda init bash  ## only needed upon first use of conda. Restart shell after this.
$ conda create -n epilogos python=3.9
$ conda activate epilogos

To install Epilogos, run the following command:

$ pip install epilogos

Alternatively, install Epilogos directly from this Git repository using

$ pip install git+https://github.com/meuleman/epilogos

Prerequisites

To compute epilogos, you will need the following Python libraries: statsmodels, click, numpy, scipy, matplotlib, pandas, pysam, scikit-learn, natsort, pyranges, and rich. If the installation commands above do not install them automatically and correctly, the libraries can be installed with one of the following commands.

$ pip install 'click==8.1.3' 'numpy==1.23.4' 'pandas==1.5.1' 'scipy==1.9.3' 'matplotlib==3.6.1' 'statsmodels==0.13.2' 'scikit-learn==1.1.3' 'pysam==0.19.1' 'natsort==8.2.0' 'pyranges==0.0.117' 'rich==12.6.0'

or while in the epilogos directory

$ pip install -r requirements.txt

Additionally, we recommend updating Python to version 3.9 or later. We cannot guarantee the validity of results generated by earlier versions of Python.
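The prerequisites above can be checked before a run. The following is a minimal sketch, not part of Epilogos itself: the function name `check_environment` is hypothetical, and it only reports which of the listed distributions are installed and whether the interpreter is Python 3.9 or later. It does not compare installed versions against the pinned ones.

```python
import sys
from importlib import metadata

# Distribution names as listed in the Prerequisites section.
REQUIRED = [
    "statsmodels", "click", "numpy", "scipy", "matplotlib", "pandas",
    "pysam", "scikit-learn", "natsort", "pyranges", "rich",
]


def check_environment(required=REQUIRED, min_python=(3, 9)):
    """Hypothetical helper: return (python_ok, installed, missing).

    installed maps distribution name -> installed version string;
    missing lists the distributions that could not be found.
    """
    python_ok = sys.version_info >= min_python
    installed, missing = {}, []
    for name in required:
        try:
            installed[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return python_ok, installed, missing


if __name__ == "__main__":
    python_ok, installed, missing = check_environment()
    print("Python >= 3.9:", python_ok)
    for name, version in installed.items():
        print(f"  {name}=={version}")
    if missing:
        print("Missing:", ", ".join(missing))
```

If anything is reported as missing, the `pip install` commands above should resolve it.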

Running Epilogos

For basic documentation of the arguments needed to run Epilogos, run the command epilogos --help or python -m epilogos --help (a more in-depth explanation is given below).

By default, Epilogos assumes access to a computational cluster managed by SLURM. A version of Epilogos has been created for those without access to a SLURM cluster and can be run by adding the -l flag to your command (e.g. epilogos -l).
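The choice between the two modes can be scripted. The sketch below is an illustration under one stated assumption: that finding SLURM's `sbatch` executable on the `PATH` is a reasonable proxy for having a SLURM cluster available. This is a heuristic of ours, not how Epilogos itself decides, and the function name `build_epilogos_command` is hypothetical. Only the `-l`, `-i`, `-j`, and `-o` flags shown in this README are used.

```python
import shutil


def build_epilogos_command(input_dir, state_info, output_dir, have_slurm=None):
    """Hypothetical helper: build an epilogos argument list.

    Adds -l (non-SLURM mode) when SLURM does not appear to be available.
    """
    if have_slurm is None:
        # Assumption: sbatch on PATH means a SLURM cluster is usable.
        have_slurm = shutil.which("sbatch") is not None
    cmd = ["epilogos"]
    if not have_slurm:
        cmd.append("-l")
    cmd += ["-i", input_dir, "-j", state_info, "-o", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_epilogos_command(
        "data/pyData/male/",
        "data/state_metadata/human/Boix_et_al_833_sample/hg19/18/metadata.tsv",
        "OUTPUTDIR",
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once `OUTPUTDIR` has been replaced with a real path.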

SLURM Examples

<details><summary><b> Minimal example on provided example data</b></summary> <p></p> <p>If you cloned this git repository, example data has been provided under <code>data/pyData/male/</code>. Otherwise it is available for download using the script in <code>bin/download_example_data.sh</code>. The script uses <a href="https://curl.se/">cURL</a> to download the necessary files and places them in a file hierarchy generated within the current directory. The file, <code>epilogos_matrix_chr1.txt.gz</code>, contains chromatin state calls for an 18-state chromatin model, across 200bp genomic bins spanning human chromosome 1. The data was pulled from the <a href="https://docs.google.com/spreadsheets/d/103XbiwChp9sJhUXDJr9ztYEPL00_MqvJgYPG-KZ7WME/edit#gid=1813267486">EpiMap dataset</a> and contains only those epigenomes which are tagged <code>Male</code> under the <code>Sex</code> column.</p> <p>To compute epilogos (using the S1 saliency metric) for this sample data, run the following command within the <code>epilogos/</code> directory (replacing <code>OUTPUTDIR</code> with the output directory of your choice).</p>

$ epilogos -i data/pyData/male/ -j data/state_metadata/human/Boix_et_al_833_sample/hg19/18/metadata.tsv -o OUTPUTDIR

<p>Upon completion of the run, you should see the files <code>scores_male_s1_epilogos_matrix_chr1.txt.gz</code> and <code>regionsOfInterest_male_s1.txt</code> in <code>OUTPUTDIR</code></p> <p>To customize your run of epilogos see the <a href="#command-line-options">Command Line Options</a> of the <code>README</code></p> </details> <details><summary><b> Running Epilogos with your own data</b></summary> <p></p> <p>Before you can run Epilogos on your own data, you will need to complete two steps.</p> <p>First, you will need to format your data such that Epilogos can parse it. To assist with this, we have provided a bash script which takes ChromHMM files and generates Epilogos input files. This can be found at <code>scripts/preprocess_data_ChromHMM.sh</code> (to get usage information, run without arguments). If you would prefer not to use the script, data is to be formatted as follows:</p>

Column 1: Chromosome name
Column 2: Start coordinate
Column 3: End coordinate
Column 4: State data for epigenome 1
...
Column n: State data for epigenome n-3
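The column layout above can be sanity-checked before scoring. The following is a minimal sketch, not part of Epilogos: the function name `check_epilogos_input` is hypothetical, and it assumes that columns are tab-separated, that start and end coordinates are integers with start < end, and that files ending in `.gz` are gzip-compressed. The contents of the state columns are deliberately not validated.

```python
import gzip


def check_epilogos_input(path, max_rows=1000):
    """Hypothetical helper: check the first rows of an input file.

    Returns (rows_checked, number_of_epigenomes).
    Raises ValueError on the first malformed row.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_cols = None
    n_rows = 0
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line_no > max_rows:
                break
            fields = line.rstrip("\n").split("\t")
            # Chromosome, start, end, plus at least one epigenome column.
            if len(fields) < 4:
                raise ValueError(f"line {line_no}: expected at least 4 columns")
            if n_cols is None:
                n_cols = len(fields)
            elif len(fields) != n_cols:
                raise ValueError(f"line {line_no}: inconsistent column count")
            start, end = int(fields[1]), int(fields[2])
            if start >= end:
                raise ValueError(f"line {line_no}: start must be less than end")
            n_rows += 1
    if n_cols is None:
        raise ValueError("file is empty")
    return n_rows, n_cols - 3
```

For a file with columns 1 to n as described above, the second returned value is n-3, the number of epigenomes.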

<p>Second, you will need to create a state info file. This is a tab separated file containing various information about each of the states in the chromatin state model. We have provided some files already for common models in the <code>data/state_metadata/</code> directory. For more information on the structure of these files see <code>data/state_metadata/README.txt</code> or <a href="#state-info">State Info [-j, --state-info]</a></p> <p>Once you have completed these two steps, you can run epilogos with the following command:</p>

$ epilogos -i PATH/TO/INPUT_DIR -j PATH/TO/STATE_INFO_TSV -o PATH/TO/OUTPUT_DIR

<p>Upon completion of the run, you should see the same number of scores files as in your input directory in <code>OUTPUT_DIR</code>. Each of these files will be named <code>scores_*.txt.gz</code>, where 'scores_' is followed by the input directory name, the saliency metric, and the corresponding input file name (extensions removed). Additionally, you will find a <code>regionsOfInterest_*.txt</code> file which follows the same naming convention minus the input file name.</p> <p>To further customize your run of epilogos see the <a href="#command-line-options-pairwise">Command Line Options</a> of the <code>README</code></p>
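The naming convention described above can be written out as a short sketch, which may help when checking that a run produced everything expected. The function name `expected_outputs` is hypothetical. The sketch assumes that "extensions removed" means dropping everything from the first dot in the input file name, and that the saliency metric appears in lowercase (e.g. `s1`), as in the example file names in this README.

```python
from pathlib import Path


def expected_outputs(input_dir, input_files, saliency="s1"):
    """Hypothetical helper: predict output file names for a run.

    Returns (list_of_scores_files, regions_of_interest_file).
    """
    # Path(...).name ignores a trailing slash: "data/pyData/male/" -> "male".
    prefix = f"{Path(input_dir).name}_{saliency}"
    scores = [
        # Assumption: the base name is everything before the first dot.
        f"scores_{prefix}_{Path(f).name.split('.')[0]}.txt.gz"
        for f in input_files
    ]
    regions = f"regionsOfInterest_{prefix}.txt"
    return scores, regions


if __name__ == "__main__":
    scores, regions = expected_outputs(
        "data/pyData/male/", ["epilogos_matrix_chr1.txt.gz"]
    )
    print(scores, regions)
```

For the provided example data this yields `scores_male_s1_epilogos_matrix_chr1.txt.gz` and `regionsOfInterest_male_s1.txt`, matching the file names listed in the minimal example.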

Note

Ensure that your working INPUT_DIR contains only those files needed for scoring.

Visualization

<p>If you would like to visualize these results as seen on <a href="https://epilogos.altius.org">epilogos.altius.org</a>, conversion to the multivec format can be performed with the HiGlass-based <a href="https://github.com/higlass/clodius">clodius</a> toolkit. Options for visualizing multivec-formatted files include the <a href="https://github.com/meuleman/epilogos-web?tab=readme-ov-file#dataset-overview">epilogos-web</a> web front-end and the <a href="https://resgen.io">resgen.io</a> site.</p> </details>

Non-SLURM Examples

$ epilogos -l -i data/pyData/male/ -j data/state_metadata/human/Boix_et_al_833_sample/hg19/18/metadata.tsv -o OUTPUTDIR

<p>Upon completion of the run, you should see the files <code>scores_male_s1_epilogos_matrix_chr1.txt.gz</code> and <code>regionsOfInterest_male_s1.txt</code> in <code>OUTPUTDIR</code></p> <p>To customize your run of epilogos see the <a href="#command-line-options">Command Line Options</a> of the <code>README</code></p>
