IRIS: Isoform peptides from RNA splicing for Immunotherapy target Screening
Quick guide
Dependencies
Core dependencies (required for major IRIS functions/steps - format, screen, and predict)
- python 2.7.x (numpy, scipy, seaborn, pyBigWig, statsmodels, pysam)
- IEDB stand-alone (Note: IRIS is only tested on 20130222 2.15.5)
- bedtools 2.29.0
Other dependencies (required for processing raw RNA-Seq and MS data)
- STAR 2.5.3: required for IRIS RNA-seq processing
- samtools 1.3: required for IRIS RNA-seq processing
- rMATS-turbo: required for IRIS RNA-seq processing
- Cufflinks 2.2.1: required for IRIS RNA-seq processing
- seq2HLA: required for HLA typing (Note: The original URL of the tool is no longer working); requires bowtie
- MS GF+ (v2018.07.17): required for MS search; requires Java
- R: used by seq2HLA
Installation
1. Download
1.1 Download IRIS program
The IRIS program can be downloaded directly from the repository, as shown below:
git clone https://github.com/Xinglab/IRIS.git
cd IRIS
IRIS is designed to make use of a computing cluster to improve performance. For users who want to enable cluster execution for functions that support it (see Configure for details), please update the contents of snakemake_profile/ to ensure compatibility with the available compute environment.
1.2 Download IRIS db
IRIS uses a large reference database of splicing events and other genomic annotations. These data are included in IRIS_data.v2.0.0 (the entire folder is ~400 GB; users can select which reference groups to download). The files need to be placed under ./IRIS_data/
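After downloading, a quick sanity check can confirm the reference folder is where IRIS expects it. This is an illustrative sketch only; the subdirectories present inside IRIS_data/ depend on which reference groups were downloaded.

```shell
# Verify the IRIS reference database folder is in place.
# (Hypothetical helper; the required contents depend on the
# reference groups you chose to download.)
check_iris_data() {
  local db_dir="${1:-./IRIS_data}"
  if [ -d "$db_dir" ]; then
    echo "found: $db_dir"
  else
    echo "missing: $db_dir (see 1.2 Download IRIS db)"
    return 1
  fi
}
```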
1.3 Download IEDB MHC I prediction tools
Download IEDB_MHC_I-2.15.5.tar.gz from the IEDB website (see Dependencies). Create a folder named IEDB/ in the IRIS folder, then move the downloaded gz file to IEDB/. From http://tools.iedb.org/main/download/:
- click "MHC Class I"
- click "previous version"
- find and download version 2.15.5
The manual download is needed because a license must be accepted.
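Once the archive has been downloaded in a browser, the folder setup above can be sketched as a small helper (an illustrative sketch; the function name is hypothetical, and the filename matches the version listed in Dependencies):

```shell
# Create the IEDB/ folder inside the IRIS directory and move the
# manually downloaded archive into it. The download itself must be
# done in a browser because the license has to be accepted.
place_iedb_tarball() {
  # $1: path to the downloaded IEDB_MHC_I-2.15.5.tar.gz
  mkdir -p IEDB
  mv "$1" IEDB/
}
```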
2. Install
./install can automatically install most dependencies to conda environments:
- conda must already be installed for the script to work
  - https://docs.conda.io/en/latest/miniconda.html
- The install script will check if IRIS_data/ has been downloaded
  - To download, see 1.2 Download IRIS db
- The install script will check if the IEDB tools have been downloaded
  - To download, see 1.3 Download IEDB MHC I prediction tools
Under the IRIS folder, to install IRIS core dependencies, do:
./install core
To install optional dependencies not needed for the most common IRIS usage:
./install all
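Before running ./install, it can help to confirm the same prerequisites the script checks for. This is a hedged sketch mirroring the points above; the function name and messages are hypothetical, not part of the install script itself:

```shell
# Hypothetical preflight check: conda on PATH, IRIS_data/ downloaded,
# IEDB/ folder present in the IRIS directory.
preflight() {
  local ok=0
  command -v conda >/dev/null 2>&1 || { echo "conda not found"; ok=1; }
  [ -d ./IRIS_data ] || { echo "IRIS_data/ missing (see 1.2)"; ok=1; }
  [ -d ./IEDB ] || { echo "IEDB/ missing (see 1.3)"; ok=1; }
  return "$ok"
}
```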
3. Configure for compute cluster
Snakefile describes the IRIS pipeline. The configuration for running jobs can be set by editing snakemake_profile/. The provided configuration adapts IRIS to use Slurm. Other compute environments can be supported by updating this directory:
- snakemake_profile/config.yaml: Sets various Snakemake parameters including whether to submit jobs to a cluster.
- snakemake_profile/cluster_submit.py: Script to submit jobs.
- snakemake_profile/cluster_status.py: Script to check job status.
- snakemake_profile/cluster_commands.py: Commands specific to the cluster management system being used. The default implementation is for Slurm. Other cluster environments can be used by changing this file. For example, snakemake_profile/cluster_commands_sge.py can be used to overwrite cluster_commands.py to support an SGE cluster.
- To force Snakemake to execute on the local machine, modify snakemake_profile/config.yaml:
  - comment out cluster
  - set jobs: {local cores to use}
  - uncomment the resources section and set mem_mb: {MB of RAM to use}
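For local execution, the edited snakemake_profile/config.yaml might look roughly like the following. This is a sketch, not the shipped file; the exact values depend on your machine, and any keys not mentioned in the steps above are omitted:

```yaml
# cluster: ...        # commented out so jobs are not submitted to a cluster
jobs: 8               # number of local cores Snakemake may use
resources:
  mem_mb: 32000       # MB of RAM available for local execution
```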
4. Known issues
- The conda install of Python 2 may give an error like ImportError: No module named _sysconfigdata_x86_64_conda_linux_gnu
  - Check for the error by activating conda_env_2 and running python
  - Resolve with commands similar to:
    cd conda_env_2/lib/python2.7/
    cp _sysconfigdata_x86_64_conda_cos6_linux_gnu.py _sysconfigdata_x86_64_conda_linux_gnu.py
- IRIS uses --label-string to determine which fastq files are for read 1 and read 2
  - To avoid any issues, name your fastq files so that they end with 1.fastq and 2.fastq to indicate which file represents which mate of the read pair
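The sysconfigdata workaround above can be wrapped in a small helper (a sketch; the function name is hypothetical, and conda_env_2 is the Python 2 environment created by the install script):

```shell
# Copy the cos6 sysconfigdata module to the name the conda Python 2
# interpreter expects, resolving the ImportError described above.
fix_sysconfigdata() {
  local libdir="${1:-conda_env_2/lib/python2.7}"
  cp "$libdir/_sysconfigdata_x86_64_conda_cos6_linux_gnu.py" \
     "$libdir/_sysconfigdata_x86_64_conda_linux_gnu.py"
}
```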
Usage
- For streamlined AS-derived target discovery, please follow the major functions and run the corresponding toy example.
- For customized pipeline development, please check all functions of IRIS.
This flowchart shows how the IRIS functions are organized

Individual functions
IRIS provides individual functions/steps, allowing users to build pipelines for their customized needs. IRIS_functions.md describes each module/step, including RNA-seq preprocessing, HLA typing, proteo-transcriptomic MS searching, visualization, etc.
usage: IRIS [-h] [--version]
positional arguments:
{format,screen,predict,epitope_post,process_rnaseq,makesubsh_mapping,makesubsh_rmats,makesubsh_rmatspost,exp_matrix,makesubsh_extract_sjc,extract_sjc,sjc_matrix,index,translate,pep2epitope,screen_plot,screen_sjc,append_sjc,annotate_ijc,screen_cpm,append_cpm,screen_novelss,screen_sjc_plot,makesubsh_hla,parse_hla,ms_makedb,ms_search,ms_parse,visual_summary}
format Format AS matrices from rMATS, followed by indexing
for IRIS
screen Identify AS events of varying degrees of tumor
association and specificity using an AS reference
panel
predict Predict and annotate AS-derived TCR (pre-prediction)
and CAR-T targets
epitope_post Post-prediction step to summarize predicted TCR
targets
process_rnaseq Process RNA-Seq FASTQ files to quantify gene
expression and AS
makesubsh_mapping Make submission shell scripts for running
'process_rnaseq'
  makesubsh_rmats     Make submission shell scripts for running rMATS-turbo
'prep' step
makesubsh_rmatspost
Make submission shell scripts for running rMATS-turbo
'post' step
exp_matrix Make a merged gene expression matrix from multiple
cufflinks results
makesubsh_extract_sjc
Make submission shell scripts for running
'extract_sjc'
extract_sjc Extract SJ counts from STAR-aligned BAM file and
                      annotate SJs with the number of uniquely mapped reads
that support the splice junction.
sjc_matrix Make SJ count matrix by merging SJ count files from a
specified list of samples. Performs indexing of the
merged file
index Index AS matrices for IRIS
translate Translate AS junctions into junction peptides
pep2epitope Wrapper to run IEDB for peptide-HLA binding prediction
screen_plot Make stacked/individual violin plots for list of AS
events
screen_sjc Identify AS events of varying degrees of tumor
                      specificity by comparing the presence-absence of
splice junctions using a reference of SJ counts
append_sjc Append "screen_sjc" result as an annotation to PSI-
based screening results and epitope prediction results
in a specified screening output folder
annotate_ijc Annotate inclusion junction count info to PSI-based
screening results or epitope prediction results in a
specified screening output folder. Can be called from
append_sjc to save time
screen_cpm Identify AS events of varying degrees of tumor
