eQTpLot

Visualization of Colocalization Between eQTL and GWAS Data

eQTpLot is an intuitive and user-friendly R package developed for the visualization of colocalization between eQTL and GWAS data. eQTpLot takes as input standard GWAS and eQTL summary statistics, and optional pairwise LD information, to generate a series of plots visualizing colocalization, correlation, and enrichment between eQTL and GWAS signals for a given gene-trait pair. With eQTpLot, investigators can easily generate a series of customizable plots clearly illustrating, for a given gene-trait pair:

<ol> <li>colocalization between GWAS and eQTL signals</li> <li>correlation between GWAS and eQTL p-values</li> <li>enrichment of eQTLs among trait-significant variants</li> <li>the LD landscape of the locus in question</li> <li>the relationship between the direction of effect of eQTL signals and the direction of effect of colocalizing GWAS peaks</li> </ol>

These clear and comprehensive plots provide a unique view of eQTL-GWAS colocalization, allowing for a more complete understanding of the interaction between gene expression and trait associations. eQTpLot was developed in R version 4.0.0 and depends on the following packages:

c("biomaRt", "dplyr", "GenomicRanges", "ggnewscale", "ggplot2", "ggplotify", "ggpubr", "gridExtra", "Gviz", "LDheatmap", "patchwork")

Installation
Input files
- GWAS.df
- eQTL.df
- Genes.df
- LD.df
Function arguments
Notes on Analysis
- Congruence/Incongruence
- PanTissue and MultiTissue Analysis
Generation of Each Panel
Use Examples
- Example 1 – comparing eQTpLots for two genes within a linkage peak
  - Figure 1
  - Figure 2
- Example 2 – The TissueList function and adding LD information to eQTpLot
  - Figure 3
- Example 3 – Separating Congruous from Incongruous Variants
  - Figure 4

Installation

eQTpLot can be installed using devtools, either directly from GitHub,

devtools::install_github("RitchieLab/eQTpLot")

or by downloading the repository to your computer, unzipping, and installing the eQTpLot folder.

devtools::install("eQTpLot")

*Note: For issues installing dependencies, try running the following code prior to installation.

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS=TRUE)
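
Several of the dependencies listed earlier (for example biomaRt, GenomicRanges, and Gviz) are distributed through Bioconductor rather than CRAN, which is a common source of installation trouble. The sketch below only reports which of the listed packages are not yet installed. The commented-out `BiocManager::install()` call is a suggestion, not part of eQTpLot's documented installation procedure.

```r
# Packages listed as eQTpLot dependencies in this README
deps <- c("biomaRt", "dplyr", "GenomicRanges", "ggnewscale", "ggplot2",
          "ggplotify", "ggpubr", "gridExtra", "Gviz", "LDheatmap", "patchwork")

# system.file(package = p) returns "" when package p is not installed,
# so nzchar() gives TRUE for installed packages without loading them
is_installed <- vapply(deps, function(p) nzchar(system.file(package = p)),
                       logical(1))
missing_deps <- deps[!is_installed]

if (length(missing_deps) > 0) {
  message("Missing dependencies: ", paste(missing_deps, collapse = ", "))
  # Assumption: BiocManager is an acceptable installer in your environment;
  # it can install both CRAN and Bioconductor packages.
  # BiocManager::install(missing_deps)
} else {
  message("All listed dependencies are installed.")
}
```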

Input files

At a minimum, eQTpLot requires two input data frames:

<ol> <li>GWAS summary statistics (compatible with PLINK --linear/logistic output format https://www.cog-genomics.org/plink/1.9/formats#assoc_linear) </li> <li>eQTL summary statistics, ex. downloaded directly from the GTEx portal https://gtexportal.org/home/</li> </ol>

Two optional data frames may also be supplied:

<ol> <li>pairwise linkage disequilibrium (LD) data for the analyzed variants</li> <li>gene coordinates</li> </ol>

The formatting requirements for both the required and the optional input data frames are summarized below.

GWAS.df

GWAS.df is a data frame of GWAS summary data with one row per SNP, ex. PLINK .assoc.linear, .assoc.logistic format, containing the following columns:

Column|Description
-----|:-----
CHR|Chromosome for SNP (sex chromosomes coded numerically). Data type: integer
BP|Chromosomal position for each SNP, in base pairs. Data type: integer
SNP|Variant ID (such as dbSNP ID "rs..."). Note: Must be the same naming scheme as used in eQTL.df to ensure proper matching. Data type: character
P|p-value for the SNP from GWAS analysis. Data type: numeric
BETA|beta for the SNP from GWAS analysis. Data type: numeric
PHE|OPTIONAL Name of the phenotype to which the GWAS data refers, useful if your GWAS.df contains data for multiple phenotypes, i.e. PheWAS. If not provided, eQTpLot will assume the GWAS data is for a single phenotype, specified with the trait argument. Data type: character

> data(GWAS.df.example)
> head(GWAS.df.example)
  CHR       BP             SNP       P       BETA        PHE
1  11 66078129       rs1625595 0.06646 -7.925e-05 Creatinine
2  11 66078252     rs565374903 0.17350 -1.915e-02 Creatinine
3  11 66078296     rs750544051 0.03073 -4.299e-02 Creatinine
4  11 66078347 11:66078347_C_G 0.64030 -9.298e-03 Creatinine
5  11 66078368     rs541384459 0.93890  5.763e-04 Creatinine
6  11 66078385     rs138591375 0.34690  1.647e-03 Creatinine
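
As a minimal sketch of preparing such a data frame, the code below starts from a toy table laid out like PLINK `--linear` output and keeps the columns shown in the example output above (CHR, BP, SNP, P, BETA, PHE). The allele codes and the covariate row are made-up placeholders, and the `TEST == "ADD"` filter assumes the PLINK run included covariates. Adapt or drop it for your own data.

```r
# Toy stand-in for a PLINK --linear result table.
# SNP/BP/P/BETA values for the ADD rows are copied from the example above;
# A1 and the COV1 row are illustrative placeholders only.
plink <- data.frame(
  CHR  = c(11L, 11L, 11L, 11L),
  SNP  = c("rs1625595", "rs1625595", "rs565374903", "rs750544051"),
  BP   = c(66078129L, 66078129L, 66078252L, 66078296L),
  A1   = c("A", "A", "T", "G"),
  TEST = c("ADD", "COV1", "ADD", "ADD"),
  BETA = c(-7.925e-05, 0.5, -1.915e-02, -4.299e-02),
  P    = c(0.06646, 0.5, 0.17350, 0.03073),
  stringsAsFactors = FALSE
)

# Keep one row per SNP (the additive genotype effect) and the needed columns
GWAS.df <- plink[plink$TEST == "ADD", c("CHR", "BP", "SNP", "P", "BETA")]
GWAS.df$PHE <- "Creatinine"   # optional phenotype label
rownames(GWAS.df) <- NULL

# Basic sanity checks before plotting
required <- c("CHR", "BP", "SNP", "P", "BETA")
stopifnot(all(required %in% colnames(GWAS.df)),
          !anyDuplicated(GWAS.df$SNP),
          all(GWAS.df$P >= 0 & GWAS.df$P <= 1))
```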

eQTL.df

eQTL.df is a data frame of eQTL data, one row per SNP, ex. downloaded directly from the GTEx Portal in .csv format, containing the following columns:

Column|Description
-----|:-----
SNP.Id|Variant ID. Note: naming scheme must be the same as what is used in the GWAS.df to ensure proper matching. Data type: character
Gene.Symbol|Gene symbol to which the eQTL expression data refers. Note: gene symbol must match entries in Genes.df to ensure proper matching. Data type: character
P.Value|p-value for the SNP from eQTL analysis. Data type: numeric
NES|Normalized effect size for the SNP from eQTL analysis (per GTEx, defined as the slope of the linear regression, and computed as the effect of the alternative allele relative to the reference allele in the human genome reference). Data type: numeric
Tissue|Tissue type to which the eQTL p-value/NES refer. Note: eQTL.df can contain multiple tissue types. Data type: character
N|OPTIONAL Number of samples used to calculate the p-value and NES for the eQTL data, used if performing a MultiTissue or PanTissue analysis with the option CollapseMethod set to "meta" for a simple sample-size-weighted meta-analysis. Data type: numeric

> data(eQTL.df.example)
> head(eQTL.df.example)
  Gene.Symbol      SNP.Id   P.Value       NES               Tissue
1       PELI3 rs138677235 0.0377103 -0.139874 Adipose_Subcutaneous
2       PELI3 rs111472085 0.0131649  0.257579 Adipose_Subcutaneous
3       PELI3  rs75325358 0.0442168 -0.147111 Adipose_Subcutaneous
4       PELI3 rs113298476 0.0442168 -0.147111 Adipose_Subcutaneous
5       PELI3  rs73490435 0.0134318  0.256645 Adipose_Subcutaneous
6       PELI3 rs112219657 0.0387010  0.214056 Adipose_Subcutaneous
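
The dotted column names in the example (Gene.Symbol, SNP.Id, P.Value) are consistent with what `read.csv()` produces by default when a header contains spaces or hyphens. The sketch below illustrates this with a toy CSV. The header spelling is an assumption about the exported file, and `gwas_ids` is a hypothetical vector standing in for `GWAS.df$SNP`.

```r
# Toy CSV text standing in for an exported eQTL table. The header spelling
# ("Gene Symbol", "SNP Id", "P-Value") is an assumption for illustration;
# the data rows are copied from the example output above.
csv_text <- "Gene Symbol,SNP Id,P-Value,NES,Tissue
PELI3,rs138677235,0.0377103,-0.139874,Adipose_Subcutaneous
PELI3,rs111472085,0.0131649,0.257579,Adipose_Subcutaneous"

# With the default check.names = TRUE, read.csv() turns spaces and hyphens
# in header names into dots: Gene.Symbol, SNP.Id, P.Value
eQTL.df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(colnames(eQTL.df))

# Variant IDs must use the same naming scheme in both data frames, so it is
# worth checking the overlap before plotting.
gwas_ids <- c("rs138677235", "rs1625595")   # hypothetical GWAS.df$SNP values
shared   <- intersect(eQTL.df$SNP.Id, gwas_ids)
message(length(shared), " of ", nrow(eQTL.df), " eQTL variants also appear in the GWAS IDs")
```

If the overlap is unexpectedly small, a mismatch in variant naming (for example rsIDs versus chromosome:position IDs) is a likely cause.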

Genes.df

Genes.df is an optional data frame, one row per gene, which should contain the following columns:

Note: eQTpLot automatically loads a default Genes.df containing information for most protein-coding genes for genome builds hg19 and hg38, but you may wish to specify your own Genes.df data frame if your gene of interest is not included in the default data frame, or if your eQTL data uses a different gene naming scheme (for example, Gencode ID instead of gene symbol).

Column|Description
-----|:-----
Gene|Gene symbol/name to which the coordinate data refers. Note: gene symbol/name must match entries in eQTL.df to ensure proper matching. Data type: character
CHR|Chromosome the gene is on. Note: do not include a "chr" prefix, and sex chromosomes should be coded numerically. Data type: integer
Start|Chromosomal coordinate of start position (in base pairs) to use for gene. Note: this should be the smaller of the two values between Start and Stop. Data type: integer
Stop|Chromosomal coordinate of end position (in base pairs) to use for gene. Note: this should be the larger of the two values between Start and Stop. Data type: integer
Build|The genome build for the coordinate data. The default Genes.df data frame contains entries for both genome builds for each gene, and the script will select the appropriate entry based on the specified gbuild (default is "hg19"). Data type: character, c("hg19", "hg38")

> data(Genes.df.example)
> head(Genes.df.example)
  CHR    Start     Stop    Gene Build
1  19 58858171 58864865    A1BG  hg19
2  10 52559168 52645435    A1CF  hg19
3  12  9220303  9268825     A2M  hg19
4  12  8975067  9039798   A2ML1  hg19
5   1 33772366 33786699 A3GALT2  hg19
6  22 43088117 43117307  A4GALT  hg19
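
A custom Genes.df can be assembled by hand. The sketch below uses a hypothetical helper, `make_gene_row()` (not part of eQTpLot), that strips any "chr" prefix and orders the two coordinates so that Start is the smaller value. The gene label `MY_GENE_ID` is a placeholder, and the coordinates are copied from the first row of the example above.

```r
# Hypothetical helper (not an eQTpLot function): build one Genes.df row
make_gene_row <- function(gene, chr, pos1, pos2, build = "hg19") {
  stopifnot(build %in% c("hg19", "hg38"))
  data.frame(
    CHR   = as.integer(sub("^chr", "", as.character(chr))),  # drop "chr" prefix
    Start = as.integer(min(pos1, pos2)),                      # smaller coordinate
    Stop  = as.integer(max(pos1, pos2)),                      # larger coordinate
    Gene  = gene,
    Build = build,
    stringsAsFactors = FALSE
  )
}

# The coordinates are deliberately passed in reverse order to show the reordering
Genes.df <- make_gene_row("MY_GENE_ID", "chr19", 58864865, 58858171, "hg19")
print(Genes.df)
```

The value in the Gene column must match the identifiers used in the Gene.Symbol column of eQTL.df.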

LD.df

LD.df is an optional data frame of SNP linkage data, one row per SNP pair, compatible with PLINK .ld (--r/--r2) file format https://www.cog-genomics.org/plink/1.9/formats#ld

Note: If no LD.df is supplied, eQTpLot will plot data without LD information

Column|Description
-----|:-----
BP_A|Base pair position of the first variant in the LD pair. Data type: integer
SNP_A|Variant ID of the first variant in the LD pair. Data type: character
BP_B|Base pair position of the second variant in the LD pair. Data type: integer
SNP_B|Variant ID of the second variant in the LD pair. Data type: character
R2|Squared correlation measure of linkage between the two variants. Data type: numeric

> data(LD.df.example)
> head(LD.df.example)
  CHR_A     BP_A     SNP_A CHR_B     BP_B            SNP_B       R2
1    11 66078129 rs1625595    11 66079275 11:66079275_GA_G 0.299550
2    11 66078129 rs1625595    11 66079361       rs33981819 0.686453
3    11 66078129 rs1625595    11 66079786         rs490972 0.991748
4    11 66078129 rs1625595    11 66079787         rs565972 0.991756
5    11 66078129 rs1625595    11 66079818       rs61891388 0.706614
6    11 66078129 rs1625595    11 66080770        rs7924580 0.309860

*Note: variants in SNP_A
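
PLINK .ld files are whitespace-delimited with a header row, so they can usually be read with `read.table()`. The sketch below parses inline text copied from the example above and checks for the columns described in the table. For a real file you would pass a path instead, such as the hypothetical `"my_region.ld"` in the comment.

```r
# Inline text standing in for a PLINK .ld file (rows copied from the example)
ld_text <- " CHR_A     BP_A     SNP_A CHR_B     BP_B            SNP_B       R2
    11 66078129 rs1625595    11 66079275 11:66079275_GA_G 0.299550
    11 66078129 rs1625595    11 66079361       rs33981819 0.686453"

# For a file on disk: read.table("my_region.ld", header = TRUE)  # hypothetical path
LD.df <- read.table(text = ld_text, header = TRUE, stringsAsFactors = FALSE)

# Check that the documented columns are present and R2 is a valid value
required <- c("BP_A", "SNP_A", "BP_B", "SNP_B", "R2")
stopifnot(all(required %in% colnames(LD.df)),
          is.numeric(LD.df$R2),
          all(LD.df$R2 >= 0 & LD.df$R2 <= 1))
print(LD.df)
```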
