eQTpLot
Visualization of Colocalization Between eQTL and GWAS Data
eQTpLot is an intuitive and user-friendly R package developed for the visualization of colocalization between eQTL and GWAS data. eQTpLot takes as input standard GWAS and eQTL summary statistics, and optional pairwise LD information, to generate a series of plots visualizing colocalization, correlation, and enrichment between eQTL and GWAS signals for a given gene-trait pair. With eQTpLot, investigators can easily generate a series of customizable plots clearly illustrating, for a given gene-trait pair:
<ol> <li>colocalization between GWAS and eQTL signals</li> <li>correlation between GWAS and eQTL p-values</li> <li>enrichment of eQTLs among trait-significant variants</li> <li>the LD landscape of the locus in question</li> <li>the relationship between the direction of effect of eQTL signals and the direction of effect of colocalizing GWAS peaks</li> </ol>
These clear and comprehensive plots provide a unique view of eQTL-GWAS colocalization, allowing for a more complete understanding of the interaction between gene expression and trait associations.

eQTpLot was developed in R version 4.0.0 and depends on a number of packages for various aspects of its implementation:
c("biomaRt", "dplyr", "GenomicRanges", "ggnewscale", "ggplot2", "ggplotify", "ggpubr", "gridExtra", "Gviz", "LDheatmap", "patchwork")
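Before installing, it can help to see which of these dependencies are already available. The snippet below is a minimal sketch and not part of eQTpLot itself; the variable names `deps` and `missing_pkgs` are illustrative only. It reports missing packages and installs nothing.

```r
# Dependencies listed above
deps <- c("biomaRt", "dplyr", "GenomicRanges", "ggnewscale", "ggplot2",
          "ggplotify", "ggpubr", "gridExtra", "Gviz", "LDheatmap", "patchwork")

# Packages from the list that are not installed in the current library paths
missing_pkgs <- setdiff(deps, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All eQTpLot dependencies are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

biomaRt, GenomicRanges, and Gviz are distributed through Bioconductor, so they are typically installed with `BiocManager::install()` rather than `install.packages()`.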
<p> </p>
Installation
eQTpLot can be installed using devtools, either directly from GitHub,
devtools::install_github("RitchieLab/eQTpLot")
or by downloading the repository to your computer, unzipping, and installing the eQTpLot folder.
devtools::install("eQTpLot")
Note: If you run into issues installing dependencies, try running the following code prior to installation.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS=TRUE)
Input files
At a minimum, eQTpLot requires two input data frames:
<ol> <li>GWAS summary statistics (compatible with PLINK --linear/logistic output format https://www.cog-genomics.org/plink/1.9/formats#assoc_linear) </li> <li>eQTL summary statistics, e.g. downloaded directly from the GTEx portal https://gtexportal.org/home/</li> </ol>
Two optional data frames may also be supplied:
<ol> <li>pairwise linkage disequilibrium (LD) data for the analyzed variants</li> <li>gene coordinates</li> </ol>
The formatting parameters of the two required and two optional inputs are summarized below.
GWAS.df
GWAS.df is a data frame of GWAS summary data with one row per SNP (e.g. PLINK .assoc.linear or .assoc.logistic format), containing the following columns:
Column|Description
-----|:-----
CHR|Chromosome for SNP (sex chromosomes coded numerically). Data type: integer
BP|Chromosomal position for each SNP, in base pairs. Data type: integer
SNP|Variant ID (such as dbSNP ID "rs..."). Note: must use the same naming scheme as eQTL.df to ensure proper matching. Data type: character
P|p-value for the SNP from GWAS analysis. Data type: numeric
BETA|beta for the SNP from GWAS analysis. Data type: numeric
PHE|OPTIONAL Name of the phenotype to which the GWAS data refer, useful if your GWAS.df contains data for multiple phenotypes, e.g. a PheWAS. If not provided, eQTpLot will assume the GWAS data are for a single phenotype, specified with the trait argument. Data type: character
> data(GWAS.df.example)
> head(GWAS.df.example)
  CHR       BP             SNP       P       BETA        PHE
1  11 66078129       rs1625595 0.06646 -7.925e-05 Creatinine
2  11 66078252     rs565374903 0.17350 -1.915e-02 Creatinine
3  11 66078296     rs750544051 0.03073 -4.299e-02 Creatinine
4  11 66078347 11:66078347_C_G 0.64030 -9.298e-03 Creatinine
5  11 66078368     rs541384459 0.93890  5.763e-04 Creatinine
6  11 66078385     rs138591375 0.34690  1.647e-03 Creatinine
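As a rough illustration of preparing this input, the sketch below converts rows in the PLINK 1.9 .assoc.linear layout (columns CHR, SNP, BP, A1, TEST, NMISS, BETA, STAT, P) into a data frame with the column names shown in the example output above. The rows are made-up toy values, the SNP IDs are hypothetical, and the phenotype label is simply reused from the example. When covariates are included, PLINK reports one row per covariate per SNP, so the sketch keeps only the additive (`ADD`) test.

```r
# Toy rows in PLINK 1.9 .assoc.linear layout (values are made up for illustration)
plink_txt <- "CHR SNP BP A1 TEST NMISS BETA STAT P
11 rs0000001 66078129 A ADD 1000 -0.012 -1.84 0.066
11 rs0000001 66078129 A AGE 1000 0.030 5.10 4e-07
11 rs0000002 66078252 G ADD 1000 0.004 0.61 0.542"

# For a real file, replace `text = plink_txt` with the path to your PLINK output
plink <- read.table(text = plink_txt, header = TRUE, stringsAsFactors = FALSE)

# Keep only the additive genotype test so each SNP appears once
plink <- plink[plink$TEST == "ADD", ]

# Keep the columns used for GWAS.df and label the phenotype
GWAS.df <- plink[, c("CHR", "BP", "SNP", "P", "BETA")]
GWAS.df$PHE <- "Creatinine"

# Basic sanity checks before plotting
required <- c("CHR", "BP", "SNP", "P", "BETA")
stopifnot(all(required %in% colnames(GWAS.df)),
          !any(duplicated(GWAS.df$SNP)),
          all(GWAS.df$P >= 0 & GWAS.df$P <= 1))
```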
<p> </p>
eQTL.df
eQTL.df is a data frame of eQTL data with one row per SNP (e.g. downloaded directly from the GTEx Portal in .csv format), containing the following columns:
Column|Description
-----|:-----
SNP.Id|Variant ID. Note: the naming scheme must be the same as that used in GWAS.df to ensure proper matching. Data type: character
Gene.Symbol|Gene symbol to which the eQTL expression data refer. Note: the gene symbol must match entries in Genes.df to ensure proper matching. Data type: character
P.Value|p-value for the SNP from eQTL analysis. Data type: numeric
NES|Normalized effect size for the SNP from eQTL analysis (per GTEx, defined as the slope of the linear regression and computed as the effect of the alternative allele relative to the reference allele in the human genome reference). Data type: numeric
Tissue|Tissue type to which the eQTL p-value/NES refer. Note: eQTL.df can contain multiple tissue types. Data type: character
N|OPTIONAL Number of samples used to calculate the p-value and NES for the eQTL data. Used when performing a MultiTissue or PanTissue analysis with the option CollapseMethod set to "meta", for a simple sample-size-weighted meta-analysis. Data type: numeric
> data(eQTL.df.example)
> head(eQTL.df.example)
  Gene.Symbol      SNP.Id   P.Value       NES               Tissue
1       PELI3 rs138677235 0.0377103 -0.139874 Adipose_Subcutaneous
2       PELI3 rs111472085 0.0131649  0.257579 Adipose_Subcutaneous
3       PELI3  rs75325358 0.0442168 -0.147111 Adipose_Subcutaneous
4       PELI3 rs113298476 0.0442168 -0.147111 Adipose_Subcutaneous
5       PELI3  rs73490435 0.0134318  0.256645 Adipose_Subcutaneous
6       PELI3 rs112219657 0.0387010  0.214056 Adipose_Subcutaneous
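Because variants are matched between the two data frames by ID, a quick overlap check can reveal a naming-scheme mismatch before any plotting. The following is a minimal sketch on toy data, not an eQTpLot function; the p-values in the toy GWAS.df are invented, and the column names follow the example outputs above.

```r
# Toy inputs; GWAS p-values are illustrative only
GWAS.df <- data.frame(SNP = c("rs138677235", "rs111472085", "11:66078347_C_G"),
                      P = c(0.04, 0.20, 0.64),
                      stringsAsFactors = FALSE)
eQTL.df <- data.frame(SNP.Id = c("rs138677235", "rs111472085", "rs75325358"),
                      Gene.Symbol = "PELI3",
                      P.Value = c(0.0377103, 0.0131649, 0.0442168),
                      stringsAsFactors = FALSE)

# Variants present in both data frames under the same ID
shared <- intersect(GWAS.df$SNP, eQTL.df$SNP.Id)

# Fraction of eQTL variants that have a GWAS counterpart
overlap <- length(shared) / length(unique(eQTL.df$SNP.Id))
message(length(shared), " shared variants (",
        round(100 * overlap), "% of eQTL variants)")
```

A very low overlap usually suggests that one data set uses rsIDs while the other uses a chromosome:position style ID.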
<p> </p>
Genes.df
Genes.df is an optional data frame with one row per gene.
Note: eQTpLot automatically loads a default Genes.df containing information for most protein-coding genes for genome builds hg19 and hg38, but you may wish to specify your own Genes.df data frame if your gene of interest is not included in the default data frame, or if your eQTL data uses a different gene naming scheme (for example, Gencode ID instead of gene symbol).
If supplied, Genes.df should contain the following columns:
Column|Description
-----|:-----
Gene|Gene symbol/name to which the coordinate data refer. Note: the gene symbol/name must match entries in eQTL.df to ensure proper matching. Data type: character
CHR|Chromosome the gene is on. Note: do not include a "chr" prefix, and sex chromosomes should be coded numerically. Data type: integer
Start|Chromosomal coordinate of start position (in base pairs) to use for gene. Note: this should be the smaller of the two values between Start and Stop. Data type: integer
Stop|Chromosomal coordinate of end position (in base pairs) to use for gene. Note: this should be the larger of the two values between Start and Stop. Data type: integer
Build|The genome build for the coordinate data. The default Genes.df data frame contains entries for both genome builds for each gene, and the script will select the appropriate entry based on the specified gbuild (default is "hg19"). Data type: character, c("hg19", "hg38")
> data(Genes.df.example)
> head(Genes.df.example)
  CHR    Start     Stop    Gene Build
1  19 58858171 58864865    A1BG  hg19
2  10 52559168 52645435    A1CF  hg19
3  12  9220303  9268825     A2M  hg19
4  12  8975067  9039798   A2ML1  hg19
5   1 33772366 33786699 A3GALT2  hg19
6  22 43088117 43117307 A4GALT  hg19
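If you need to supply your own gene coordinates, a custom Genes.df can be built by hand. The sketch below is illustrative only: `MY_GENE_ID` is a placeholder identifier, and the coordinates are borrowed from the first row of the example above rather than looked up for any particular gene.

```r
# Hypothetical custom entry; replace the ID and coordinates with your own
Genes.df <- data.frame(
  CHR   = 19L,
  Start = 58858171L,
  Stop  = 58864865L,
  Gene  = "MY_GENE_ID",   # placeholder; must match Gene.Symbol in eQTL.df
  Build = "hg19",
  stringsAsFactors = FALSE
)

# Checks mirroring the column descriptions above
stopifnot(is.integer(Genes.df$CHR),
          all(Genes.df$Start < Genes.df$Stop),
          all(Genes.df$Build %in% c("hg19", "hg38")))
```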
<p> </p>
LD.df
LD.df is an optional data frame of SNP linkage data, one row per SNP pair, compatible with PLINK .ld (--r/--r2) file format https://www.cog-genomics.org/plink/1.9/formats#ld
Note: If no LD.df is supplied, eQTpLot will plot data without LD information
Column|Description
-----|:-----
BP_A|Base pair position of the first variant in the LD pair. Data type: integer
SNP_A|Variant ID of the first variant in the LD pair. Data type: character
BP_B|Base pair position of the second variant in the LD pair. Data type: integer
SNP_B|Variant ID of the second variant in the LD pair. Data type: character
R2|Squared correlation measure of linkage between the two variants. Data type: numeric
> data(LD.df.example)
> head(LD.df.example)
  CHR_A     BP_A     SNP_A CHR_B     BP_B            SNP_B       R2
1    11 66078129 rs1625595    11 66079275 11:66079275_GA_G 0.299550
2    11 66078129 rs1625595    11 66079361       rs33981819 0.686453
3    11 66078129 rs1625595    11 66079786         rs490972 0.991748
4    11 66078129 rs1625595    11 66079787         rs565972 0.991756
5    11 66078129 rs1625595    11 66079818       rs61891388 0.706614
6    11 66078129 rs1625595    11 66080770        rs7924580 0.309860
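A PLINK .ld file is whitespace-delimited with a header row, so it can usually be read straight into a data frame. The sketch below parses two rows copied from the example above from an in-memory string and checks that the columns described in the table are present. It is a minimal illustration, not part of eQTpLot.

```r
# Two rows copied from the example above, laid out as in a PLINK .ld file
ld_txt <- "CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2
11 66078129 rs1625595 11 66079361 rs33981819 0.686453
11 66078129 rs1625595 11 66079786 rs490972 0.991748"

# For a real file, replace `text = ld_txt` with the path to your .ld file
LD.df <- read.table(text = ld_txt, header = TRUE, stringsAsFactors = FALSE)

# Confirm the described columns are present and R2 lies in [0, 1]
required <- c("BP_A", "SNP_A", "BP_B", "SNP_B", "R2")
stopifnot(all(required %in% colnames(LD.df)),
          all(LD.df$R2 >= 0 & LD.df$R2 <= 1))
```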
*Note: variants in SNP_A
