MultiABEL

Multi-Trait Genome-Wide Association Analysis

Installation

Run the following command in R to install the MultiABEL package from CRAN (stable but not latest!):

install.packages("MultiABEL")

To install the developer version from R-Forge:

install.packages("MultiABEL", repos="http://R-Forge.R-project.org")

To install the latest developer version from GitHub:

require(devtools)
install_github("xiashen/MultiABEL")

Only CRAN compiles the package for macOS, so to install the developer versions, both Linux and Mac users need to have a gfortran compiler set up.

MultiABEL can be loaded in R via:

library(MultiABEL)

require(MultiABEL)

Multi-Trait GWAS using Summary Association Statistics

Example Summary Statistics

MultiABEL allows convenient and fast GWAS of multiple phenotypes directly from summary association statistics, i.e. genome-wide association results (or a sufficiently large subset of them) containing estimated genetic effects, standard errors, and reference allele information. An example data set is provided and can be obtained from: https://www.dropbox.com/sh/2xftha9wcanobo4/AAD6ygCMyUv_gpDtIwRtw-Mta?dl=0

Each file of single-trait GWAS summary statistics should contain columns for variant names (default column name snp), the first (coding or reference) alleles (default column name a1), allele frequencies (default column name freq), effect sizes (default column name beta), standard errors (default column name se), and sample sizes (default column name n). For example, here the top of the summary statistics file for height looks like:

height <- read.table('height.txt', header = TRUE)
head(height)

##          snp a1 a2   freq    beta     se     p       n
## 1  rs4747841  a  g 0.5500  0.0025 0.0061 0.680 60558.2
## 2  rs4749917  t  c 0.4500 -0.0025 0.0061 0.680 60558.1
## 3   rs737656  a  g 0.3667 -0.0073 0.0064 0.250 60529.2
## 4   rs737657  a  g 0.3583 -0.0075 0.0064 0.240 60527.3
## 5 rs17524355  t  c     NA -0.0460 0.0460 0.320  1700.0
## 6  rs7086391  t  c 0.2000 -0.0130 0.0079 0.099 59675.0

where the columns p and a2 are extra information. For the default simple pleiotropic analysis (MultiSummary(..., type = 'direct'), see below), n is not strictly required; if it is unknown, simply give a large number, e.g. 10,000. For MultiSummary(..., type = 'precise'), freq can be any value between 0 and 1, as the exact genotypic variances are given.
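Before passing a file to MultiABEL, it can help to confirm that it carries the default column names listed above. The following is a minimal base-R sketch, not part of MultiABEL: the helper name check_sumstats and its fill_n argument are hypothetical, and the toy data frame simply retypes the first rows of the height example.

```r
# Hypothetical helper (not a MultiABEL function): check that a summary
# statistics data frame has the default column names described above.
check_sumstats <- function(df, fill_n = 10000) {
  required <- c("snp", "a1", "freq", "beta", "se")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # If sample sizes are unknown, fill in a large placeholder value
  if (!"n" %in% names(df)) df$n <- fill_n
  df
}

# First rows of the height example, typed in by hand (no n column)
height_toy <- data.frame(
  snp  = c("rs4747841", "rs4749917", "rs737656"),
  a1   = c("a", "t", "a"),
  a2   = c("g", "c", "g"),
  freq = c(0.5500, 0.4500, 0.3667),
  beta = c(0.0025, -0.0025, -0.0073),
  se   = c(0.0061, 0.0061, 0.0064),
  stringsAsFactors = FALSE
)

checked <- check_sumstats(height_toy)
print(checked)
```

Following the text above, the placeholder sample size is only meant for the default type = 'direct' analysis.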

MultiABEL does NOT require that the different single-trait GWAS were performed in exactly the same individuals.

Loading multiple GWAS summary statistics

Prior to loading the summary association statistics, you need the names of a set of independent SNPs. These SNPs are used to estimate the phenotypic correlations between phenotypes, accounting for partial sample overlap. The exact number of such SNPs is not important, as long as it is large enough, e.g. thousands. However, LD-pruning might be important. We provide a set of LD-pruned SNPs that can be used for any set of European-ancestry GWAS, which can be loaded as:

load('indep.snps.RData')

However, for your specific GWAS, these SNPs might not always be available. An alternative is to replace them with SNPs that have relatively low minor allele frequencies (MAF) in your GWAS, as these allow good estimation of the phenotypic correlations (see https://www.biorxiv.org/content/10.1101/2020.12.10.419325v2). Thereafter, the summary statistics data can be loaded as:

data <- load.summary(c('height.txt', 'weight.txt', 'bmi.txt'), indep.snps = indep.snps)

## loading data ...
## Progress: 100%
## checking markers ...
## Progress: 100%
## cleaning data ...
## Progress: 100%
## correcting parameters ...
## Progress: 100%
## adjusting sample size ... done.
## finalizing summary statistics ...
## Progress: 100%
## samples partially overlap!
## estimating shrinkage phenotypic correlations ... done.
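If no LD-pruned SNP list is available, the low-MAF alternative mentioned above could be prepared along the following lines. This is a sketch only: the cutoff maf_cutoff is a hypothetical value chosen so that the toy data (the freq column of the height example) gives a non-trivial result, and it assumes that indep.snps is simply a character vector of SNP names.

```r
# Toy summary statistics: snp and freq from the first rows of the height example
sumstats <- data.frame(
  snp  = c("rs4747841", "rs4749917", "rs737656",
           "rs737657", "rs17524355", "rs7086391"),
  freq = c(0.5500, 0.4500, 0.3667, 0.3583, NA, 0.2000),
  stringsAsFactors = FALSE
)

# Minor allele frequency is the smaller of freq and 1 - freq
sumstats$maf <- pmin(sumstats$freq, 1 - sumstats$freq)

# Hypothetical cutoff, for illustration only; pick one suited to your GWAS
maf_cutoff <- 0.4

# Keep SNPs with a known frequency and MAF below the cutoff
low.maf.snps <- sumstats$snp[!is.na(sumstats$maf) & sumstats$maf < maf_cutoff]
print(low.maf.snps)
```

The resulting vector of SNP names would then take the place of indep.snps in the load.summary() call.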

The load('indep.snps.RData') command above reads a set of independent SNPs for correlation estimation. If you have your own set of independent markers, simply pass it as indep.snps instead. The loaded data contains three sub-objects: $gwa, $cor.pheno and $var.pheno, where $gwa is a cleaned data frame of single-trait GWAS results, $cor.pheno is the shrinkage phenotypic correlation matrix, and $var.pheno holds the phenotypic variances; the latter two are either estimated or given by user input.

head(data$gwa)

##           height.txt.beta height.txt.se weight.txt.beta weight.txt.se
## rs4747841          0.0025        0.0061          0.0003        0.0064
## rs4749917         -0.0025        0.0061         -0.0003        0.0064
## rs737656          -0.0073        0.0064         -0.0063        0.0066
## rs737657          -0.0075        0.0064         -0.0056        0.0066
## rs7086391         -0.0130        0.0079          0.0094        0.0083
## rs878177           0.0140        0.0066          0.0008        0.0069
##           bmi.txt.beta bmi.txt.se      f       n
## rs4747841       0.0005     0.0063 0.5500 58322.7
## rs4749917      -0.0005     0.0063 0.4500 58322.6
## rs737656       -0.0025     0.0066 0.3667 58322.7
## rs737657       -0.0020     0.0066 0.3583 58316.8
## rs7086391       0.0230     0.0082 0.2000 58322.5
## rs878177       -0.0100     0.0068 0.3000 58322.4

data$cor.pheno

##             height.txt weight.txt     bmi.txt
## height.txt  1.00000000 0.31558706 -0.16242127
## weight.txt  0.31558706 1.00000000  0.85458914
## bmi.txt    -0.16242127 0.85458914  1.00000000

data$var.pheno

## [1] 1 1 1

By default, load.summary(..., est.var = FALSE) assumes all phenotypic variances are 1, which is a known value for GWAS with inverse-Gaussian transformation of the phenotypes. Setting est.var = TRUE estimates the phenotypic variances from the summary statistics, which is useful e.g. for case-control studies where the variance of liability can be estimated.
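To see why a set of independent SNPs carries information about phenotypic correlation, consider the simulation below. It is a generic illustration under simplifying assumptions (full sample overlap, null SNPs, simulated data with made-up parameters) and is not the shrinkage estimator that load.summary() uses internally: across SNPs without effects, the correlation between the test statistics of two traits approximates the phenotypic correlation.

```r
set.seed(2024)
n   <- 1000  # individuals (full sample overlap in this toy example)
m   <- 500   # independent null SNPs
rho <- 0.6   # true phenotypic correlation (made-up value)

# Genotypes coded 0/1/2 with allele frequency 0.3, no effect on either trait
G <- matrix(rbinom(n * m, 2, 0.3), nrow = n, ncol = m)

# Two correlated traits with unit variance
e1 <- rnorm(n)
e2 <- rnorm(n)
Y  <- cbind(trait1 = e1, trait2 = rho * e1 + sqrt(1 - rho^2) * e2)

# Marginal test statistic for every SNP-trait pair, computed from the
# genotype-phenotype correlation r: t = r * sqrt((n - 2) / (1 - r^2)),
# which equals beta / se from a simple linear regression
r     <- cor(G, Y)
tstat <- r * sqrt((n - 2) / (1 - r^2))

# Correlation of test statistics across the null SNPs approximates rho
cor.est <- cor(tstat[, 1], tstat[, 2])
print(round(cor.est, 2))
```

With only partially overlapping samples, the correlation of the test statistics is attenuated by the degree of overlap, which is why the overlap has to be accounted for.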

A Simple Pleiotropic Meta-Analysis

Once the data are successfully loaded, the simplest multi-trait pleiotropic analysis is straightforward:

result <- MultiSummary(data)

## Multi-trait genome scan ... Done.

The result is a list with two sub-objects $scan and $coef. For this simple analysis, only the data frame $scan is reported, where the column p gives the multi-trait analysis p-values.

head(result$scan)

##              marker   freq       n             p
## rs4747841 rs4747841 0.5500 58322.7 0.79726537769
## rs4749917 rs4749917 0.4500 58322.6 0.79726537771
## rs737656   rs737656 0.3667 58322.7 0.63421006065
## rs737657   rs737657 0.3583 58316.8 0.65262659863
## rs7086391 rs7086391 0.2000 58322.5 0.00042921138
## rs878177   rs878177 0.3000 58322.4 0.01821862433

The result is analogous to the MANOVA analysis in R, such as manova(cbind(Trait1, Trait2, Trait3) ~ SNP).
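For readers who have individual-level data at hand, the MANOVA counterpart can be sketched as below. All data are simulated and the effect sizes are made up for illustration; only base R (stats::manova) is used, and nothing here calls MultiABEL.

```r
set.seed(1)
n   <- 2000
snp <- rbinom(n, 2, 0.2)  # genotype coded 0/1/2

# Three correlated toy traits; the SNP affects Trait2 only (made-up effect)
Trait1 <- rnorm(n)
Trait2 <- 0.5 * Trait1 + 0.3 * snp + rnorm(n)
Trait3 <- 0.5 * Trait2 + rnorm(n)

# Multi-trait test of the SNP against all three traits jointly
fit <- manova(cbind(Trait1, Trait2, Trait3) ~ snp)
res <- summary(fit)  # Pillai's trace is the default test statistic

# P-value for the SNP term, comparable in spirit to the p column of $scan
p.manova <- res$stats["snp", "Pr(>F)"]
print(p.manova)
```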

Pleiotropic & Conditional Meta-Analysis in HWE Outbred Population

If we assume that the outbred populations for the three phenotypes follow Hardy-Weinberg equilibrium (HWE), we can perform a deeper pleiotropic meta-analysis that reports more estimates, including conditional genetic effects. For example:

result.out <- MultiSummary(data, type = 'outbred')

## Multi-trait genome scan ... Done.

The result is a list with two sub-objects $scan and $coef. Now, the data frame $scan is reported with more columns, corresponding to three conditional GWAS analyses, i.e. single-trait GWAS for each trait with the other traits included as covariates:

head(result.out$scan)

##              marker   freq       n             p    beta.score
## rs4747841 rs4747841 0.5500 58322.7 0.79726537769 1.8754020e-05
## rs4749917 rs4749917 0.4500 58322.6 0.79726537771 1.8754020e-05
## rs737656   rs737656 0.3667 58322.7 0.63421006065 3.3824150e-05
## rs737657   rs737657 0.3583 58316.8 0.65262659863 3.1213268e-05
## rs7086391 rs7086391 0.2000 58322.5 0.00042921138 3.9293148e-04
## rs878177   rs878177 0.3000 58322.4 0.01821862433 2.0098628e-04
##                se.score beta.cond.height.txt se.cond.height.txt
## rs4747841 7.3004707e-05        0.00279569491       0.0026842035
## rs4749917 7.3004707e-05       -0.00279569491       0.0026842058
## rs737656  7.1087581e-05       -0.00069077805       0.0027711098
## rs737657  6.9344900e-05       -0.00126907245       0.0027851258
## rs7086391 1.1158264e-04        0.00801804470       0.0033387984
## rs878177  8.5122393e-05       -0.00337208905       0.0029142862
##           p.cond.height.txt beta.cond.weight.txt se.cond.weight.txt
## rs4747841       0.297626798       -0.00133195655       0.0014126825
## rs4749917       0.297627212        0.00133195655       0.0014126838
## rs737656        0.803145371       -0.00056711104       0.0014584137
## rs737657        0.648633972       -0.00023895596       0.0014657971
## rs7086391       0.016329074       -0.00593304553       0.0017570168
## rs878177        0.247235918        0.00357046716       0.0015336614
##           p.cond.weight.txt beta.cond.bmi.txt se.cond.bmi.txt
## rs4747841     0.34575444223     1.3979808e-03    0.0014689944
## rs4749917     0.34575485575    -1.3979808e-03    0.0014689957
## rs737656      0.69738363269     3.3470484e-04    0.0015165516
## rs737657      0.87050176200     3.4480639e-05    0.0015242273
## rs7086391     0.00073341764     7.3051451e-03    0.0018269113
## rs878177      0.01990852313    -4.0863748e-03    0.0015947648
##           p.cond.bmi.txt
## rs4747841  3.4127027e-01
## rs4749917  3.4127068e-01
## rs737656   8.2532506e-01
## rs737657   9.8195202e-01
## rs7086391  6.3709785e-05
## rs878177   1.0396101e-02
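At the individual level, each conditional analysis corresponds to regressing one trait on the SNP with the other traits as covariates. The sketch below shows that counterpart on simulated data; trait names and effect sizes are made up, and it illustrates the model being mimicked rather than how MultiSummary() computes it from summary statistics.

```r
set.seed(1)
n   <- 5000
snp <- rbinom(n, 2, 0.2)  # genotype coded 0/1/2

# Toy traits; the SNP has a direct effect of 0.2 on trait3 (made-up value)
trait1 <- rnorm(n)
trait2 <- 0.3 * trait1 + rnorm(n)
trait3 <- 0.8 * trait2 - 0.3 * trait1 + 0.2 * snp + rnorm(n)

# Conditional GWAS model for trait3: the other traits enter as covariates
fit <- lm(trait3 ~ snp + trait1 + trait2)

# Estimate, standard error and p-value of the conditional SNP effect,
# the individual-level analogue of beta.cond.*, se.cond.* and p.cond.*
cond <- summary(fit)$coefficients["snp", c("Estimate", "Std. Error", "Pr(>|t|)")]
print(cond)
```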

The p column is analogous to the MANOVA analysis in R, such as `manova(cbind(T
