MRlap
R package to perform two-sample Mendelian Randomisation (MR) analyses using (potentially) overlapping samples
Install / Use
/learn @n-mounier/MRlapREADME
MRlap <img src="inst/Figures/logo.png" align="right" height=180/>
<!--- # https://github.com/GuangchuangYu/hexSticker library(hexSticker) imgurl <- "inst/Figures/MRlap2.png" sticker(imgurl, package="", p_size=8, p_color="black", h_fill="black", h_color="black", s_x=0.95, s_y=0.90, s_width=0.95, # MRlap2 # s_x=0.95, s_y=1.05, s_width=0.95, # MRlap filename="inst/Figures/logo.png", dpi=2000) ---> <!--- :arrow_right: ESHG/EMGM?? poster is available [here](). --->:information_source: MRlap is still under active development.
:information_source: MRlap has been updated to version 0.0.3.
Note that in version 0.0.1 the exclusion of IVs more strong associated
with the outcome than with the exposure was not correctly performed.
Check the NEWS to learn more about what has been modified!
Overview
MRlap is an R-package to perform two-sample Mendelian Randomisation
(MR) analyses using (potentially) overlapping samples, relying only on
GWAS summary statistics. MR estimates can be subject to different types
of biases due to the overlap between the exposure and outcome samples,
the use of weak instruments and winner’s curse. Our approach
simultaneously accounts and corrects for all these biases, using
cross-trait LD-score regression (LDSC) to approximate the overlap.
Estimating the corrected effect using our approach can be performed as a
sensitivity analysis: if the corrected effect do not significantly
differ from the observed effect, then IVW-MR estimate can be safely
used. However, when there is a significant difference, corrected effects
should be preferred as they should be less biased, independently of the
sample overlap.
Note that we are working with standardised effects. This means that the
causal effect estimates are in units of Standard Deviation (SD). The
causal effect estimates correspond to the SD change in the outcome for
one SD increase in the exposure.
It is possible to use case-control GWASs, but it is important to note
that our method assumes that sample overlap should be independend of
case-control status. Morever, the analysis needs to be performed on the
observed scale. This means that GWAS results (from a linear model using
case-control status or from a logisitic regression) should be provided
alongside the total (number of cases + number of controls) sample size.
In such cases, the heritability estimates reported in the results will
be different to the ones that are usually estimated (on the liability
scale), this is normal. The causal effect estimates correspond to the SD
changes / increases on the observed scale.
This package builds up on the
GenomicSEM R-package to
perform cross-trait LDSC and the
TwoSampleMR R-package for
inverse-variance weighted (IVW-)MR analysis, and on the
ieugwasr for instruments
pruning.
There is only one function available:
MRlap()
main function that performs LDSC, IVW-MR analysis and provides a corrected causal effect estimate.
More details about its usage can be found in the manual.
Installation
You can install the current version of MRlap with:
# Directly install the package from github
# install.packages("remotes")
remotes::install_github("n-mounier/MRlap")
library(MRlap)
<!--- Note: using remotes instead of devtools leads to re-build the package
and apparently, it may be a problem with R 3.4 and macOS,
see https://stackoverflow.com/questions/43595457/alternate-compiler-for-installing-r-packages-clang-error-unsupported-option/43943631#43943631 --->
Usage
To run the analysis with MRlap different inputs are needed:
1. The exposure and outcome GWAS summary statistics (exposure & outcome):
Can be a regular (space/tab/comma-separated) file or a gzipped file
(.gz) or a data.frame. Must contain the following columns, which can
have alternative names (not case sensitive):
SNP-identifier: rs or rsid, snp, snpid, rnpid
Alternate (effect) allele: a1 or alt, alts
Reference allele: a2, a0, ref
Z-statistics: Z, zscore
Sample size: N, Neff
If the Z-statistics is not present, it will be automatically calculated
from effect size and standard error, in which case the following columns
should be provided:
Effect-size: b, beta, beta1 , or
Standard error: se, std
To perform distance-pruning of instruments, the following columns must
also be present:
chromosome: chr
position: pos
If (at least) one of the datasets is coming from a case-control
GWAS:
… the Sample size column should correspond to the total sample size (not
the effective sample size!!).
2. The input files for LDSC (ld & hm3):
These are needed by the
GenomicSEM R-package.
- ld:
Expects LD scores formated as required by the original LD score regression software. Weights for the european population can be obtained by downloading the eur_w_ld_chr folder in the link below (Note that these are the same weights provided by the original developers of LDSC): https://utexas.box.com/s/vkd36n197m8klbaio3yzoxsee6sxo11v
- hm3:
This file can be obtained from https://utexas.box.com/s/vkd36n197m8klbaio3yzoxsee6sxo11v.
Analysis
Before running the examples, please make sure to have downloaded the
input files for LDSC. You may also need to modify the ld & hm3
parameters to indicate the correct paths.
- Example A
# Using ~100K samples for BMI/SBP, with 0% of sample overlap
# (only weak instrument bias and winner's curse)
# Note that here the overlap is known (since we generated the data) but the MRlap
# function works even the overlap is unkown (overlap is *not* a parameter of the function)
# as it uses cross-trait LDSC to approximate it
# (1,150,000 SNPs - stored in gzipped files)
BMI <- system.file("data/", "BMI_Data.tsv.gz", package="MRlap")
SBP <- system.file("data/", "SBP_Data.tsv.gz", package="MRlap")
# MR instruments will be selected using default parameter (5e-8) and distance-pruned (500Kb),
# No file will be saved.
A = MRlap(exposure = BMI,
exposure_name = "BMI_100Ksample",
outcome = SBP,
outcome_name = "SBP_100Ksample",
ld = "~/eur_w_ld_chr",
hm3 = "~/w_hm3.noMHC.snplist")
<details>
<summary>
Show log
</summary>
```
## <<< Preparation of analysis >>>
## > Checking parameters
## The p-value threshold used for selecting MR instruments is: 5e-08
## The distance used for pruning MR instruments is: 500 Kb
## > Processing exposure (BMI_100Ksample) summary statistics...
## # Preparation of the data...
## The data.frame used as input is: "BMI_Data.tsv.gz".
## SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - Z column, ok - N column, ok
## > Processing outcome (SBP_100Ksample) summary statistics...
## # Preparation of the data...
## The data.frame used as input is: "SBP_Data.tsv.gz".
## SNPID column, ok - CHR column, ok - POS column, ok - ALT column, ok - REF column, ok - Z column, ok - N column, ok
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Performing cross-trait LDSC >>>
## > Munging exposure data...
## > Munging outcome data...
## > Running cross-trait LDSC...
## Please consider saving the log files and checking them to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files
## > Cleaning temporary files...
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Running IVW-MR >>>
## > Identifying IVs...
## 668 IVs with p < 5e-08
## 0 IVs excluded - more strongly associated with the outcome than with the exposure, p < 1e-03
## Pruning : distance : 500 Kb
## 39 IVs left after pruning
## > Performing MR
## IVW-MR observed effect: 0.0856 ( 0.0398 )
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Estimating corrected effect >>>
## > Estimating genetic architecture parameters...
## > Estimating corrected effect...
## corrected effect: 0.115 ( 0.0535 )
## covariance between observed and corrected effect: 0.00209
## 7000 simulations were used to estimate the variance and the covariance.
## > Testing difference between observed and corrected effect...
## Runtime of the analysis: 2 minute(s) and 19 second(s).
```
</details>
- Example B
# Using simulated exposure/outcome data
# standard settings scenario, with 100% of sample overlap
# Note that here the overlap is known (since we generated the data) but the MRlap
# function works even the overlap is unkown (overlap is *not* a parameter of the function)
# as it uses cross-trait LDSC to approximate it
# (~750,000 SNPs - stored as data.frames)
# MR instruments will be selected using a more stringent threshold (5e-10) and LD-pruned (500Kb - r2=0.05),
# No file will be saved.
B = MRlap(exposure = SmallExposure_Data,
exposure_name = "simulated_exposure",
outcome = SmallOutcome_Data,
outcome_name = "simulated_outcome",
ld = "~/eur_w_ld_chr",
hm3 = "~/w_hm3.noMHC.snplist",
MR_threshold = 5e-10,
MR_pruning_LD = 0.05)
<details>
<summary>
Show log
</summary>
```
## <<< Preparation of analysis >>>
## > Checking parameters
## The p-value threshold used
