MR-link-2

MR-link-2 is a pleiotropy-robust cis Mendelian randomization (MR) method. Inference with MR-link-2 requires pre-harmonized summary statistics of an exposure and an outcome, and a genotype reference file. We have validated this method in three different real-world reference datasets of causality.

Please find details of our validations and more information on the method in our paper.

If you have any questions or suggestions, feel free to open an issue. We appreciate everybody trying to use our software, so we try to come back to you as soon as possible!

If you use MR-link-2 or our paper, please cite us:

van der Graaf, A., Warmerdam, R., Auwerx, C. et al.
MR-link-2: pleiotropy robust cis Mendelian randomization validated in three independent reference datasets of causality.
Nat Commun 16, 6112 (2025). https://doi.org/10.1038/s41467-025-60868-1

Requirements

MR-link-2 has been tested on MacOS X and Linux combined with Python 3.9, 3.10 and 3.11. Although not tested, every Python version from 3.6 onwards should work.
We require some (standard) Python packages to be installed: numpy, scipy, pandas, pyarrow, bitarray and duckdb. If you want to run the tests, pytest is also necessary. If these packages are not installed yet, please install them using pip. In the command line (shell, terminal), type:

pip3 install numpy scipy pandas bitarray pytest duckdb pyarrow

On top of this, we require plink1.9 to be present in your PATH variable. Check this by typing which plink in your shell. If this prints a path, plink is available on your PATH.
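The same checks can be scripted. The sketch below is not part of MR-link-2; the helper names `missing_packages` and `find_plink` are made up for illustration. It only reports which of the packages listed above cannot be imported and whether a `plink` executable is found on the PATH. It does not verify that the executable is actually version 1.9.

```python
import importlib.util
import shutil

# Package names taken from the pip command above (pytest is only needed for the tests).
REQUIRED_PACKAGES = ["numpy", "scipy", "pandas", "pyarrow", "bitarray", "duckdb"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_plink(executable="plink"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages were found.")

    plink_path = find_plink()
    if plink_path is None:
        print("plink was not found on your PATH.")
    else:
        print("plink found at:", plink_path)
```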

Testing if everything works as expected

This repository uses pytest for testing. If you want to make sure that everything works as expected, please ensure you have pytest installed and run the following command:

pytest tests/*

For this you need to have everything from the requirements installed, including pytest. If everything passes, you are ready to go! If some tests fail, please open a GitHub issue, and we'll get back to you ASAP.

Example

If you want to test MR-link-2 we have two examples:

This command tests for a causal effect in a region of synthetic data:

python3 mr_link_2_standalone.py \
            --reference_bed example_files/reference_cohort \
            --sumstats_exposure example_files/yes_causal_exposure.txt \
            --sumstats_outcome example_files/yes_causal_outcome.txt \
            --out example_of_a_causal_effect.txt

This command tests for a non-causal effect in a region of synthetic data:

python3 mr_link_2_standalone.py \
            --reference_bed example_files/reference_cohort \
            --sumstats_exposure example_files/non_causal_exposure.txt \
            --sumstats_outcome example_files/non_causal_outcome.txt \
            --out example_of_a_non_causal_effect.txt

Running these two commands (about 2 seconds each) produces two tab-separated files with results: example_of_a_causal_effect.txt and example_of_a_non_causal_effect.txt.

# causal effect
region                  var_explained   m_snps_overlap   alpha                   se(alpha)               p(alpha)                sigma_y                 se(sigma_y)             p(sigma_y)              sigma_x                 function_time
2:101532661-103480976	0.99	        1131	         0.4193872544798938	     0.0565878953774011	     1.25110890260147e-13	 0.1549295595449322	     0.00624199019634178	 5.3812527742659674e-136 0.564994403645872	     0.09744000434875488

In the above line we see that the causal effect alpha is 0.42, with a P value of 1.3x10^-13. The sigma_y estimate is large (0.155) and highly significant (P: 5.4x10^-136). Together, these indicate a causal effect as well as a pleiotropic effect.
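As a sanity check on how to read these columns, the P value reported for alpha in this example is numerically close to a two-sided normal test on alpha / se(alpha). The sketch below reproduces that arithmetic with the standard library. This is only an observation about the example numbers, not a description of how MR-link-2 computes its P values internally; the function name `two_sided_normal_p` is made up for illustration.

```python
import math


def two_sided_normal_p(estimate, standard_error):
    """Two-sided P value of estimate / standard_error under a standard normal."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large |z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values copied from the causal example output above.
alpha = 0.4193872544798938
se_alpha = 0.0565878953774011

p_alpha = two_sided_normal_p(alpha, se_alpha)
print(f"z = {alpha / se_alpha:.2f}, two-sided normal P = {p_alpha:.3g}")
```

The printed P value can be compared with the p(alpha) column of the example row.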

# non causal effect
region                  var_explained   m_snps_overlap   alpha                   se(alpha)               p(alpha)                sigma_y                 se(sigma_y)             p(sigma_y)              sigma_x                 function_time
2:101515908-103411057	0.99	        1131	         -0.03309665745471561	 0.052906181109434916	 0.5315953131580522	     0.16421519190135686	 0.006618939960069835	 7.011323342257353e-136	 0.5641943699065054	     0.0805368423461914

In the above line we see that the causal effect alpha is close to zero (-0.033), with a P value of 0.53. The sigma_y estimate is again large (0.16) and highly significant (P: 7.0x10^-136). This indicates that the locus is very pleiotropic, without evidence for a causal effect.

Alternatively, you can use the test_commands.sh script to run a few tests, and have a look at their results.

N.B. Results may differ slightly on your system, due to the stochastic nature of the method's inference and/or differences in software versions.
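Because the result files are tab separated, they can be loaded with pandas for further inspection. The sketch below is a minimal example, assuming the column names shown in the example output above. The helper names `load_results` and `summarize` and the 0.05 cutoff are illustrative choices, not part of MR-link-2. The demo rows are copied from the two example outputs.

```python
import io

import pandas as pd

COLUMNS = ["region", "var_explained", "m_snps_overlap", "alpha", "se(alpha)",
           "p(alpha)", "sigma_y", "se(sigma_y)", "p(sigma_y)", "sigma_x",
           "function_time"]

# The two result rows shown above, one from each example file.
ROWS = [
    ["2:101532661-103480976", "0.99", "1131", "0.4193872544798938",
     "0.0565878953774011", "1.25110890260147e-13", "0.1549295595449322",
     "0.00624199019634178", "5.3812527742659674e-136", "0.564994403645872",
     "0.09744000434875488"],
    ["2:101515908-103411057", "0.99", "1131", "-0.03309665745471561",
     "0.052906181109434916", "0.5315953131580522", "0.16421519190135686",
     "0.006618939960069835", "7.011323342257353e-136", "0.5641943699065054",
     "0.0805368423461914"],
]
DEMO = "\n".join("\t".join(row) for row in [COLUMNS] + ROWS) + "\n"


def load_results(path_or_buffer):
    """Read an MR-link-2 result file (tab separated, one row per estimate)."""
    return pd.read_csv(path_or_buffer, sep="\t")


def summarize(results, p_cutoff=0.05):
    """Flag rows where alpha or sigma_y passes an (illustrative) P value cutoff."""
    out = results[["region", "alpha", "p(alpha)", "sigma_y", "p(sigma_y)"]].copy()
    out["alpha_significant"] = out["p(alpha)"] < p_cutoff
    out["sigma_y_significant"] = out["p(sigma_y)"] < p_cutoff
    return out


if __name__ == "__main__":
    # Replace io.StringIO(DEMO) with e.g. "example_of_a_causal_effect.txt".
    print(summarize(load_results(io.StringIO(DEMO))))
```

When many regions are tested, a fixed 0.05 cutoff is too lenient; choose a multiple-testing correction appropriate for your analysis.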

Usage

MR-link-2 accepts full summary statistics files from which it will do the following:

  1. Identify all the associated regions from the exposure files.
  2. For each associated region, generate an LD correlation matrix, and make an MR-link-2 estimate.

So it is not necessary to pre-select regions. MR-link-2 also performs some rudimentary allele harmonization, but please do your own checks beforehand as well.
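One of the checks worth doing beforehand is making sure that exposure and outcome effects refer to the same allele. The sketch below shows one common way to do this with pandas. It is not MR-link-2's own harmonization code, and the column names (`snp`, `effect_allele`, `other_allele`, `beta`) are hypothetical placeholders that you should map to the columns in your own files. Strand flips and palindromic SNPs are deliberately not handled here.

```python
import pandas as pd


def harmonize(exposure, outcome):
    """Align outcome effects to the exposure's effect allele.

    Both inputs need the (hypothetical) columns: snp, effect_allele,
    other_allele, beta. SNPs whose alleles cannot be matched are dropped.
    """
    merged = exposure.merge(outcome, on="snp", suffixes=("_exp", "_out"))

    same = (
        (merged["effect_allele_exp"] == merged["effect_allele_out"])
        & (merged["other_allele_exp"] == merged["other_allele_out"])
    )
    flipped = (
        (merged["effect_allele_exp"] == merged["other_allele_out"])
        & (merged["other_allele_exp"] == merged["effect_allele_out"])
    )

    merged["flipped"] = flipped
    merged = merged[same | flipped].copy()

    # Swapped alleles mean the outcome effect has the opposite sign.
    merged.loc[merged["flipped"], "beta_out"] *= -1
    merged["effect_allele_out"] = merged["effect_allele_exp"]
    merged["other_allele_out"] = merged["other_allele_exp"]
    return merged
```

For example, a SNP reported as A/G in the exposure and G/A in the outcome keeps its row but gets its outcome beta multiplied by -1, while a SNP reported as A/G versus A/C is dropped.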

The mr_link_2_standalone.py script uses plink-like command-line syntax. To see all the options, type python3 mr_link_2_standalone.py --help, which will output the following:

usage: mr_link_2_standalone.py [-h] --reference_bed REFERENCE_BED --sumstats_exposure SUMSTATS_EXPOSURE --sumstats_outcome SUMSTATS_OUTCOME --out OUT [--tmp TMP] [--p_threshold P_THRESHOLD] [--region_padding REGION_PADDING] [--maf_threshold MAF_THRESHOLD] [--max_correlation MAX_CORRELATION]
                               [--max_missingness MAX_MISSINGNESS] [--var_explained_grid VAR_EXPLAINED_GRID [VAR_EXPLAINED_GRID ...]] [--continue_analysis] [--no_normalize_sumstats] [--verbose VERBOSE]

MR-link-2: Pleiotropy robust cis Mendelian randomization

options:
  -h, --help            show this help message and exit
  --reference_bed REFERENCE_BED
                        The plink bed file prepend of the genotype file that can be used as an LD reference. Usage is the same as in the plink --bed command
  --sumstats_exposure SUMSTATS_EXPOSURE
                        The summary statistics file of the exposure file. Please see the README file or the example_files folder for examples on how to make these files.
  
  --sumstats_outcome SUMSTATS_OUTCOME [SUMSTATS_OUTCOME]
                        The summary statistics file of the outcome file. Please see the README file or the example_files folder for examples on how to make these files.
                        We allow multiple outcomes to be analyzed at the same time, include the files separated by spaces
  --out OUT             The path where to output results
  --tmp TMP             Not necessary anymore: a prepend on where to save temporary files DEPRECATED
  --p_threshold P_THRESHOLD
                        The P value threshold for which select regions. This is the same as the clumping P value threshold
  --region_padding REGION_PADDING
                        The base pair padding (on one side) on each clumped SNP which defines the region in which MR-link-2 will perform its inference
  --maf_threshold MAF_THRESHOLD
                        The minor allele frequency threshold used for clumping, and for calculating the LD matrix.This will not be applied to the summary statistics files
  --max_correlation MAX_CORRELATION
                        The maximum correlation allowed in the LD matrix, if the correlation is higher than this value between a pair of SNPs, will only keep one of them. This value is used to reduce eigenvalue decomposition failures
  --max_missingness MAX_MISSINGNESS
                        This is the maximum amount of individual missingness that is allowed in a summary statistic MR-link-2 can be sensitive to large differences in summary statistic missingness, so by default each SNP association should have at least 0.95 of observations available.
  --var_explained_grid VAR_EXPLAINED_GRID [VAR_EXPLAINED_GRID ...]
                        This field specifies the amount of variance explained of the LD matrix that is used by MR-link-2. You can add onto this field, and all variances explained will be added: --var_explained_grid 0.99 0.999 0.2 0.96 0.1 will perform an MR-link-2 estimate for all these values.
  --continue_analysis   Flag to continue an already started analysis, if specified this will look for a temporary file, and if it present, reuse its results. This can be handy if you have hundreds of associated regions, which can sometimes take a long time to run.
  --regions_to_read_at_the_same_time NUMBER OF REGIONS
                        Number of regions to read at the same time from a file. Will reduce I/O times, but can increase memory usage
                        Only change this when you run into memory errors, or if you want to make a lot of comparisons.
  --prespecified_regions PRESPECIFIED_REGIONS
                        Specify which regions to do. format is the following: 
                        `{chr_1}:{start_1}-{end_1},{chr_2}:{start_2}-{end_2}`. 
                        i.e. these are regions that are separated with the comma character.
                        This step will skip the clumping step to identify regions, and will blindly perform MR on 
                        regions that may or may not be associated to the exposure. 
                        This feature has been included to allow investigators to identify associated regions
                        in one discovery cohort, and assess the MR effect in a different cohort. 
                        This is a scenario for instance to remove winners curse.

  --max_snps_in_c
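When scripting many runs, it can help to assemble the command line programmatically. The sketch below builds an argument list using only options that appear in the help text above (`--reference_bed`, `--sumstats_exposure`, `--sumstats_outcome`, `--out`, `--prespecified_regions`, `--var_explained_grid`). The helper names `format_regions` and `build_command` are made up for illustration, and the region coordinates in the demo are the ones from the example output.

```python
import shlex


def format_regions(regions):
    """Turn (chromosome, start, end) tuples into 'chr:start-end,chr:start-end'."""
    return ",".join(f"{chrom}:{start}-{end}" for chrom, start, end in regions)


def build_command(reference_bed, exposure, outcomes, out,
                  regions=None, var_explained_grid=None):
    """Build the argument list for mr_link_2_standalone.py (not executed here)."""
    cmd = [
        "python3", "mr_link_2_standalone.py",
        "--reference_bed", reference_bed,
        "--sumstats_exposure", exposure,
        # Several outcome files may be given, separated by spaces.
        "--sumstats_outcome", *outcomes,
        "--out", out,
    ]
    if regions:
        cmd += ["--prespecified_regions", format_regions(regions)]
    if var_explained_grid:
        cmd += ["--var_explained_grid", *[str(v) for v in var_explained_grid]]
    return cmd


if __name__ == "__main__":
    command = build_command(
        "example_files/reference_cohort",
        "example_files/yes_causal_exposure.txt",
        ["example_files/yes_causal_outcome.txt"],
        "example_of_a_causal_effect.txt",
        regions=[("2", 101532661, 103480976)],
        var_explained_grid=[0.99, 0.999],
    )
    # Print a copy-pasteable shell command.
    print(" ".join(shlex.quote(part) for part in command))
```

To run the command from Python, pass the returned list to `subprocess.run(command, check=True)`.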