MRAgent
MRAgent is an LLM-based tool that automates Mendelian randomization analysis, reducing manual effort by autonomously scanning literature, selecting GWAS data, and generating comprehensive reports, aiding researchers in exploring disease causality efficiently.
MRAgent: An LLM-based Automated Agent for Causal Knowledge Discovery in Disease via Mendelian Randomization
Paper | GitHub | PyPI | Zenodo | Web Demo | Chinese Documentation
Overview
Understanding causality in medical research is essential for developing effective interventions and diagnostic tools. Mendelian Randomization (MR) is a pivotal method for inferring causality through genetic data. However, MR analysis often requires pre-identification of exposure-outcome pairs from clinical experience or literature, which can be challenging to obtain. This poses difficulties for clinicians investigating causal factors of specific diseases. To address this, we introduce MRAgent, an innovative automated agent leveraging Large Language Models (LLMs) to enhance causal knowledge discovery in disease research. MRAgent autonomously scans scientific literature, discovers potential exposure-outcome pairs, and performs MR causal inference using extensive Genome-Wide Association Study (GWAS) data.
MRAgent Architecture

MRAgent Workflow
<div align="center"> <img src="./images/f2.png" width="60%"> </div>

Installation
- To install this package, please ensure you have Python version >3.9, which can be downloaded from the standard distribution or the Anaconda distribution.
- This package requires an R environment (version >4.3.4), which can be downloaded from the R Project. You will also need to install the R packages TwoSampleMR, ieugwasr, vcfR, MRlap, and jsonlite. While these packages can be installed automatically by the program, it is recommended to install them manually in advance, as the automatic installation may not always succeed.
- Install this package with pip:
pip install mragent
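Before the first run it can help to confirm that both prerequisites above are visible from Python. The following is a minimal sketch, not part of MRAgent: the function name `check_environment` is hypothetical, it reads ">3.9" as "3.9 or newer" (an assumption), and it only checks that `Rscript` is on the PATH without verifying the R version or the R packages.

```python
import shutil
import sys

# Minimum (major, minor) Python version; assumes ">3.9" means 3.9 or newer.
MIN_PYTHON = (3, 9)


def check_environment(python_version=sys.version_info[:2], which=shutil.which):
    """Return a list of problems found; an empty list means nothing was flagged.

    `which` is injectable so the check can be exercised without a real R install.
    """
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found; "
            "3.9 or newer is needed"
        )
    if which("Rscript") is None:
        problems.append("Rscript not found on PATH; install R first")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result only means Python and `Rscript` were found; installing the R packages manually is still the safer route.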
Preparation
LLMs API Key
Before using MRAgent, you need to obtain the API Key for the LLMs, or run the LLM locally. The following LLM APIs are currently supported:
MRAgent is also compatible with all models available on any platform that supports the OpenAI API. Simply ensure that the base_url is correctly specified.
You can also run an LLM locally. All open-source models running on ollama are currently supported.
To do so, install ollama and the ollama Python package:
curl -fsSL https://ollama.com/install.sh | sh
pip install ollama
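The three ways of reaching an LLM described above (official API, OpenAI-compatible platform, local ollama model) map onto the `AI_key`, `LLM_model`, `model_type`, and `base_url` arguments documented under Parameters below. The helper below is a sketch under assumptions: `llm_kwargs` and its three backend labels are hypothetical names made up for illustration, and the model name and URL in the comments are placeholders.

```python
def llm_kwargs(backend, model, api_key=None, base_url=None):
    """Build the LLM-related keyword arguments for the MRAgent constructor.

    backend is one of 'openai', 'compatible' (third-party platform exposing
    the OpenAI API) or 'ollama' (local model). These labels belong to this
    sketch only; MRAgent itself receives just the returned keys.
    """
    if backend not in ("openai", "compatible", "ollama"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "ollama":
        # A local model needs no API key.
        return {"LLM_model": model, "model_type": "ollama"}
    if not api_key:
        raise ValueError("hosted models need an API key")
    kwargs = {"LLM_model": model, "model_type": "openai", "AI_key": api_key}
    if backend == "compatible":
        if not base_url:
            raise ValueError("an OpenAI-compatible platform needs a base_url")
        kwargs["base_url"] = base_url
    return kwargs


# Usage (requires `pip install mragent`; values are placeholders):
# from mragent import MRAgent
# agent = MRAgent(outcome='back pain', gwas_token='xxxx',
#                 **llm_kwargs('ollama', 'llama3'))
```

Keeping the backend choice in one place makes it easier to switch between a hosted and a local model without touching the rest of the call.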
GWAS token
You need to get the GWAS token for the OpenGWAS data.
Usage
"Knowledge Discovery" mode
In the Knowledge Discovery mode, upon inputting a specific disease, the MRAgent autonomously scans and analyzes relevant literature from PubMed to identify potential exposures or outcomes associated with the disease. It then performs Mendelian randomization analysis to ascertain causal relationships between the disease and the identified exposures or outcomes, ultimately generating a comprehensive analysis report.
Run agent_workflow.py to start the MRAgent in the "Knowledge Discovery" mode:
Class
class MRAgent(self, mode='O', exposure=None, outcome=None, AI_key=None, model='MR', num=100, bidirectional=False,
synonyms=True, introduction=True, LLM_model='gpt-4o', gwas_token=None, opengwas_mode='csv',
mr_quality_evaluation=False, mr_quality_evaluation_key_item=None, mrlap=False)
Parameters:
- mode: str, 'O' or 'E', optional (default='O')
- The mode of the MRAgent. 'O' selects "Knowledge Discovery" mode with the given disease as the outcome. 'E' selects "Knowledge Discovery" mode with the given disease as the exposure.
- exposure: str, optional (default=None)
- The exposure of the MRAgent. If mode is 'E', the exposure is the given disease.
- outcome: str, optional (default=None)
- The outcome of the MRAgent. If mode is 'O', the outcome is the given disease.
- AI_key: str, optional (default=None)
- The API key for the LLMs. Optional if running local LLM.
- model: str, 'MR' or 'MR_MOE', optional (default='MR')
- The MR method from the TwoSampleMR tool. 'MR' for the classical Mendelian randomization model. 'MR_MOE' for the Mendelian randomization model integrating a mixture-of-experts machine learning framework. If set to 'MR_MOE', you need to download the trained random forest model from this link to your working directory.
- num: int, optional (default=100)
- The number of articles to be retrieved from PubMed.
- bidirectional: bool, optional (default=False)
- Whether to perform bidirectional MR analysis.
- synonyms: bool, optional (default=True)
- Whether to obtain synonyms for exposure and outcome.
- introduction: bool, optional (default=True)
- Whether to print an introduction to the disease before the MR analysis.
- LLM_model: str, optional (default='gpt-4o')
- The LLM model used in the MRAgent.
- model_type: str, 'openai' or 'ollama', optional (default='openai')
- The model type of the LLM. 'openai' for using the OpenAI API, 'ollama' for using a local model.
- base_url: str, optional (default=None)
- The base URL for the OpenAI GPT model. Used for compatibility with third-party API platforms.
- gwas_token: str, optional (default=None)
- The GWAS token for the OpenGWAS data.
- opengwas_mode: str, 'csv' or 'online', optional (default='online')
- The access mode for the OpenGWAS data. 'csv' for the local OpenGWAS data. 'online' for the online OpenGWAS data.
- mr_quality_evaluation: bool, optional (default=False)
- Whether to evaluate the quality of the MR analysis using the STROBE-MR guidelines. For more information on STROBE-MR, see strobe-mr.org.
- mr_quality_evaluation_key_item: list, optional (default=None)
- The key items used to evaluate the quality of the MR analysis. If mr_quality_evaluation is True, you need to provide a list of key items for the STROBE-MR checklist. If any key item in the completed MR's STROBE-MR checklist is marked as No, the MR study is considered to be of low quality.
- mrlap: bool, optional (default=False)
- Whether to use the MRlap R package to detect and correct for sample overlap. For more information, please refer to the MRlap GitHub repository. If set to True, you must download the supporting files ld and hm3 from this link to your working directory.
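The low-quality rule described for mr_quality_evaluation_key_item can be written in a few lines. This is a sketch of the stated rule only, not MRAgent's internal code: the function name `is_low_quality` is hypothetical, and the checklist is assumed to be a dict mapping a STROBE-MR item label to "Yes" or "No". Items missing from the checklist are not counted as "No" here, which is also an assumption.

```python
def is_low_quality(checklist, key_items):
    """Return True if any key item in the checklist is marked as No.

    checklist: dict mapping item label -> 'Yes' / 'No' (assumed format).
    key_items: list of item labels that must not be answered 'No'.
    """
    return any(
        str(checklist.get(item, "")).strip().lower() == "no"
        for item in key_items
    )


# Placeholder labels; real STROBE-MR item names would go here.
example = {"item_a": "Yes", "item_b": "No", "item_c": "Yes"}
flagged = is_low_quality(example, ["item_a", "item_b"])
```

In this example `flagged` is True because `item_b`, one of the key items, is marked as No.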
Methods:
run(self, step=None)
- step: list, optional (default=None)
- The steps to run. If step is None, the MRAgent runs all steps. If step is a list, the MRAgent runs only the steps in the list.
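Because run accepts a list of steps, a workflow can be split into several calls with a pause in between, for instance to inspect intermediate files. The helper below is a sketch under assumptions: `run_in_phases` is a hypothetical name, it works with any object exposing `run(step=...)`, and the split shown in the usage comment is arbitrary since the meaning of each step number is not described here.

```python
def run_in_phases(agent, phases, between=None):
    """Call agent.run(step=...) once per phase, in order.

    phases:  iterable of step lists, e.g. [[1, 2, 3], [4, 5, 6, 7, 8, 9]].
    between: optional callback(index, steps) invoked after every phase
             except the last, e.g. to wait for manual edits.
    """
    phases = [list(steps) for steps in phases]
    for index, steps in enumerate(phases):
        agent.run(step=steps)
        if between is not None and index < len(phases) - 1:
            between(index, steps)


# Usage (requires `pip install mragent`; keys are placeholders):
# from mragent import MRAgent
# agent = MRAgent(outcome='back pain', AI_key='xxxx', gwas_token='xxxx')
# run_in_phases(agent, [[1, 2, 3], [4, 5, 6, 7, 8, 9]],
#               between=lambda i, s: input("Edit the tables, then press Enter"))
```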
Example:
The following uses MRAgent to run a Mendelian randomization analysis investigating exposures and outcomes associated with back pain.
from mragent import MRAgent

agent = MRAgent(outcome='back pain', model='MR', LLM_model='gpt-4o',
                AI_key='xxxx', gwas_token='xxxx', bidirectional=True,
                introduction=True, num=300)
agent.run(step=[1, 2, 3, 4, 5, 6, 7, 8, 9])
Output:
You can find the output in the output folder. Every time it runs, a folder named Disease_Model is generated under the output folder.
Inside this folder, you will find three data tables: exposure_and_outcome, which records paired exposure and outcome information, including study titles, previous MR analyses, and available GWAS data; outcome, which captures information on individual outcomes or exposures, including a complete list of GWAS IDs; and run, which records the final selected exposure and outcome pairs used for MR analysis. MRAgent can be run step-by-step, and you can manually adjust these three tables at any point to control the workflow.
In the subfolders of Disease_Model, you can view various MR results along with their corresponding reports. These subfolders contain detailed outputs for each analysis, including statistical results, plots, and other relevant documentation for interpretation.
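To get a quick overview of what a run produced, the output folder can be walked with the standard library. This is a minimal sketch, assuming the layout described above: one folder per run under `output`, with the data tables stored as files directly inside it and the MR results in subfolders. The function name `summarize_output` is hypothetical and no file extension is assumed.

```python
from pathlib import Path


def summarize_output(root="output"):
    """Map each run folder name to its top-level files and subfolders."""
    root = Path(root)
    summary = {}
    if not root.is_dir():
        return summary
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries = list(run_dir.iterdir())
        summary[run_dir.name] = {
            # Files directly in the run folder (assumed to be the data tables).
            "tables": sorted(p.name for p in entries if p.is_file()),
            # Subfolders (assumed to hold per-analysis MR results and reports).
            "results": sorted(p.name for p in entries if p.is_dir()),
        }
    return summary


if __name__ == "__main__":
    for run_name, content in summarize_output().items():
        print(run_name, content)
```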
"Causal Validation" mode
In the Causal Validation mode, users can directly input a pair of exposure and outcome, and the MRAgent independently carries out all steps of the Mendelian randomization study, providing a convenient and efficient report.
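The modes differ in which inputs they need: Knowledge Discovery takes a single disease as outcome ('O') or exposure ('E'), while Causal Validation ('OE') takes both. A small pre-flight check can catch a missing argument before any API call is made. This is a sketch, not MRAgent's own validation, and `check_inputs` is a hypothetical name.

```python
# Which arguments each documented mode requires.
REQUIRED_INPUTS = {
    "O": ("outcome",),
    "E": ("exposure",),
    "OE": ("exposure", "outcome"),
}


def check_inputs(mode, exposure=None, outcome=None):
    """Raise ValueError if the mode is unknown or a required input is missing."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    given = {"exposure": exposure, "outcome": outcome}
    missing = [name for name in REQUIRED_INPUTS[mode] if not given[name]]
    if missing:
        raise ValueError(f"mode {mode!r} needs: {', '.join(missing)}")


# Passes silently: both inputs are present for Causal Validation.
check_inputs("OE", exposure="osteoarthritis", outcome="back pain")
```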
Run agent_workflow_OE.py to start the MRAgent in the "Causal Validation" mode:
Class
class MRAgentOE(self, mode='OE', exposure=None, outcome=None, AI_key=None, model='MR', bidirectional=False,
synonyms=True, introduction=True, LLM_model='gpt-4o', gwas_token=None, opengwas_mode='csv',
mr_quality_evaluation=False, mr_quality_evaluation_key_item=None, mrlap=False)
Parameters:
- mode: str, 'OE', optional (default='OE')
- The mode of the MRAgent. 'OE' for "Causal Validation" mode.
- exposure: str, optional (default=None)
- The exposure of the MRAgent.
- outcome: str, optional (default=None)
- The outcome of the MRAgent.
- AI_key: str, optional (default=None)
- The API key for the LLMs. Optional if running local LLM.
- model: str, 'MR' or 'MR_MOE', optional (default='MR')
- MR methods in TwoSampleMR tool . 'MR' fo
