OPERA
This software tool implements the OPERA (omics pleiotropic association) method to test for combinatorial pleiotropic associations of molecular phenotypes (e.g., expression level of a gene and DNA methylation level at CpG sites) with a complex trait of interest using summary-level data from GWAS and molecular QTL studies. OPERA is a Bayesian generalization of the SMR & HEIDI approach8 to a multi-omics model, where the molecular phenotypes are considered as exposures and only the complex trait is considered as the outcome. This tool can therefore be used to prioritize molecular phenotypes that mediate the genetic effects for complex trait and provide mechanistic interpretation of the GWAS signal.
Install / Use
/learn @wuyangf7/OPERAREADME
OPERA This software tool implements the OPERA (omics pleiotropic association) method, which allows for testing the combinatorial pleiotropic associations between multiple molecular phenotypes (e.g., expression level of a gene and DNA methylation level at CpG sites) with a complex trait of interest using summary-level data from GWAS and molecular QTL studies. OPERA is a Bayesian extension of the SMR & HEIDI approach to a multi-omics model, where the molecular phenotypes are considered as exposures and the complex trait is considered as the outcome. This tool can therefore be used to prioritize molecular phenotypes that mediate the genetic effects for complex trait and further provide mechanistic interpretation of the GWAS signal.
Credits
- Yang Wu developed the software tool with the support from Ting Qi, Jian Zeng, and Jian Yang.
- Yang Wu, Jian Zeng and Jian Yang developed the OPERA method.
- The software is programmed and maintained by Yang Wu.
Questions and Help Requests
If you have any bug reports or questions, please send an email to Yang Wu (yang.wu@scu.edu.cn) at West China Hospital, Sichuan University and Jian Zeng (j.zeng@uq.edu.au) at Institute for Molecular Bioscience, The University of Queensland, and Jian Yang (jian.yang@westlake.edu.cn) at School of Life Sciences, Westlake University.
Citations
Wu Y., Qi T., Wray N.R., Visscher P.M., Zeng J. & Yang J. (2023) Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes. Cell Genomics.
Installation
To install OPERA, you can download the opera_Linux.zip package, which contains a standalone (i.e., statically linked) 64-bit Linux executable file opera_Linux. We strongly recommend using this static executable because it is well-optimized and no further installation is required.
For compiling yourself, you should clone this repository and use the MAKEFILE,
git clone https://github.com/wuyangf7/OPERA
cd OPERA
make
There are dependencies on your local MKL, BOOST and EIGEN Libraries.
Tutorial
The OPERA analysis consists of three steps. OPERA first estimates the global frequencies of each possible association pattern. It then caculates the posterior probability for each configuration by weighting the data likelihood with the estimated global frequencies. The posterior probability of associations (PPA) for any combinatorial sites is computed by summing up the posterior probability of configurations where the given site combination is present. For associations that passed a PPA signficance threshold (e.g., 0.9 by default), OPERA performs the heterogeneity test to reject the associations that are due to linkage.
We have curated and prepared a variety of publicly available molecular QTL data (download here), which users can use to perform the OPERA analysis with their specific complex trait of interest. For illustration purpose, we provide the demonstration data that can be used to run opera analysis with command line below.
To visualize the pleiotropic associations between multiple molecular phenotypes and a complex trait of interest at a GWAS locus of interest, we have provided command line in OPERA and R scripts below.
Run OPERA for stage 1 analysis
opera --besd-flist mylist --gwas-summary mygwas.ma --mbfile mybdatalist --estimate-pi --out myopera
- --besd-flist reads a file to get the full paths of the multiple xQTL BESD files. The input format follows that for the SMR analysis (https://cnsgenomics.com/software/smr/#DataManagement).
- --gwas-summary reads summary-level data from GWAS. The input format follows that for GCTA-COJO analysis (http://cnsgenomics.com/software/gcta/#COJO).
- --mbfile reads a text file with each row representing a PLINK binary file (e.g., one for each chromosome) from a reference sample for LD estimation, i.e. .bed, .bim, and .fam files. The chromosome-wide .fam files are required to contain the same individuals. The input file format is blow. Note that stage 1 analysis requires the genome-wide LD reference to estimate the global parameters. If the genome-wide genotype data in PLINK format (containing the genotype data for each chromosome) is ready, please switch the flag --bfile to read the standard PLINK binary file as LD reference.
mydata
mydata2
- --out saves the estimation of prior probabilities from the OPERA stage 1 analysis in .pi file and the estimation of prior variances in .var file (text format, see example from demo data below).
Posteriors Pi1(0:0) Pi2(0:1) Pi3(1:0) Pi4(1:1)
Mean 0.605441 0.168855 0.0743364 0.151368
SD 0.20765 0.186273 0.134037 0.16549
Variance 2.000000e-02 2.000000e-02
The .pi output includes the posterior mean and SD from MCMC for each possible configuration. The .var output includes the prior variance for each xQTL data. The posterior samples from MCMC are also printed in the log file, for example,
Iteration Pi1(0:0) Pi2(0:1) Pi3(1:0) Pi4(1:1)
0 0.0557406 0.319719 0.0834392 0.541101
100 0.163492 0.422161 0.413673 0.000673509
200 0.661116 0.338883 2.4066e-07 6.7008e-15
300 0.587679 0.412321 5.49276e-11 3.1432e-09
400 0.814359 0.101207 0.0586814 0.0257529
500 0.430485 8.37885e-05 0.0929275 0.476504
600 0.282069 0.657061 0.017055 0.0438149
...
Columns are iteration numbers and posterior samples for each configuration from the MCMC.
Other parameters for stage 1 analysis
opera --besd-flist mylist --gwas-summary mygwas.ma --bfile mydata --sample-overlap --estimate-pi --extract-snp mysnplist --prior-var 0.02,0.02 --pi-wind 100 --out myopera --thread-num 3
- --bfile reads individual-level SNP genotype data (in PLINK binary format) from a reference sample for LD estimation, i.e. .bed, .bim, and .fam files.
- --extract-snp specifies a snplist (e.g., Hapmap3 SNP list) to be extracted across LD reference, xQTL data and GWAS summary data, and used for stage 1 analysis.
- –-prior-var specifies the prior variance of the non-zero mediated effects for each molecular trait on the complex trait. It can be computed by the variance of the estimated SMR effects of each molecular trait on complex trait at a significance level (e.g., FDR < 0.05) adjusting for the estimation errors.
- --sample-overlap specifies the flag to let OPERA consider the between-study correlations due to overlapping samples. OPERA will automatically output the estimated correlations in .rho file (text format, see below example).
1.000000e+00 0.200000e+00
0.200000e+00 1.000000e+00
- --opera-smr turns on the flag of using the estimated smr effect rather than the estimated joint smr effect to run the stage 1 analysis.
- --pi-wind defines a window centered on the molecular phenotype with smallest number of sites to select no overlap independent loci, e.g., 200 Kb (default).
- --thread-num specifies the number of OpenMP threads for parallel computing.
Run OPERA for stage 2 analysis and heterogeneity analysis
opera --besd-flist mylist --extract-gwas-loci myloci --chr 7 --gwas-summary mygwas.ma --bfile mydata --prior-pi-file myopera.pi --prior-var-file myopera.var --out myopera_chr7
Note: Only the cis-SNPs of each exposure site are used, so the stage 2 analysis can be performed for each chromosome seperately. The genome-wide analysis results can be combined through shell command below, e.g., for association between single exposure and outcome,
awk 'NR==1 || FNR>1' myopera_chr*_1_expos_assoc.ppa > myopera_1_expos_assoc.ppa
- --extract-gwas-loci extracts a subset of genomic region with GWAS COJO or LD clumpped loci (i.e., 2Mb region by default) to perform opera analysis. The input file format is
Chr SNP bp
7 rs1859788 99971834
7 rs7810606 143108158
- --chr specifies the target chromosome for SNPs and exposure sites to run chromosome-wide opera stage 2 analysis.
- --bfile reads individual-level SNP genotype data (in PLINK binary format) from a reference sample for LD estimation.
- --prior-pi-file reads the prior probabilities estimated from the stage 1 analysis (i.e., the posterior Mean from stage 1 analysis).
- --prior-var-file reads the prior variances estimated from the stage 1 analysis.
- --out saves the marginal or joint PPA and multi-exposure HEIDI test P-values for each association hypothesis passed significance threshold (e.g., PPA > 0.9 and P-HEIDI > 0.01 default) in .ppa file (text format, see below example). Results are seperated as different number of exposure combinations associated with the outcome, e.g., single exposure association,
Chr GWAS_SNP GWAS_bp Expo1_ID Expo1_bp PPA(1) p_HEIDI(1)
7 rs1859788 99971834 cg08582801 99588335 0.929121 8.152293e-02
7 rs1859788 99971834 cg13210467 99775443 0.999054 1.625980e-01
7 rs1859788 99971834 cg19116668 99932089 1 1.125248e-01
7 rs1859788 99971834 cg26429636 99573747 0.915179 5.926392e-02
...
and associations between 2 exposure(s) and 1 outcome.
Chr GWAS_SNP GWAS_bp Expo1_ID Expo1_bp Expo2_ID Expo2_bp PPA(1,2) p_HEIDI(1,2)
7 rs1859788 99971834 ENSG00000085514 99981436 cg13210467 99775443 0.998746 1.787164e-02
7 rs1859788 99971834 ENSG00000146828 100444536 cg01869186 100423987 0.906761 1.641383e-01
7 rs1859788 99971834 ENSG00000146828 100444536 cg12616177 100434510 0.902426 1.044594e-01
...
Columns are chromosome, rs ID for GWAS SNP, physical position for GWAS SNP, probe ID for the 1st exposure, probe position for the 1st exposure, probe ID for the 2nd expo
