OPERA This software tool implements the OPERA (omics pleiotropic association) method, which allows for testing the combinatorial pleiotropic associations between multiple molecular phenotypes (e.g., expression level of a gene and DNA methylation level at CpG sites) with a complex trait of interest using summary-level data from GWAS and molecular QTL studies. OPERA is a Bayesian extension of the SMR & HEIDI approach to a multi-omics model, where the molecular phenotypes are considered as exposures and the complex trait is considered as the outcome. This tool can therefore be used to prioritize molecular phenotypes that mediate the genetic effects for complex trait and further provide mechanistic interpretation of the GWAS signal.

Credits

Yang Wu developed the software tool with the support from Ting Qi, Jian Zeng, and Jian Yang.
Yang Wu, Jian Zeng and Jian Yang developed the OPERA method.
The software is programmed and maintained by Yang Wu.

Questions and Help Requests

If you have any bug reports or questions, please send an email to Yang Wu (yang.wu@scu.edu.cn) at West China Hospital, Sichuan University and Jian Zeng (j.zeng@uq.edu.au) at Institute for Molecular Bioscience, The University of Queensland, and Jian Yang (jian.yang@westlake.edu.cn) at School of Life Sciences, Westlake University.

Citations

Wu Y., Qi T., Wray N.R., Visscher P.M., Zeng J. & Yang J. (2023) Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes. Cell Genomics.

Installation

To install OPERA, you can download the opera_Linux.zip package, which contains a standalone (i.e., statically linked) 64-bit Linux executable file opera_Linux. We strongly recommend using this static executable because it is well-optimized and no further installation is required.

For compiling yourself, you should clone this repository and use the MAKEFILE,

git clone https://github.com/wuyangf7/OPERA
cd OPERA
make

There are dependencies on your local MKL, BOOST and EIGEN Libraries.

Tutorial

The OPERA analysis consists of three steps. OPERA first estimates the global frequencies of each possible association pattern. It then caculates the posterior probability for each configuration by weighting the data likelihood with the estimated global frequencies. The posterior probability of associations (PPA) for any combinatorial sites is computed by summing up the posterior probability of configurations where the given site combination is present. For associations that passed a PPA signficance threshold (e.g., 0.9 by default), OPERA performs the heterogeneity test to reject the associations that are due to linkage.

We have curated and prepared a variety of publicly available molecular QTL data (download here), which users can use to perform the OPERA analysis with their specific complex trait of interest. For illustration purpose, we provide the demonstration data that can be used to run opera analysis with command line below.

To visualize the pleiotropic associations between multiple molecular phenotypes and a complex trait of interest at a GWAS locus of interest, we have provided command line in OPERA and R scripts below.

Run OPERA for stage 1 analysis

opera --besd-flist mylist --gwas-summary mygwas.ma --mbfile mybdatalist --estimate-pi --out myopera

--besd-flist reads a file to get the full paths of the multiple xQTL BESD files. The input format follows that for the SMR analysis (https://cnsgenomics.com/software/smr/#DataManagement).
--gwas-summary reads summary-level data from GWAS. The input format follows that for GCTA-COJO analysis (http://cnsgenomics.com/software/gcta/#COJO).
--mbfile reads a text file with each row representing a PLINK binary file (e.g., one for each chromosome) from a reference sample for LD estimation, i.e. .bed, .bim, and .fam files. The chromosome-wide .fam files are required to contain the same individuals. The input file format is blow. Note that stage 1 analysis requires the genome-wide LD reference to estimate the global parameters. If the genome-wide genotype data in PLINK format (containing the genotype data for each chromosome) is ready, please switch the flag --bfile to read the standard PLINK binary file as LD reference.

mydata
mydata2

--out saves the estimation of prior probabilities from the OPERA stage 1 analysis in .pi file and the estimation of prior variances in .var file (text format, see example from demo data below).

Posteriors      Pi1(0:0)        Pi2(0:1)        Pi3(1:0)        Pi4(1:1)
Mean    0.605441        0.168855        0.0743364       0.151368
SD      0.20765 0.186273        0.134037        0.16549

Variance        2.000000e-02    2.000000e-02

The .pi output includes the posterior mean and SD from MCMC for each possible configuration. The .var output includes the prior variance for each xQTL data. The posterior samples from MCMC are also printed in the log file, for example,

Iteration       Pi1(0:0)        Pi2(0:1)        Pi3(1:0)        Pi4(1:1)
0       0.0557406       0.319719        0.0834392       0.541101
100     0.163492        0.422161        0.413673        0.000673509
200     0.661116        0.338883        2.4066e-07      6.7008e-15
300     0.587679        0.412321        5.49276e-11     3.1432e-09
400     0.814359        0.101207        0.0586814       0.0257529
500     0.430485        8.37885e-05     0.0929275       0.476504
600     0.282069        0.657061        0.017055        0.0438149
...

Columns are iteration numbers and posterior samples for each configuration from the MCMC.

Other parameters for stage 1 analysis

opera --besd-flist mylist --gwas-summary mygwas.ma --bfile mydata --sample-overlap --estimate-pi --extract-snp mysnplist --prior-var 0.02,0.02 --pi-wind 100 --out myopera --thread-num 3

--bfile reads individual-level SNP genotype data (in PLINK binary format) from a reference sample for LD estimation, i.e. .bed, .bim, and .fam files.
--extract-snp specifies a snplist (e.g., Hapmap3 SNP list) to be extracted across LD reference, xQTL data and GWAS summary data, and used for stage 1 analysis.
–-prior-var specifies the prior variance of the non-zero mediated effects for each molecular trait on the complex trait. It can be computed by the variance of the estimated SMR effects of each molecular trait on complex trait at a significance level (e.g., FDR < 0.05) adjusting for the estimation errors.
--sample-overlap specifies the flag to let OPERA consider the between-study correlations due to overlapping samples. OPERA will automatically output the estimated correlations in .rho file (text format, see below example).

1.000000e+00	0.200000e+00
0.200000e+00	1.000000e+00

--opera-smr turns on the flag of using the estimated smr effect rather than the estimated joint smr effect to run the stage 1 analysis.
--pi-wind defines a window centered on the molecular phenotype with smallest number of sites to select no overlap independent loci, e.g., 200 Kb (default).
--thread-num specifies the number of OpenMP threads for parallel computing.

Run OPERA for stage 2 analysis and heterogeneity analysis

opera --besd-flist mylist --extract-gwas-loci myloci --chr 7 --gwas-summary mygwas.ma --bfile mydata --prior-pi-file myopera.pi --prior-var-file myopera.var --out myopera_chr7

Note: Only the cis-SNPs of each exposure site are used, so the stage 2 analysis can be performed for each chromosome seperately. The genome-wide analysis results can be combined through shell command below, e.g., for association between single exposure and outcome,

awk 'NR==1 || FNR>1' myopera_chr*_1_expos_assoc.ppa > myopera_1_expos_assoc.ppa

--extract-gwas-loci extracts a subset of genomic region with GWAS COJO or LD clumpped loci (i.e., 2Mb region by default) to perform opera analysis. The input file format is

Chr     SNP     bp
7       rs1859788       99971834
7       rs7810606       143108158

--chr specifies the target chromosome for SNPs and exposure sites to run chromosome-wide opera stage 2 analysis.
--bfile reads individual-level SNP genotype data (in PLINK binary format) from a reference sample for LD estimation.
--prior-pi-file reads the prior probabilities estimated from the stage 1 analysis (i.e., the posterior Mean from stage 1 analysis).
--prior-var-file reads the prior variances estimated from the stage 1 analysis.
--out saves the marginal or joint PPA and multi-exposure HEIDI test P-values for each association hypothesis passed significance threshold (e.g., PPA > 0.9 and P-HEIDI > 0.01 default) in .ppa file (text format, see below example). Results are seperated as different number of exposure combinations associated with the outcome, e.g., single exposure association,

Chr	GWAS_SNP	GWAS_bp	Expo1_ID	Expo1_bp	PPA(1)	p_HEIDI(1)
7	rs1859788	99971834	cg08582801	99588335	0.929121	8.152293e-02
7	rs1859788	99971834	cg13210467	99775443	0.999054	1.625980e-01
7	rs1859788	99971834	cg19116668	99932089	1	1.125248e-01
7	rs1859788	99971834	cg26429636	99573747	0.915179	5.926392e-02
...

and associations between 2 exposure(s) and 1 outcome.

Chr	GWAS_SNP	GWAS_bp	Expo1_ID	Expo1_bp	Expo2_ID	Expo2_bp	PPA(1,2)	p_HEIDI(1,2)
7	rs1859788	99971834	ENSG00000085514	99981436	cg13210467	99775443	0.998746	1.787164e-02
7	rs1859788	99971834	ENSG00000146828	100444536	cg01869186	100423987	0.906761	1.641383e-01
7	rs1859788	99971834	ENSG00000146828	100444536	cg12616177	100434510	0.902426	1.044594e-01
...

Columns are chromosome, rs ID for GWAS SNP, physical position for GWAS SNP, probe ID for the 1st exposure, probe position for the 1st exposure, probe ID for the 2nd expo

OPERA

Install / Use

README