MvGPS

Tools for estimating causal effects for multivariate continuous exposures


Multivariate Generalized Propensity Score (mvGPS)

The goal of this package is to expand currently available software to estimate weights for multivariate continuous exposures. Weights are formed assuming a multivariate normal distribution for the simultaneous exposures.

Installation

You can install mvGPS from CRAN for the stable release or GitHub for the development version using the following code:

# stable release in CRAN
install.packages("mvGPS")

# development version on GitHub
install.packages("devtools")
devtools::install_github("williazo/mvGPS")

Example

Data Generating

To illustrate a simple setting where this multivariate generalized propensity score would be useful, we can construct a directed acyclic graph (DAG) with a bivariate exposure, D=(D1, D2), confounded by a set C=(C1, C2, C3). In this case we assume C1 and C2 are associated with D1, while C2 and C3 are associated with D2 as shown below.

To generate this data we first draw n=200 samples from C assuming a multivariate normal distribution with mean equal to zero, variance equal to 1, and constant covariance of 0.1.

Next we define our exposure as a linear function of our confounders. Explicitly these two equations are defined as

E[D1|C]=0.5C1+C2,

E[D2|C]=0.3C2+0.75C3.

With this construction, the exposures have one confounder in common, C2, and one independent confounder each. The effect sizes of the confounders differ between the two exposures. We assume that the conditional distribution of D given C is bivariate normal with conditional correlation equal to 0.2 and conditional standard deviation equal to 2 for each exposure.
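The construction described above can also be reproduced by hand as a rough check on what the simulated data look like. The following is a minimal sketch and not the package's implementation: it assumes that the value 2 is the conditional standard deviation of each exposure, and the names `C_sim`, `D_sim`, and the seed are placeholders introduced only for this illustration.

```r
library(MASS)  # provides mvrnorm(); MASS ships with standard R installations

set.seed(2020)  # arbitrary seed chosen for this sketch
n <- 200

# Confounders: mean zero, unit variance, constant covariance of 0.1
Sigma_C <- matrix(0.1, nrow = 3, ncol = 3)
diag(Sigma_C) <- 1
C_sim <- mvrnorm(n, mu = rep(0, 3), Sigma = Sigma_C)
colnames(C_sim) <- c("C1", "C2", "C3")

# Conditional means: E[D1|C] = 0.5*C1 + C2 and E[D2|C] = 0.3*C2 + 0.75*C3
mu_D1 <- drop(C_sim %*% c(0.5, 1, 0))
mu_D2 <- drop(C_sim %*% c(0, 0.3, 0.75))

# Bivariate normal errors: conditional sd 2 (assumed), conditional correlation 0.2
s_cond <- 2
rho_cond <- 0.2
Sigma_D <- matrix(c(s_cond^2, rho_cond * s_cond^2,
                    rho_cond * s_cond^2, s_cond^2), nrow = 2)
eps <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma_D)

D_sim <- cbind(D1 = mu_D1 + eps[, 1], D2 = mu_D2 + eps[, 2])
```

Because the seed and random number stream differ, `D_sim` and `C_sim` will not match the values returned by `gen_D()`; only the distributional assumptions are the same.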

To generate the set of confounders and the corresponding bivariate exposure we can use the function gen_D() as shown below.

require(mvGPS)
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
                C_mu=rep(0, 3), C_cov=0.1, C_var=1,
                d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

By construction, the marginal correlation of D is a function of the parameters of the distribution of C, the coefficients of the conditional mean equations, and the conditional covariance parameters. For the above specification, the true marginal correlation of the exposures is equal to 0.24 and the observed marginal correlation is equal to 0.26.
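The true marginal correlation quoted above can be checked with a few lines of matrix algebra. This is a sketch under the assumption that `s_d1_cond` and `s_d2_cond` are conditional standard deviations; the variable names are introduced here for illustration only.

```r
# Covariance matrix of the confounders: unit variance, covariance 0.1
Sigma_C <- matrix(0.1, nrow = 3, ncol = 3)
diag(Sigma_C) <- 1

# Coefficients of the conditional mean equations
b1 <- c(0.5, 1, 0)
b2 <- c(0, 0.3, 0.75)

# Conditional parameters (sd assumed to be 2 for both exposures)
s1 <- 2
s2 <- 2
rho_cond <- 0.2

# Marginal moments: variation explained by C plus conditional variation
var_d1 <- drop(t(b1) %*% Sigma_C %*% b1) + s1^2
var_d2 <- drop(t(b2) %*% Sigma_C %*% b2) + s2^2
cov_d  <- drop(t(b1) %*% Sigma_C %*% b2) + rho_cond * s1 * s2

rho_marg <- cov_d / sqrt(var_d1 * var_d2)
round(rho_marg, 2)  # 0.24
```

Treating 2 as a conditional variance instead gives a marginal correlation of about 0.28, which does not match the reported 0.24.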

Finally, we specify our outcome, Y, as a linear combination of the confounders and exposure. The mean of the dose-response equation is shown below,

E[Y|D, C]=0.75C1+1C2+0.6C3+D1+D2.

Both exposures have treatment effect sizes equal to one. The standard deviation of our outcome is set equal to 2.

alpha <- c(0.75, 1, 0.6, 1, 1)
sd_Y <- 2
X <- cbind(C, D)
Y <- X%*%alpha + rnorm(200, sd=sd_Y)

Generating Weights

With the data generated, we can now use our primary function mvGPS() to estimate weights. These weights are constructed such that the numerator is equal to the marginal density, with the denominator corresponding to the conditional density, i.e., the multivariate generalized propensity score.

$w = \frac{f(\mathbf{D})}{f(\mathbf{D} \mid \mathbf{C})}$

In our case since the bivariate exposure is assumed to be bivariate normal, we can break both the numerator and denominator into full conditional densities knowing that each univariate conditional expression will remain normally distributed.

$w = \frac{f(D_2 \mid D_1)\, f(D_1)}{f(D_2 \mid D_1, C_2, C_3)\, f(D_1 \mid C_1, C_2)}$

Notice in the equation above, we are also able to specify the confounding set for each exposure separately.

require(mvGPS)
out_mvGPS <- mvGPS(D=D, C=list(C[, 1:2], C[, 2:3]))
w <- out_mvGPS$w

This vector w can now be used to assess confounder balance by comparing weighted and unweighted correlations, and to estimate the treatment effects using weighted least squares regression.
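To make the weight construction and its two uses concrete, the sketch below builds weights by hand from four normal linear models and then applies them. It is an illustration under stated assumptions, not the algorithm inside `mvGPS()`: the data are re-simulated with an arbitrary seed, the conditional standard deviation is assumed to be 2, and `norm_dens`, `w_manual`, and `dat` are hypothetical names. No weight trimming or other refinements are applied.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)  # arbitrary seed for this sketch
n <- 200
Sigma_C <- matrix(0.1, 3, 3)
diag(Sigma_C) <- 1
C_sim <- mvrnorm(n, mu = rep(0, 3), Sigma = Sigma_C)
eps <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(4, 0.8, 0.8, 4), 2, 2))

dat <- data.frame(C1 = C_sim[, 1], C2 = C_sim[, 2], C3 = C_sim[, 3])
dat$D1 <- 0.5 * dat$C1 + dat$C2 + eps[, 1]
dat$D2 <- 0.3 * dat$C2 + 0.75 * dat$C3 + eps[, 2]
dat$Y  <- 0.75 * dat$C1 + dat$C2 + 0.6 * dat$C3 + dat$D1 + dat$D2 +
  rnorm(n, sd = 2)

# Normal density of each observed response under a fitted linear model
norm_dens <- function(fit) {
  dnorm(residuals(fit), mean = 0, sd = summary(fit)$sigma)
}

# Numerator: f(D2 | D1) * f(D1), with no confounders
num <- norm_dens(lm(D2 ~ D1, data = dat)) * norm_dens(lm(D1 ~ 1, data = dat))

# Denominator: f(D2 | D1, C2, C3) * f(D1 | C1, C2)
den <- norm_dens(lm(D2 ~ D1 + C2 + C3, data = dat)) *
  norm_dens(lm(D1 ~ C1 + C2, data = dat))

w_manual <- num / den

# Use 1: unweighted vs. weighted correlation between D1 and C2
cor_unweighted <- cor(dat$D1, dat$C2)
cor_weighted <- cov.wt(dat[, c("D1", "C2")],
                       wt = w_manual / sum(w_manual), cor = TRUE)$cor[1, 2]

# Use 2: weighted least squares for the two treatment effects
fit_wls <- lm(Y ~ D1 + D2, data = dat, weights = w_manual)
coef(fit_wls)
```

In practice `w_manual` would be replaced by the vector `w` returned by `mvGPS()`; the hand-built version is only meant to show how the numerator and denominator factor into univariate normal densities.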

Balance Assessment

For continuous exposure(s) we can assess balance using several metrics such as Euclidean distance, maximum absolute correlation, and average absolute correlation, where correlation refers to the Pearson correlation between exposure and covariate.
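These metrics can be sketched with base R as follows. The helper `balance_metrics()` is hypothetical and is not part of the package. It assumes that the Euclidean distance is the distance of the vector of weighted correlations from zero and that the effective sample size is the usual Kish formula. As a simplification, it correlates every exposure with every covariate rather than pairing each exposure with its own confounding set.

```r
# Hypothetical helper: balance summaries from weighted Pearson correlations
balance_metrics <- function(D, C, w = rep(1, nrow(D))) {
  D <- as.matrix(D)
  C <- as.matrix(C)
  # Weighted correlation matrix of exposures and covariates together
  wc <- cov.wt(cbind(D, C), wt = w / sum(w), cor = TRUE)$cor
  # Keep only the exposure-by-covariate block
  r <- wc[seq_len(ncol(D)), ncol(D) + seq_len(ncol(C)), drop = FALSE]
  c(euc_dist = sqrt(sum(r^2)),         # distance of correlations from zero
    max_cor  = max(abs(r)),            # worst single imbalance
    avg_cor  = mean(abs(r)),           # typical imbalance
    ess      = sum(w)^2 / sum(w^2))    # Kish effective sample size
}

# Demonstration on arbitrary simulated data with uniform weights
set.seed(1)
D_demo <- matrix(rnorm(200), ncol = 2)
C_demo <- matrix(rnorm(300), ncol = 3)
m_unweighted <- balance_metrics(D_demo, C_demo)
m_unweighted
```

With uniform weights the function reproduces the ordinary Pearson correlations and an effective sample size equal to the number of rows, which gives the unweighted baseline against which weighted results are compared.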

Below we use the function bal() to specify a set of candidate models for comparison. The available models are mvGPS, Entropy, CBPS, GBM, and PS. The methods other than mvGPS can only handle a univariate continuous exposure, so each exposure is fit separately and weights are generated for both exposures.

require(knitr)

# compare balance across the candidate weighting methods
bal_results <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D, C=list(C[, 1:2], C[, 2:3]))

# overall summary statistics with respect to balance
bal_summary <- bal_results$bal_metrics

# add ESS, with the last value representing the unweighted case
bal_summary <- data.frame(bal_summary, ESS=c(bal_results$ess, nrow(D)))

# sort methods by maximum absolute correlation
bal_summary <- bal_summary[order(bal_summary$max_cor), ]

kable(bal_summary[, c("euc_dist", "max_cor", "avg_cor", "ESS", "method")],
      digits=4, row.names=FALSE,
      col.names=c("Euc. Distance", "Max. Abs. Corr.",
                  "Avg. Abs. Corr.", "ESS", "Method"))

Euc. Distance

</th> <th style="text-align:right;">

Max. Abs. Corr.

</th> <th style="text-align:right;">

Avg. Abs. Corr.

</th> <th style="text-align:right;">

ESS

</th> <th style="text-align:left;">

Method

</th> </tr> </thead> <tbody> <tr> <td style="text-align:right;">

0.0930