MvGPS
Tools for estimating causal effects for multivariate continuous exposures
Install / Use
/learn @williazo/MvGPSREADME
Multivariate Generalized Propensity Score (mvGPS)
<!-- begin badges --> <!-- end badges -->The goal of this package is to expand currently available software to estimate weights for multivariate continuous exposures. Weights are formed assuming a multivariate normal distribution for the simultaneous exposures.
Installation
You can install mvGPS from CRAN for the stable release or GitHub for
the development version using the following code:
# stable release in CRAN
install.packages("mvGPS")
# development version on GitHub
install.packages("devtools")
devtools::install_github("williazo/mvGPS")
Example
Data Generating
To illustrate a simple setting where this multivariate generalized propensity score would be useful, we can construct a directed acyclic graph (DAG) with a bivariate exposure, D=(D<sub>1</sub>, D<sub>2</sub>), confounded by a set C=(C<sub>1</sub>, C<sub>2</sub>, C<sub>3</sub>). In this case we assume C<sub>1</sub> and C<sub>2</sub> are associated with D<sub>1</sub>, while C<sub>2</sub> and C<sub>3</sub> are associated with D<sub>2</sub> as shown below.
<img src="README_files/figure-gfm/dag_draw-1.png" width="50%" style="display: block; margin: auto;" />To generate this data we first draw n=200 samples from C assuming a multivariate normal distribution with mean equal to zero, variance equal to 1, and constant covariance of 0.1.
Next we define our exposure as a linear function of our confounders. Explicitly these two equations are defined as
E[D<sub>1</sub>|C]=0.5C<sub>1</sub>+C<sub>2</sub>,
E[D<sub>2</sub>|C]=0.3C<sub>2</sub>+0.75C<sub>3</sub>.
With this construction, the exposures have one confounder in common, C<sub>2</sub>, and one independent confounder. The effect size of the confounders vary for each exposure. We assume that the conditional distribution of D given C is bivariate normal with conditional correlation equal to 0.2 and conditional variance equal to 2.
To generate the set of confounders and the corresponding bivariate
exposure we can use the function gen_D() as shown below.
require(mvGPS)
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1,
d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C
By construction our marginal correlation of D is a function of parameters from the distribution of C, coefficients of conditional mean equations, and conditional covariance parameter. For the above specification the true marginal correlation of exposure is equal to 0.24 and our observed marginal correlation is equal to 0.26.
Finally, we specify our outcome, Y, as a linear combination of the confounders and exposure. The mean of the dose-response equation is shown below,
E[Y|D, C]=0.75C<sub>1</sub>+1C<sub>2</sub>+0.6C<sub>3</sub>+D<sub>1</sub>+D<sub>2</sub>.
Both exposures have treatment effect sizes equal to one. The standard deviation of our outcome is set equal 2.
alpha <- c(0.75, 1, 0.6, 1, 1)
sd_Y <- 2
X <- cbind(C, D)
Y <- X%*%alpha + rnorm(200, sd=sd_Y)
Generating Weights
With the data generated, we can now use our primary function mvGPS()
to estimate weights. These weights are constructed such that the
numerator is equal to the marginal density, with the denominator
corresponding to the conditional density, i.e., the multivariate
generalized propensity score.
In our case since the bivariate exposure is assumed to be bivariate normal, we can break both the numerator and denominator into full conditional densities knowing that each univariate conditional expression will remain normally distributed.
Notice in the equation above, we are also able to specify the confounding set for each exposure separately.
require(mvGPS)
out_mvGPS <- mvGPS(D=D, C=list(C[, 1:2], C[, 2:3]))
w <- out_mvGPS$w
This vector w now can be used to test balance of confounders by
comparing weighted vs. unweighted correlations and to estimate the
treatment effects using weighted least squares regression.
Balance Assessment
For continuous exposure(s) we can asses balance using several metrics such as euclidean distance, maximum absolute correlation, and average absolute correlation where correlation refers to the Pearson correlation between exposure and covariate.
Below we use the function bal() to specify a set of potential models
to use for comparison. Possible models that are available include:
mvGPS, Entropy, CBPS, GBM, and PS. For methods other than mvGPS which
can only estimate univariate continuous exposure, each exposure is fit
separately so that weights are generated for both exposures.
require(knitr)
bal_results <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D, C=list(C[, 1:2], C[, 2:3]))
bal_summary <- bal_results$bal_metrics
#contains overall summary statistics with respect to balance
bal_summary <-data.frame(bal_summary, ESS=c(bal_results$ess, nrow(D)))
#adding in ESS with last value representing the unweighted case
bal_summary <- bal_summary[order(bal_summary$max_cor), ]
kable(bal_summary[, c("euc_dist", "max_cor", "avg_cor", "ESS", "method")],
digits=4, row.names=FALSE,
col.names=c("Euc. Distance", "Max. Abs. Corr.",
"Avg. Abs. Corr.", "ESS", "Method"))
<table>
<thead>
<tr>
<th style="text-align:right;">
Euc. Distance
</th> <th style="text-align:right;">Max. Abs. Corr.
</th> <th style="text-align:right;">Avg. Abs. Corr.
</th> <th style="text-align:right;">ESS
</th> <th style="text-align:left;">Method
</th> </tr> </thead> <tbody> <tr> <td style="text-align:right;">0.0930
</td> <td style="text-align:right;">0.0884
</td> <td style="text-align:right;">0.0331
</td> <td style="text-align:right;">163.8253
</td> <td style="text-align:left;">mvGPS
</td> </tr> <tr> <td style="text-align:right;">0.2568
</td> <td style="text-align:right;">0.2145
</td> <td style="text-align:right;">0.1137
</td> <td style="text-align:right;">159.9284
</td> <td style="text-align:left;">GBM_D1
</td> </tr> <tr> <td style="text-align:right;">0.2592
</td> <td style="text-align:right;">0.2288
</td> <td style="text-align:right;">0.1085
</td> <td style="text-align:right;">179.8179
</td> <td style="text-align:left;">PS_D1
</td> </tr> <tr> <td style="text-align:right;">0.3095
</td> <td style="text-align:right;">0.2321
</td> <td style="text-align:right;">0.1092
</td> <td style="text-align:right;">178.8683
</td> <td style="text-align:left;">entropy_D2
</td> </tr> <tr> <td style="text-align:right;">0.2400
</td> <td style="text-align:right;">0.2336
</td> <td style="text-align:right;">0.0721
</td> <td style="text-align:right;">172.2635
</td> <td style="text-align:left;">entropy_D1
</td> </tr> <tr> <td style="text-align:right;">0.2843
</td> <td style="text-align:right;">0.2400
</td> <td style="text-align:right;">0.1236
</td> <td style="text-align:right;">184.3871
</td> <td style="text-align:left;">CBPS_D1
</td> </tr> <tr> <td style="text-align:right;">0.3185
</td> <td style="text-align:right;">0.2418
</td> <td style="text-align:right;">0.1227
</td> <td style="text-align:right;">180.3586
</td> <td style="text-align:left;">PS_D2
</td> </tr> <tr> <td style="text-align:right;">0.3285
</td> <td style="text-align:right;">0.2502
</td> <td style="text-align:right;">0.1219
</td> <td style="text-align:right;">142.8799
</td> <td style="text-align:left;">GBM_D2
</td> </tr> <tr> <td style="text-align:right;">0.5009
</td> <td style="text-align:right;">0.3168
</td> <td style="text-align:right;">0.2421
</td> <td style="text-align:right;">200.0000
</td> <td style="text-align:left;">unweighted
</td> </tr> <tr> <td style="text-align:right;">1.1022
</td> <td style="text-align:right;">0.6701
</td> <td style="text-align:right;">0.5012
</td> <td style="text-align:right;">50.2385
</td> <td style="text-align:left;">CBPS_D2
</td> </tr> </tbody> </table>We can see that our method mvGPS achieves the best balance across both
exposure dimensions. In this case we can also note that the effective
sample size after weighting 163.8253 is still sufficiently large that we
not worried about loss of power.
Bias Reduction
Finally, we want to check that these weights are properly reducing the bias when we estimate the exposure treatment effect.
dt <- data.frame(Y, D)
mvGPS_mod <- lm(Y ~ D1 + D2, weights=w, data=dt)
mvGPS_hat <- coef(mvGPS_mod)[c("D1", "D2")]
unadj_hat <- coef(lm(Y ~ D1 + D2, data=dt))[c("D1", "D2")]
bias_tbl <- cbind(truth=c(1, 1), unadj=unadj_hat, mvGPS_hat)
kable(bias_tbl, digits=2, row.names=TRUE,
col.names=c("Truth", "Unadjusted", "mvGPS"))
<table>
<thead>
<tr>
<th style="text-align:left;">
</th>
<th stylRelated Skills
node-connect
353.3kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.7kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
353.3kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
353.3kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
