AIPW
R Package: Augmented Inverse Probability Weighted (AIPW) Estimation for Average Causal Effect
AIPW: Augmented Inverse Probability Weighting
<!-- README.md is generated from README.Rmd. Please edit that file -->
Contributors: Yongqi Zhong, Ashley Naimi, Gabriel Conzuelo, Edward Kennedy
Augmented inverse probability weighting (AIPW) is a doubly robust
estimator for causal inference. The AIPW package estimates the average
treatment effect of a binary exposure on the risk difference (RD), risk
ratio (RR) and odds ratio (OR) scales, using user-defined stacked
machine learning algorithms (SuperLearner or sl3). Users need to examine
the causal assumptions (e.g., consistency) before using this package.
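As a rough illustration of what a doubly robust estimate on these three scales looks like, the sketch below computes AIPW by hand in base R. It is not the package's implementation: it uses plain logistic regressions instead of stacked learners, skips cross-fitting, and the simulated data and all object names (g_hat, q1, phi1, and so on) are assumptions made for this example.

```r
# Base-R sketch of AIPW (illustrative only; no cross-fitting, no SuperLearner).
set.seed(1)
n <- 2000
W <- rnorm(n)                             # one simulated baseline covariate
A <- rbinom(n, 1, plogis(0.5 * W))        # binary exposure
Y <- rbinom(n, 1, plogis(-0.5 + A + W))   # binary outcome

# Exposure model g(W) = P(A = 1 | W)
g_hat <- fitted(glm(A ~ W, family = binomial()))

# Outcome model Q(A, W) = E[Y | A, W], predicted under A = 1 and A = 0
q_fit <- glm(Y ~ A + W, family = binomial())
q1 <- predict(q_fit, newdata = data.frame(A = 1, W = W), type = "response")
q0 <- predict(q_fit, newdata = data.frame(A = 0, W = W), type = "response")

# AIPW pseudo-outcomes: outcome-model prediction plus an inverse-probability-
# weighted residual that corrects the outcome model where it is wrong
phi1 <- q1 + A / g_hat * (Y - q1)
phi0 <- q0 + (1 - A) / (1 - g_hat) * (Y - q0)

risk1 <- mean(phi1)                       # risk under exposure
risk0 <- mean(phi0)                       # risk under control
rd    <- risk1 - risk0                    # risk difference
rd_se <- sd(phi1 - phi0) / sqrt(n)        # influence-function-based SE
rr    <- risk1 / risk0                    # risk ratio
or    <- (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))  # odds ratio

print(round(c(RD = rd, SE = rd_se, RR = rr, OR = or), 3))
```

The estimate stays consistent if either the exposure model or the outcome model is correctly specified, which is the sense in which AIPW is doubly robust.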
If you find this package helpful, please consider citing:
@article{zhong_aipw_2021,
author = {Zhong, Yongqi and Kennedy, Edward H and Bodnar, Lisa M and Naimi, Ashley I},
title = {AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects},
journal = {American Journal of Epidemiology},
year = {2021},
month = {07},
issn = {0002-9262},
doi = {10.1093/aje/kwab207},
url = {https://doi.org/10.1093/aje/kwab207},
}
Contents:
- Updates
- Installation
- Example
- Repeated Fitting
- Parallelization and progress bar
- Use tmle/tmle3 as input
- References
<a id="Updates"></a>Updates
2025-04-05 Updates
Repeated Cross-fitting
The major new feature introduced is the Repeated class, which allows
for repeated cross-fitting procedures to mitigate randomness due to data
splits in machine learning-based estimation as suggested by Chernozhukov
et al. (2018). This feature:
- Enables running the cross-fitting procedure multiple times to produce more stable estimates
- Provides methods to summarize results using median-based approaches
- Supports parallelization with future.apply
- Includes visualization of estimate distributions across repetitions

See the Repeated Cross-fitting vignette for more details.
Continuous Outcome Support Improvements
- Fixed handling of continuous outcomes for exposure models (#50)
- Improved handling of non-binary treatments
- Fixed Q.model for continuous outcomes
Infrastructure Improvements
- Updated GitHub Actions workflows for R-CMD-check, test coverage, and pkgdown
- Removed Travis CI in favor of GitHub Actions
- Enhanced test coverage with additional tests for the new Repeated class
- Updated documentation and namespace for new functionality
Support Changes
- New GitHub versions (after v0.6.3.1) no longer support sl3 and tmle3
- Users requiring sl3 and tmle3 support should install via
remotes::install_github("yqzhong7/AIPW@aje_version")
Bug Fixes
- Fixed repeated fitting when stratified_fit is enabled
- Fixed handling of Q.model
- Added proper error handling for various edge cases
- Fixed continuous outcome for exposure model
- Improved cross-fitting to reduce randomness (#38)
<a id="Installation"></a>Installation
CRAN version
install.packages("AIPW")
GitHub version
install.packages("remotes")
remotes::install_github("yqzhong7/AIPW")
* The CRAN version supports only SuperLearner and tmle. New GitHub
versions (after v0.6.3.1) no longer support sl3 and tmle3. If you
still need the version with sl3 and tmle3 support, please install it with
remotes::install_github("yqzhong7/AIPW@aje_version"). <s>Please
install the GitHub version (master branch) if you choose to use sl3 and
tmle3.</s>
<a id="Example"></a>Example
<a id="data"></a>Setup example data
set.seed(888)
# Load the example dataset shipped with the AIPW package
data("eager_sim_obs", package = "AIPW")
outcome <- eager_sim_obs$sim_Y
exposure <- eager_sim_obs$sim_A
# Covariates for both the outcome model (Q) and the exposure model (g):
# drop the first two columns (sim_Y and sim_A)
covariates <- as.matrix(eager_sim_obs[-1:-2])
# covariates <- c(rbinom(N,1,0.4)) # a vector of a single covariate is also supported
<a id="one_line"></a>One line version (AIPW class: method chaining from R6class)
library(AIPW)
library(SuperLearner)
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.20.2
#> Super Learner
#> Version: 2.0-28
#> Package created on 2021-05-04
library(ggplot2)
AIPW_SL <- AIPW$new(Y = outcome,
                    A = exposure,
                    W = covariates,
                    Q.SL.library = c("SL.mean","SL.glm"),
                    g.SL.library = c("SL.mean","SL.glm"),
                    k_split = 3,
                    verbose = FALSE)$
  fit()$
  # Default truncation
  summary(g.bound = 0.025)$
  plot.p_score()$
  plot.ip_weights()
To see the results, set verbose = TRUE (the default) or print the result table directly:
print(AIPW_SL$result, digits = 2)
#>                  Estimate    SE 95% LCL 95% UCL   N
#> Risk of Exposure     0.44 0.046  0.3528    0.53 118
#> Risk of Control      0.31 0.051  0.2061    0.41  82
#> Risk Difference      0.14 0.068  0.0048    0.27 200
#> Risk Ratio           1.45 0.191  0.9974    2.11 200
#> Odds Ratio           1.81 0.295  1.0144    3.22 200
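The g.bound argument passed to summary() above controls propensity score truncation. The sketch below shows the general idea in base R, assuming that a single bound b clips scores to [b, 1 - b]; truncate_ps is a hypothetical helper written for this example, not a function exported by the package.

```r
# Hypothetical helper: clip estimated propensity scores to [bound, 1 - bound].
# Assumption: a single bound is applied symmetrically at both tails.
truncate_ps <- function(ps, bound = 0.025) {
  stopifnot(bound >= 0, bound < 0.5)
  pmin(pmax(ps, bound), 1 - bound)
}

ps <- c(0.001, 0.02, 0.30, 0.97, 0.999)   # made-up propensity scores
ps_trunc <- truncate_ps(ps)

# Inverse probability weights for exposed units, before and after truncation
w_raw   <- 1 / ps
w_trunc <- 1 / ps_trunc

print(data.frame(ps, ps_trunc, w_raw, w_trunc))
```

Truncation trades a small amount of bias for much less variable weights: the most extreme weight in this toy vector is capped at 1 / 0.025.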
To obtain the average treatment effect among the treated/controls (ATT/ATC),
stratified_fit() must be used:
AIPW_SL_att <- AIPW$new(Y = outcome,
                        A = exposure,
                        W = covariates,
                        Q.SL.library = c("SL.mean","SL.glm"),
                        g.SL.library = c("SL.mean","SL.glm"),
                        k_split = 3,
                        verbose = TRUE)
suppressWarnings({
  AIPW_SL_att$stratified_fit()$summary()
})
#> Done!
#>                     Estimate     SE  95% LCL 95% UCL   N
#> Risk of Exposure      0.4352 0.0467  0.34362   0.527 118
#> Risk of Control       0.3244 0.0513  0.22385   0.425  82
#> Risk Difference       0.1108 0.0684 -0.02320   0.245 200
#> Risk Ratio            1.3416 0.1858  0.93210   1.931 200
#> Odds Ratio            1.6048 0.2927  0.90429   2.848 200
#> ATT Risk Difference   0.0991 0.0880 -0.07339   0.272 200
#> ATC Risk Difference   0.1148 0.0634 -0.00946   0.239 200
You can also use aipw_wrapper() to wrap new(), fit() and
summary() together (it also supports method chaining):
AIPW_SL <- aipw_wrapper(Y = outcome,
                        A = exposure,
                        W = covariates,
                        Q.SL.library = c("SL.mean","SL.glm"),
                        g.SL.library = c("SL.mean","SL.glm"),
                        k_split = 3,
                        verbose = TRUE,
                        stratified_fit = FALSE)$plot.p_score()$plot.ip_weights()
<a id="rep"></a>Repeated Fitting
The Repeated class allows for repeated cross-fitting procedures to
mitigate randomness due to data splits. Chernozhukov et al. (2018)
suggest this approach for machine learning-based estimation.
library(SuperLearner)
library(ggplot2)
# First create a regular AIPW object
aipw_obj <- AIPW$new(Y = outcome,
                     A = exposure,
                     W = covariates,
                     Q.SL.library = c("SL.mean","SL.glm"),
                     g.SL.library = c("SL.mean","SL.glm"),
                     k_split = 3,
                     verbose = FALSE)
# Create a repeated fitting object from the AIPW object
repeated_aipw <- Repeated$new(aipw_obj)
# Perform repeated fitting 20 times
repeated_aipw$repfit(num_reps = 20, stratified = FALSE)
# Summarize results using median-based methods
repeated_aipw$summary_median()
# You can also visualize the distribution of estimates across repetitions
estimates_df <- repeated_aipw$repeated_estimates
ggplot(estimates_df, aes(x = Estimate, fill = Estimand)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Distribution of Estimates Across Repeated Fittings",
       subtitle = "Based on 20 repetitions",
       x = "Estimate Value",
       y = "Density")
Setting stratified = TRUE in the repfit() method uses the
stratified fitting procedure for each repetition:
# Using stratified fitting
repeated_aipw_strat <- Repeated$new(aipw_obj)
repeated_aipw_strat$repfit(num_reps = 20, stratified = TRUE)
repeated_aipw_strat$summary_median()
Note that the Repeated class also supports parallelization with
future.apply as described below.
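To make the idea of repeated cross-fitting concrete, here is a base-R sketch that repeats a 3-fold cross-fitted AIPW risk difference 20 times and combines the repetitions with the median rule described by Chernozhukov et al. (2018). It is an illustration under stated assumptions, not the package's code: crossfit_rd is a hypothetical helper, the nuisance models are plain logistic regressions, the data are simulated, and summary_median() may compute its summary differently.

```r
set.seed(2)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + W))
dat <- data.frame(Y = Y, A = A, W = W)

# Hypothetical helper: one round of k-fold cross-fitted AIPW (risk difference).
# Nuisance models are fit on the training folds and evaluated on the held-out fold.
crossfit_rd <- function(dat, k = 3) {
  n <- nrow(dat)
  fold <- sample(rep(seq_len(k), length.out = n))   # random data split
  phi <- numeric(n)
  for (j in seq_len(k)) {
    train <- dat[fold != j, ]
    test  <- dat[fold == j, ]
    g_fit <- glm(A ~ W, family = binomial(), data = train)
    q_fit <- glm(Y ~ A + W, family = binomial(), data = train)
    g  <- predict(g_fit, newdata = test, type = "response")
    q1 <- predict(q_fit, newdata = transform(test, A = 1), type = "response")
    q0 <- predict(q_fit, newdata = transform(test, A = 0), type = "response")
    phi[fold == j] <- (q1 + test$A / g * (test$Y - q1)) -
      (q0 + (1 - test$A) / (1 - g) * (test$Y - q0))
  }
  c(estimate = mean(phi), se = sd(phi) / sqrt(n))
}

# Repeat the whole cross-fitting procedure with fresh splits
reps <- t(replicate(20, crossfit_rd(dat)))          # 20 x 2 matrix

# Median estimate, with a variance that adds the between-split spread
est_median <- median(reps[, "estimate"])
se_median  <- sqrt(median(reps[, "se"]^2 +
                          (reps[, "estimate"] - est_median)^2))

print(round(c(estimate = est_median, se = se_median), 4))
```

Because each repetition draws a new split, the spread of reps[, "estimate"] shows how much a single cross-fitted estimate depends on the split, and the median summary is less sensitive to an unlucky one.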
<a id="par"></a>Parallelization with future.apply and progress bar with progressr
By default, the AIPW$fit() method runs sequentially.
The current version of the AIPW package supports parallel processing
implemented by
[future.apply](https://g
