# [🚧 WIP 🚧] jive
<!-- badges: start -->
<!-- badges: end -->
The goal of jive is to implement jackknife instrumental-variable estimators (JIVE) and various alternatives.
## Installation
You can install the development version of jive like so:

``` r
remotes::install_github("kylebutts/jive")
```

This package requires `sparse_model_matrix` from the development version of fixest. You can install that via

``` r
remotes::install_github("lrberge/fixest")
```
## Example Usage
We are going to use the data from Stevenson (2018). Stevenson leverages the quasi-random assignment of 8 judges (magistrates) in Philadelphia to study the effects of pretrial detention on several outcomes, including whether or not a defendant subsequently pleads guilty.
``` r
library(jive)
#> Loading required package: fixest
data(stevenson)
```
### Juke n’ JIVE
``` r
jive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#>         Estimate Robust SE Z value   Pr(>z)    
#> jail3 -0.0218460 0.0075176 -2.9060 0.003661 ** 
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
```
``` r
ujive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#>       Estimate Robust SE Z value  Pr(>z)  
#> jail3 0.159077  0.070567  2.2543 0.02418 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
```
``` r
ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#>       Estimate Robust SE Z value  Pr(>z)  
#> jail3 0.159527  0.070533  2.2617 0.02371 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
```
``` r
# Leave-cluster out
ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson,
  cluster = ~bailDate,
  lo_cluster = TRUE # Default, but just to be explicit
)
#> Coefficients:
#>       Estimate Clustered SE Z value  Pr(>z)  
#> jail3 0.174206     0.073553  2.3685 0.01786 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
```
### (Leave-out) Leniency Measures
The package also allows you to estimate (leave-out) leniency measures:
``` r
out = ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson,
  return_leniency = TRUE
)
stevenson$judge_lo_leniency = out$That
hist(stevenson$judge_lo_leniency, breaks = 30, xlab = "Judge leave-one-out leniency", main = NULL)
```
<img src="man/figures/README-estimate-leniency-1.png" width="100%" />
``` r
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4          ✔ readr     2.1.5
#> ✔ forcats   1.0.0          ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.2.9000     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.4          ✔ tidyr     1.3.1
#> ✔ purrr     1.0.4
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```
``` r
judge_summary <- stevenson |>
  summarize(
    .by = judge_pre,
    judge_leniency = mean(judge_lo_leniency),
    prop_jail3 = mean(jail3),
    prop_guilt = mean(guilt)
  )
```
``` r
# First-stage plot
ggplot(judge_summary, aes(x = judge_leniency, y = prop_jail3)) +
  geom_point() +
  # using `lm` because we have so few judges in our dataset
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "ribbon",
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "line",
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(title = "First-stage", x = "Judge leniency", y = "Judge pre-trial detention rate") +
  theme_bw()
```
<img src="man/figures/README-plot reduced form and first stage-1.png" width="100%" />
``` r
# Reduced-form plot
ggplot(judge_summary, aes(x = judge_leniency, y = prop_guilt)) +
  geom_point() +
  # using `lm` because we have so few judges in our dataset
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "ribbon",
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "line",
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(title = "Reduced Form", x = "Judge leniency", y = "Judge guilty verdict rate") +
  theme_bw()
```
<img src="man/figures/README-plot reduced form and first stage-2.png" width="100%" />
``` r
library(tidyverse)
library(fixest)

# Take residuals from first-stage but add back in judge fixed effects
# This is what Dobbie, Goldin, and Yang do in Figure 1
est_fs <- feols(
  jail3 ~ 0 + i(black) + i(white) | judge_pre + bailDate,
  data = stevenson
)
stevenson$resid <- resid(est_fs) +
  predict(est_fs, fixef = TRUE)[, "judge_pre"]
```
``` r
# First-stage plot
ggplot(stevenson, aes(x = judge_lo_leniency, y = resid)) +
  stat_smooth(
    geom = "ribbon",
    method = "lm",
    formula = y ~ x,
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    geom = "line",
    method = "lm",
    formula = y ~ x,
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(
    title = "First-stage",
    y = "Residualized rate of pretrial release",
    x = "Judge Leniency (Leave-out measure)"
  ) +
  theme_bw()
```
<img src="man/figures/README-unnamed-chunk-3-1.png" width="100%" />
## Econometric Details on JIVE, UJIVE, IJIVE, and CJIVE
Consider the following instrumental variables setup
$$ y_i = T_i \beta + W_i' \psi + \varepsilon_i, $$
where $T_i$ is a scalar endogenous variable, $W_i$ is a vector of exogenous covariates, and $\varepsilon_i$ is an error term. Additionally, assume there is a set of valid instruments, $Z_i$. The intuition of two-stage least squares is to use the instruments to “predict” $T_i$:
$$ T_i = Z_i' \pi + W_i' \gamma + \eta_i. $$
Then, the prediction, $\hat{T}_i$, is used in place of $T_i$ in the original regression.
When the dimension of $Z_i$ grows with the number of observations, two-stage least squares is biased (Kolesar, 2013). Without getting into the details, the bias arises because the first-stage prediction of $T_i$ uses $i$’s own observation. For this reason, Angrist, Imbens, and Krueger (1999) developed the jackknife instrumental-variable estimator (JIVE): for each $i$, $T_i$ is predicted from the first-stage equation estimated leaving out $i$’s own observation.
In general, the JIVE estimator and its variants are given by
$$ \frac{\hat{P}' Y}{\hat{P}' T} $$
where $\hat{P}$ is a function of $W$, $T$, and $Z$. The particulars differ across the JIVE, the unbiased JIVE (UJIVE), the improved JIVE (IJIVE), and the cluster JIVE (CJIVE).
### JIVE definition
Source: Kolesar (2013) and Angrist, Imbens, and Krueger (1999)
The original JIVE estimator produces $\hat{T}$ via a leave-out procedure, which can be expressed in matrix notation as:
$$ \hat{T}_{JIVE} = (I - D_{(Z,W)})^{-1} (H_{(Z,W)} - D_{(Z,W)}) T, $$
where $H_{(Z,W)}$ is the hat/projection matrix for $(Z,W)$ and $D_{(Z,W)}$ is the diagonal matrix with diagonal elements corresponding to $H_{(Z,W)}$. Then, after partialling out covariates in the second-stage, we have
$$ \hat{P}_{JIVE} = M_W \hat{T}_{JIVE} $$
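To make these formulas concrete, here is a small self-contained sketch in base R. It uses simulated data and dense matrices purely for illustration; it is not the package’s internal implementation (which relies on sparse matrices), and the variable names are made up for this example:

``` r
set.seed(1)
n <- 500
judge <- factor(sample(LETTERS[1:8], n, replace = TRUE))
Z <- model.matrix(~ judge)[, -1]         # 7 judge dummies (one level dropped)
W <- cbind(1, rnorm(n))                  # intercept + one exogenous covariate
eta <- rnorm(n)
T_end <- drop(Z %*% seq(-0.7, 0.7, length.out = 7) + W %*% c(0, 1)) + eta
y <- 0.3 * T_end + drop(W %*% c(1, -1)) + 0.5 * eta + rnorm(n)  # endogenous T

ZW <- cbind(Z, W)
H <- ZW %*% solve(crossprod(ZW), t(ZW))  # hat matrix H_{(Z,W)}
h <- diag(H)                             # diagonal D_{(Z,W)} as a vector

# Leave-out fitted values: row i's prediction excludes observation i
T_hat <- (drop(H %*% T_end) - h * T_end) / (1 - h)

# Second stage partials out W: P_JIVE = M_W T_hat
M_W <- diag(n) - W %*% solve(crossprod(W), t(W))
P <- drop(M_W %*% T_hat)

beta_jive <- sum(P * y) / sum(P * T_end)
```

Up to sampling noise, `beta_jive` should land near the true coefficient (0.3 here), while naive OLS of `y` on `T_end` would be biased upward by the shared `eta` term.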
### UJIVE definition
Source: Kolesar (2013)
For UJIVE, a leave-out procedure is used both in the first stage for the fitted values $\hat{T}$ and in the second stage for residualizing the covariates. The terms are given by:
$$ \hat{T}_{UJIVE} = (I - D_{(Z,W)})^{-1} (H_{(Z,W)} - D_{(Z,W)}) T = \hat{T}_{JIVE} $$
$$ \hat{P}_{UJIVE} = \hat{T}_{UJIVE} - (I - D_{W})^{-1} (H_{W} - D_{W}) T $$
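In code, the UJIVE adjustment amounts to subtracting a second leave-out fit, this time on $W$ alone. A hedged sketch, where `loo_fit()` is a hypothetical helper written for this example (not part of this package’s API):

``` r
# Leave-one-out fitted values of v on the columns of X:
# (I - D_X)^{-1} (H_X - D_X) v, where D_X = diag(H_X)
loo_fit <- function(X, v) {
  H <- X %*% solve(crossprod(X), t(X))
  h <- diag(H)
  (drop(H %*% v) - h * v) / (1 - h)
}

# P_UJIVE = loo_fit(cbind(Z, W), T) - loo_fit(W, T)
```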
### IJIVE definition
Source: Ackerberg and Devereux (2009)
The IJIVE procedure first residualizes $T$, $Y$, and $Z$ with respect to the covariates $W$; the authors show that this reduces small-sample bias. Then, the standard leave-out JIVE procedure is carried out on the residualized matrices (denoted with a tilde, e.g. $\tilde{T} = M_W T$):
$$ \hat{T}_{IJIVE} = \hat{P}_{IJIVE} = (I - D_{\tilde{Z}})^{-1} (H_{\tilde{Z}} - D_{\tilde{Z}}) \tilde{T} $$
Note that $\hat{P}_{IJIVE} = \hat{T}_{IJIVE}$ because the residualization has already occurred and does not need to be repeated in the second stage.
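The residualization step can be sketched as follows (illustrative only; `residualize()` is a hypothetical helper for this example, not a package function):

``` r
# tilde(X) = M_W X: residualize X (a vector or matrix) on the covariates W
residualize <- function(X, W) {
  X - W %*% solve(crossprod(W), t(W) %*% X)
}

# IJIVE then applies the leave-out projection built from residualize(Z, W)
# to residualize(T, W); no further partialling out is needed afterwards.
```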
### CJIVE definition
Source: Frandsen, Leslie, and McIntyre (2023)
This is a modified version of IJIVE proposed by Frandsen, Leslie, and McIntyre (2023). It is needed when errors are correlated within clusters (e.g. court cases assigned on the same day to the same judge). The modified estimator is given by:
$$ \hat{T}_{CJIVE} = \hat{P}_{CJIVE} = (I - \mathbb{D}(H_{\tilde{Z}}, \{ n_1, \dots, n_G \}))^{-1} (H_{\tilde{Z}} - \mathbb{D}(H_{\tilde{Z}}, \{ n_1, \dots, n_G \})) \tilde{T}, $$
where $\mathbb{D}(H_{\tilde{Z}}, \{ n_1, \dots, n_G \})$ is the block-diagonal matrix formed by zeroing out every entry of the projection matrix $H_{\tilde{Z}}$ except those where the row and column belong to the same cluster (with cluster sizes $n_1, \dots, n_G$).
In this package, the same adjustment (replacing the diagonal $D$ with the cluster block-diagonal $\mathbb{D}$ version) can be applied to all three estimators.
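A minimal sketch of the cluster block-diagonal operator $\mathbb{D}$ (illustrative; `block_diag_part()` is a hypothetical helper for this example, not part of the package API):

``` r
# Zero out every entry of a projection matrix P except those whose
# row and column belong to the same cluster (cl is a length-n cluster id)
block_diag_part <- function(P, cl) {
  P * outer(cl, cl, `==`)
}
```

Replacing the diagonal $D$ with this block-diagonal matrix means entire clusters, rather than single observations, are left out when forming the fitted values.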
