Brglm2
Estimation and inference from generalized linear models using explicit and implicit methods for bias reduction
Install / Use
/learn @ikosmidis/Brglm2Quality Score
Category
Development & EngineeringSupported Platforms
Tags
README
brglm2 <img src="man/figures/hex_brglm2.svg" width="320" align="right">
<!-- badges: start --> <!-- badges: end -->brglm2 provides tools for the estimation and inference from generalized linear models using various methods for bias reduction. brglm2 supports all generalized linear models supported in R, and provides methods for multinomial logistic regression (nominal responses), adjacent category models (ordinal responses), and negative binomial regression (for potentially overdispered count responses).
Reduction of estimation bias is achieved by solving either the mean-bias reducing adjusted score equations in Firth (1993) and Kosmidis & Firth (2009) or the median-bias reducing adjusted score equations in Kenne et al (2017), or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as prescribed in Cordeiro and McCullagh (1991). Kosmidis et al (2020) provides a unifying framework and algorithms for mean and median bias reduction for the estimation of generalized linear models.
In the special case of generalized linear models for binomial and multinomial responses (both ordinal and nominal), the adjusted score equations return estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation). See, Kosmidis & Firth (2021) for the proof of the latter result in the case of mean bias reduction for logistic regression (and, for more general binomial-response models where the likelihood is penalized by a power of the Jeffreys’ invariant prior).
For logistic regression, brglm2 also provides methods for maximum Diaconis-Ylvisaker prior penalized likelihood (MDYPL) estimation, and corresponding methods for high-dimensionality corrections of the aggregate bias of the estimator and the usual statistics used for inference; see Sterzinger and Kosmidis, 2024.
The core model fitters are implemented by the functions brglm_fit()
(univariate generalized linear models) and mdyplFit() (logistic
regression), and brmultinom() (baseline category logit models for
nominal multinomial responses), bracl() (adjacent category logit
models for ordinal multinomial responses), and brnb() (negative
binomial regression).
Installation
Install the current version from CRAN:
install.packages("brglm2")
or the development version from github:
# install.packages("remotes")
remotes::install_github("ikosmidis/brglm2", ref = "develop")
Examples
Estimation of binomial-response GLMs with separated data
Below we follow the example of Heinze and Schemper
(2002) and fit a logistic regression
model using maximum likelihood (ML) to analyze data from a study on
endometrial cancer (see ?brglm2::endometrial for details and
references).
library("brglm2")
data("endometrial", package = "brglm2")
modML <- glm(HG ~ NV + PI + EH, family = binomial("logit"), data = endometrial)
summary(modML)
#>
#> Call:
#> glm(formula = HG ~ NV + PI + EH, family = binomial("logit"),
#> data = endometrial)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 4.30452 1.63730 2.629 0.008563 **
#> NV 18.18556 1715.75089 0.011 0.991543
#> PI -0.04218 0.04433 -0.952 0.341333
#> EH -2.90261 0.84555 -3.433 0.000597 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 104.903 on 78 degrees of freedom
#> Residual deviance: 55.393 on 75 degrees of freedom
#> AIC: 63.393
#>
#> Number of Fisher Scoring iterations: 17
The ML estimate of the parameter for NV is actually infinite, as can
be quickly verified using the
detectseparation
R package
# install.packages("detectseparation")
library("detectseparation")
#>
#> Attaching package: 'detectseparation'
#> The following objects are masked from 'package:brglm2':
#>
#> check_infinite_estimates, detect_separation
update(modML, method = detect_separation)
#> Implementation: ROI | Solver: lpsolve
#> Separation: TRUE
#> Existence of maximum likelihood estimates
#> (Intercept) NV PI EH
#> 0 Inf 0 0
#> 0: finite value, Inf: infinity, -Inf: -infinity
The reported, apparently finite estimate
r round(coef(summary(modML))["NV", "Estimate"], 3) for NV is merely
due to false convergence of the iterative estimation procedure for ML.
The same is true for the estimated standard error, and, hence the value
0.011 for the z-statistic cannot be trusted for inference on the
effect size for NV.
As mentioned earlier, many of the estimation methods implemented in brglm2 not only return estimates with improved frequentist properties (e.g. asymptotically smaller mean and median bias than what ML typically delivers), but also estimates and estimated standard errors that are always finite in binomial (e.g. logistic, probit, and complementary log-log regression) and multinomial regression models (e.g. baseline category logit models for nominal responses, and adjacent category logit models for ordinal responses). For example, the code chunk below refits the model on the endometrial cancer study data using mean bias reduction.
summary(update(modML, method = "brglm_fit"))
#>
#> Call:
#> glm(formula = HG ~ NV + PI + EH, family = binomial("logit"),
#> data = endometrial, method = "brglm_fit")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.4740 -0.6706 -0.3411 0.3252 2.6123
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 3.77456 1.48869 2.535 0.011229 *
#> NV 2.92927 1.55076 1.889 0.058902 .
#> PI -0.03475 0.03958 -0.878 0.379915
#> EH -2.60416 0.77602 -3.356 0.000791 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 104.903 on 78 degrees of freedom
#> Residual deviance: 56.575 on 75 degrees of freedom
#> AIC: 64.575
#>
#> Type of estimator: AS_mixed (mixed bias-reducing adjusted score equations)
#> Number of Fisher Scoring iterations: 6
A quick comparison of the output from mean bias reduction to that from
ML reveals a dramatic change in the z-statistic for NV, now that
estimates and estimated standard errors are finite. In particular, the
evidence against the null of NV not contributing to the model in the
presence of the other covariates being now stronger.
See ?brglm_fit and ?brglm_control for more examples and the other
estimation methods for generalized linear models, including median bias
reduction and maximum penalized likelihood with Jeffreys’ prior penalty.
Also do not forget to take a look at the vignettes
(vignette(package = "brglm2")) for details and more case studies.
Improved estimation of the exponential of regression parameters
See, also ?expo for a method to estimate the exponential of regression
parameters, such as odds ratios from logistic regression models, while
controlling for other covariate information. Estimation can be performed
using maximum likelihood or various estimators with smaller asymptotic
mean and median bias, that are also guaranteed to be finite, even if the
corresponding maximum likelihood estimates are infinite. For example,
modML is a logistic regression fit, so the exponential of each
coefficient is an odds ratio while controlling for other covariates. To
estimate those odds ratios using the correction* method for mean bias
reduction (see ?expo for details) we do
expoRB <- expo(modML, type = "correction*")
expoRB
#>
#> Call:
#> expo.glm(object = modML, type = "correction*")
#>
#> Odds ratios
#> Estimate Std. Error 2.5 % 97.5 %
#> (Intercept) 20.671820 33.136501 0.893141 478.451
#> NV 8.496974 7.825239 1.397511 51.662
#> PI 0.965089 0.036795 0.895602 1.040
#> EH 0.056848 0.056344 0.008148 0.397
#>
#>
#> Type of estimator: correction* (explicit mean bias correction with a multiplicative adjustment)
The odds ratio between presence of neovasculation and high histology
grade (HG) is estimated to be 8.497, while controlling for PI and E
Related Skills
node-connect
339.3kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
83.9kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
339.3kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
83.9kCommit, push, and open a PR
