Tidybayes
Bayesian analysis + tidy data + geoms (R package)
tidybayes: Bayesian analysis + tidy data + geoms
<figure> <img src="man/figures/preview.gif" alt="Preview of tidybayes plots" /> <figcaption aria-hidden="true">Preview of tidybayes plots</figcaption> </figure>
tidybayes is an R package that aims to make it easy to integrate popular Bayesian modeling methods into a tidy data + ggplot workflow. It builds on top of (and re-exports) several functions for visualizing uncertainty from its sister package, ggdist.
Tidy data frames (one
observation per row) are particularly convenient for use in a variety of
R data manipulation and visualization packages. However, when using
Bayesian modeling functions like JAGS or Stan in R, we often have to
translate this data into a form the model understands, and then after
running the model, translate the resulting sample (or predictions) into
a more tidy format for use with other R functions. tidybayes aims to
simplify these two common (often tedious) operations:
- Composing data for use with the model. This often means translating data from a data.frame into a list, making sure factors are encoded as numerical data, adding variables to store the length of indices, etc. This package helps automate these operations using the compose_data() function, which automatically handles data types like numeric, logical, factor, and ordinal, and allows easy extensions for converting other data types into a format the model understands by providing your own implementation of the generic as_data_list().
- Extracting tidy draws from the model. This often means extracting indices from parameters with names like "b[1,1]", "b[1,2]" into separate columns of a data frame, like i = c(1,1,...) and j = c(1,2,...). More tediously, sometimes these indices actually correspond to levels of a factor in the original data; e.g. "x[1]" might correspond to a value of x for the first level of some factor. We provide several straightforward ways to convert draws from a variable with indices into useful long-format ("tidy") data frames, with automatic back-conversion of common data types (factors, logicals) using the spread_draws() and gather_draws() functions, including automatic recovery of factor levels corresponding to variable indices. In most cases this kind of long-format data is much easier to use with other data-manipulation and plotting packages (e.g., dplyr, tidyr, ggplot2) than the format provided by default from the model. See vignette("tidybayes") for examples.
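As a rough illustration of the two operations above, the sketch below runs compose_data() on a small data frame and spread_draws() on simulated draws. The data frame, the parameter name b, and the use of posterior::as_draws_df() on a random matrix as a stand-in for real model output are all assumptions made for this example; no model is actually fit.

```r
library(tidybayes)
library(posterior)

# Hypothetical toy data: a factor plus a numeric response.
df <- data.frame(
  condition = factor(c("A", "B", "C", "A")),
  response  = c(1.2, 0.4, 2.1, 0.9)
)

# compose_data() returns a named list: the factor becomes numeric codes,
# and size variables (n for rows, n_condition for factor levels) are added.
data_list <- compose_data(df)
str(data_list)

# Stand-in for model output (an assumption, not a fitted model):
# 20 simulated draws of a two-element parameter named b.
set.seed(1)
raw <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("b[1]", "b[2]")))
draws <- as_draws_df(raw)

# spread_draws() turns the columns "b[1]" and "b[2]" into an index
# column i and a value column b, one row per draw per index.
tidy_b <- spread_draws(draws, b[i])
head(tidy_b)
```

With a real model, the fitted object would take the place of draws; the b[i] specification stays the same.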
tidybayes also provides some additional functionality for data
manipulation and visualization tasks common to many models:
- Extracting tidy fits and predictions from models. For models like those provided by rstanarm and brms, tidybayes provides tidy analogs of the posterior_epred(), posterior_predict(), and posterior_linpred() functions, called add_epred_draws(), add_predicted_draws(), and add_linpred_draws(). These functions are modeled after the modelr::add_predictions() function, and turn a grid of predictions into a long-format data frame of draws from either the fits or predictions from a model. These functions make it straightforward to generate arbitrary fit lines from a model. See vignette("tidy-brms") or vignette("tidy-rstanarm") for examples.
- Summarizing posterior distributions from models. tidybayes re-exports the ggdist::point_interval() family of functions (median_qi(), mean_qi(), mode_hdi(), etc.), which are methods for generating point summaries and intervals that are designed with tidy workflows in mind. They can generate point summaries plus an arbitrary number of probability intervals from tidy data frames of draws, they return tidy data frames, and they respect data frame groups. tidybayes also provides an implementation of posterior::summarise_draws() for use with grouped data frames, such as those returned by the tidybayes::XXX_draws functions.
- Visualizing priors and posteriors. The focus on tidy data makes the output from tidybayes easy to visualize using ggplot. While existing geoms (like ggplot2::geom_pointrange() and ggplot2::geom_linerange()) can give useful output, the output from tidybayes is designed to work well with several geoms and stats in its sister package, ggdist. These geoms have sensible defaults suitable for visualizing posterior point summaries and intervals (ggdist::geom_pointinterval(), ggdist::stat_pointinterval()), visualizing distributions with point summaries and intervals (the ggdist::stat_sample_slabinterval() family of stats, including eye plots, half-eye plots, CCDF bar plots, gradient plots, dotplots, and histograms), and visualizing fit lines with an arbitrary number of uncertainty bands (ggdist::geom_lineribbon() and ggdist::stat_lineribbon()). Priors can also be visualized in the same way using the ggdist::stat_slabinterval() family of stats. The ggdist::geom_dotsinterval() family also automatically finds good binning parameters for dotplots, and can be used to easily construct quantile dotplots of posteriors (see example in this document). For convenience, tidybayes re-exports the ggdist stats and geoms. See vignette("slabinterval", package = "ggdist") for more information.
  <figure> <img src="man/figures/slabinterval_family.png" alt="The slabinterval family of geoms and stats" /> <figcaption aria-hidden="true">The slabinterval family of geoms and stats</figcaption> </figure>
- Extracting and visualizing data frames of random variables from models. tidybayes also provides XXX_rvars functions as alternatives to the XXX_draws functions, such as spread_rvars(), add_predicted_rvars(), etc. These functions instead return tidy data frames of posterior::rvar()s, a vectorized random variable data type (see vignette("rvar", package = "posterior") for more about rvars). Combined with the ggdist::stat_slabinterval() and ggdist::stat_lineribbon() geometries, these functions make it easy to extract samples from distributions, manipulate them, and visualize them; this format may have significant advantages in terms of memory required for large models. See vignette("tidy-posterior") for examples.
- Comparing a variable across levels of a factor, which often means first generating pairs of levels of a factor (according to some desired set of comparisons) and then computing a function over the value of the comparison variable for those pairs of levels. Assuming your data is in the format returned by spread_draws, the compare_levels function allows comparison across levels to be made easily.
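The summarizing, comparing, and plotting steps above can be sketched on a hand-built long-format data frame. This is only an illustration: the data frame long, its columns group and mu, and the simulated values are assumptions standing in for what spread_draws() would return from a real model. The prediction functions and the rvar variants are not shown because they need a fitted model.

```r
library(tidybayes)
library(dplyr)
library(ggplot2)

# Hypothetical long-format draws: 100 draws of mu for each of three groups.
set.seed(2)
n_draws <- 100
long <- data.frame(
  .draw = rep(seq_len(n_draws), times = 3),
  group = factor(rep(c("a", "b", "c"), each = n_draws)),
  mu    = rnorm(3 * n_draws, mean = rep(c(0, 1, 2), each = n_draws))
)

# Median plus 66% and 95% quantile intervals, one row per group and width.
intervals <- long %>%
  group_by(group) %>%
  median_qi(mu, .width = c(0.66, 0.95))

# Pairwise differences in mu between groups, computed draw by draw.
differences <- long %>%
  compare_levels(mu, by = group)

# Half-eye plot: density slab plus point and interval for each group.
p <- ggplot(long, aes(x = mu, y = group)) +
  stat_halfeye(.width = c(0.66, 0.95))
```

Because the summaries are ordinary data frames, they can be passed on to further dplyr or ggplot2 steps without conversion.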
Finally, tidybayes aims to fit into common workflows through
compatibility with other packages:
- Its core functions for returning tidy data frames of draws are built on top of posterior::as_draws_df().
- Drop-in functions to translate tidy column names used by tidybayes to/from names used by other common packages and functions, including column names used by ggmcmc::ggs (via to_ggmcmc_names and from_ggmcmc_names) and column names used by broom::tidy (via to_broom_names and from_broom_names), which makes comparison with results of other models straightforward.
- The unspread_draws and ungather_draws functions invert spread_draws and gather_draws, aiding compatibility with other Bayesian plotting packages (notably bayesplot).
- The gather_emmeans_draws function turns the output from emmeans::emmeans (formerly lsmeans) into long-format data frames (when applied to supported model types, like MCMCglmm and rstanarm models).
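A minimal sketch of two of these compatibility helpers, again using simulated draws rather than a fitted model (the parameter name b and the random matrix are assumptions for illustration):

```r
library(tidybayes)
library(posterior)

# Stand-in for model output: 100 simulated draws of a two-element parameter b.
set.seed(3)
raw <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("b[1]", "b[2]")))
draws <- as_draws_df(raw)

# Summarize in tidybayes naming (.variable, .value, .lower, .upper), then
# rename to the column names used by broom::tidy().
summary_tb    <- median_qi(gather_draws(draws, b[i]), .value)
summary_broom <- to_broom_names(summary_tb)
names(summary_broom)

# unspread_draws() inverts spread_draws(): back to one column per "b[i]".
wide <- unspread_draws(spread_draws(draws, b[i]), b[i])
names(wide)
```

The broom-style names make it easier to row-bind these summaries with broom::tidy() output from non-Bayesian models for side-by-side comparison.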
Supported model types
tidybayes aims to support a variety of models with a uniform
interface. Currently supported models include
rstan,
cmdstanr,
brms,
rstanarm,
runjags,
rjags,
jagsUI, coda::mcmc and
coda::mcmc.list,
posterior::draws,
MCMCglmm, and anything
with its own as.mcmc.list implementation. If you install the
tidybayes.rethinking
package, models from the
rethinking package are also
supported.
Installation
You can install the currently-released version from CRAN with this R command:
install.packages("tidybayes")
Alternatively, you can install the latest development version from GitHub with these R commands:
install.packages("devtools")
devtools::install_github("mjskay/tidybayes")
Examples
This example shows the use of tidybayes with the Stan modeling language; however, tidybayes supports many other model types, such as JAGS, brms, rstanarm, and (theoretica
