instantiate: pre-compiled CmdStan models in R packages
Similar to rstantools for
rstan, the instantiate package builds
pre-compiled CmdStan
models into CRAN-ready statistical modeling R packages. The models
compile once during installation, the executables live inside the file
systems of their respective packages, and users have the full power and
convenience of CmdStanR without any
additional compilation after package installation. This approach saves
time and helps R package developers migrate from
rstan to the more modern
CmdStanR.
Documentation
The website at https://wlandau.github.io/instantiate/ includes a function reference and other documentation.
Installing instantiate
The instantiate package depends on the R package
CmdStanR and the command line tool
CmdStan, so it is
important to follow these stages in order:
1. Install the R package CmdStanR. CmdStanR is not on CRAN, so the recommended way to install it is install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos"))).
2. Optional: set the environment variables CMDSTAN_INSTALL and/or CMDSTAN to manage the CmdStan installation. See the “Administering CmdStan” section below for details.
3. Install instantiate using one of the R commands below.
| Type | Source | Command |
|----|----|----|
| Release | CRAN | install.packages("instantiate") |
| Development | GitHub | remotes::install_github("wlandau/instantiate") |
| Development | R-universe | install.packages("instantiate", repos = "https://wlandau.r-universe.dev") |
Installing packages that use instantiate
Packages that use instantiate may be published on CRAN. CRAN does not
have CmdStan, so the models are not pre-compiled in the macOS and
Windows binaries that CRAN distributes. If you install such a package
from CRAN, please install it from source. For example:
install.packages("hdbayes", type = "source")
Environment variables
The instantiate package uses environment variables to manage the
installation of
CmdStan. An
environment variable is an operating system setting with a name and a
value (both text strings). In R, there are two ways to set environment
variables:
- Sys.setenv(), which sets environment variables temporarily for the current R session.
- The .Renviron text file in your home directory, which passes environment variables to all new R sessions. The edit_r_environ() function from the usethis package helps you open and edit this file.
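A minimal sketch of the first approach follows, with the equivalent .Renviron lines for the second shown as comments. The path /path/to/cmdstan is a placeholder, not a real default, and "fixed" is one of the CMDSTAN_INSTALL values described in the “Administering CmdStan” section.

```r
# Set environment variables for the current R session only.
Sys.setenv(
  CMDSTAN = "/path/to/cmdstan", # placeholder path to a CmdStan installation
  CMDSTAN_INSTALL = "fixed"
)
Sys.getenv("CMDSTAN_INSTALL") # returns "fixed"

# Remove the variables again when they are no longer needed.
Sys.unsetenv(c("CMDSTAN", "CMDSTAN_INSTALL"))

# To persist the settings across sessions, add NAME=value lines like these
# to ~/.Renviron (for example via usethis::edit_r_environ()) and restart R:
#   CMDSTAN=/path/to/cmdstan
#   CMDSTAN_INSTALL=fixed
```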
Administering CmdStan
By default, instantiate looks for the copy of CmdStan located at
cmdstanr::cmdstan_path(). If you upgrade CmdStan, then the path
returned by cmdstanr::cmdstan_path() will change, which may not be
desirable in some cases. To permanently lock the path that instantiate
uses, follow these steps:
1. Set the CMDSTAN environment variable to the desired path to CmdStan.
2. Set the CMDSTAN_INSTALL environment variable to "fixed".
3. Install instantiate.
Henceforth, instantiate will automatically use the
CmdStan path from (1),
regardless of the value of CMDSTAN after (3). To prefer
cmdstanr::cmdstan_path() instead, you could do one of the following:
- Reinstall instantiate with CMDSTAN_INSTALL not equal to "fixed", or
- Set CMDSTAN_INSTALL to "implicit" at runtime, or
- Set the cmdstan_install argument to "implicit" for the current instantiate package function you are using.
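The precedence rules above can be summarized in a few lines of R. The helper choose_cmdstan_path() below is hypothetical: it is a simplified model of the documented behavior for illustration only, not instantiate's internal code. Its default_path argument merely stands in for cmdstanr::cmdstan_path(), and the example paths are made up.

```r
# Hypothetical helper: a simplified model of the documented precedence rules.
# NOT instantiate's internal code; names and arguments are illustrative only.
choose_cmdstan_path <- function(
  install_mode,   # value of CMDSTAN_INSTALL when instantiate was installed
  install_path,   # value of CMDSTAN when instantiate was installed
  default_path,   # stands in for cmdstanr::cmdstan_path()
  cmdstan_install = Sys.getenv("CMDSTAN_INSTALL") # runtime setting or argument
) {
  locked <- identical(install_mode, "fixed")
  # A "fixed" install keeps the install-time path unless "implicit" overrides it.
  if (locked && !identical(cmdstan_install, "implicit")) {
    install_path
  } else {
    default_path
  }
}

# Locked at install time, no runtime override: the install-time path wins.
choose_cmdstan_path("fixed", "/opt/cmdstan-old", "/opt/cmdstan-new", cmdstan_install = "")
# "/opt/cmdstan-old"

# Runtime override with "implicit": fall back to the default path.
choose_cmdstan_path("fixed", "/opt/cmdstan-old", "/opt/cmdstan-new", cmdstan_install = "implicit")
# "/opt/cmdstan-new"
```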
Packaging Stan models
The following section explains how to create an R package with
pre-compiled Stan models. This stage of the development workflow is
considered “runtime” for the purposes of administering
CmdStan as described
previously.
Structure
Begin with an R package with one or more Stan model files inside the
src/stan/ directory. stan_package_create() is a convenient way to
start.
stan_package_create(path = "package_folder")
#> Example package named "example" created at "package_folder". Run stan_package_configure(path = "package_folder") so that the built-in Stan model will compile when the package installs.
At minimum, the package file structure should look something like this:
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> └── src
#> └── stan
#> └── bernoulli.stan
Configuration
Configure the package so the Stan models compile during installation.
stan_package_configure() writes the required scripts cleanup,
cleanup.win, src/Makevars, src/Makevars.win, and
src/install.libs.R. Inside src/install.libs.R is a call to
instantiate::stan_package_compile() which you can manually edit to
control how your models are compiled. For example, different calls to
stan_package_compile() can compile different groups of models using
different C++ compiler flags.
stan_package_configure(path = "package_folder")
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> ├── cleanup
#> ├── cleanup.win
#> └── src
#> ├── Makevars
#> ├── Makevars.win
#> ├── install.libs.R
#> └── stan
#> └── bernoulli.stan
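To confirm that configuration succeeded, you could check for the files listed above. The helper missing_configure_files() below is hypothetical and not part of instantiate; it only tests whether each file exists and does not validate file contents.

```r
# Hypothetical helper (not part of instantiate): list which of the files
# written by stan_package_configure() are missing from a package folder.
missing_configure_files <- function(path) {
  expected <- c(
    "cleanup",
    "cleanup.win",
    file.path("src", "Makevars"),
    file.path("src", "Makevars.win"),
    file.path("src", "install.libs.R")
  )
  # Keep only the relative paths that do not exist under `path`.
  expected[!file.exists(file.path(path, expected))]
}

# Returns character(0) once the package folder is fully configured.
missing_configure_files("package_folder")
```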
Installation
Install the package just like you would any other R package. To install
it from your local copy of package_folder, open R and run:
install.packages(pkgs = "package_folder", type = "source", repos = NULL)
Models
A user can now run a model from the package without any additional
compilation. See the documentation of
CmdStanR to learn how to
use CmdStanR model objects.
library(example)
model <- instantiate::stan_package_model(name = "bernoulli", package = "example")
print(model) # CmdStanR model object
#> data {
#> int<lower=0> N;
#> array[N] int<lower=0,upper=1> y;
#> }
#> parameters {
#> real<lower=0,upper=1> theta;
#> }
#> model {
#> theta ~ beta(1,1); // uniform prior on interval 0,1
#> y ~ bernoulli(theta);
#> }
fit <- model$sample(
data = list(N = 10, y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)),
refresh = 0,
iter_warmup = 2000,
iter_sampling = 4000
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.6 seconds.
fit$summary()
#> # A tibble: 2 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1 lp__ -8.15 -7.87 0.725 0.317 -9.60 -7.64 1.00 7365. 8498.
#> 2 theta 0.333 0.324 0.130 0.134 0.137 0.563 1.00 6229. 7560.
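Because this model pairs a beta(1, 1) prior with a Bernoulli likelihood, the posterior of theta has a closed form, which gives a quick sanity check on the MCMC summary above. With 3 ones and 7 zeroes, the posterior is Beta(1 + 3, 1 + 7) = Beta(4, 8). This check relies on standard beta-Bernoulli conjugacy and is not a feature of instantiate.

```r
# Data from the example above: 3 ones and 7 zeroes.
y <- c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)

# Beta(1, 1) prior + Bernoulli likelihood gives a
# Beta(1 + successes, 1 + failures) posterior.
a <- 1 + sum(y)      # 4
b <- 1 + sum(1 - y)  # 8

post_mean <- a / (a + b)                            # exactly 1/3
post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))  # about 0.131
post_q <- qbeta(c(0.05, 0.5, 0.95), shape1 = a, shape2 = b) # q5, median, q95

round(c(mean = post_mean, sd = post_sd, q5 = post_q[1], median = post_q[2], q95 = post_q[3]), 3)
```

These closed-form values should agree with the mean, sd, q5, median, and q95 columns for theta up to Monte Carlo error.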
You can write an exported user-side function in your R package to access
the model. For example, you might store this code in an R/model.R file
in the package:
#' @title Fit the Bernoulli model.
#' @export
#' @family models
#' @description Fit the Bernoulli Stan model and return posterior summaries.
#' @return A data frame of posterior summaries.
#' @param y Numeric vector of Bernoulli observations (zeroes and ones).
#' @param ... Named arguments to the `sample()` method of CmdStanR model
#' objects: <https://mc-stan.org/cmdstanr/reference/model-method-sample.html>
#' @examples
#' if (instantiate::stan_cmdstan_exists()) {
#' run_bernoulli_model(y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0))
#' }
run_bernoulli_model <- function(y, ...) {
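# Accept only zeroes and ones; the Stan data block declares y as 0/1 integers.
stopifnot(is.numeric(y), all(y %in% c(0, 1)))
# Replace "mypackage" with the name of your package.
model <- instantiate::stan_package_model(name = "bernoulli", package = "mypackage")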
fit <- model$sample(data = list(N = length(y), y = y), ...)
fit$summary()
}
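One possible refinement, sketched here as an assumption rather than a recommendation from instantiate, is to move input validation into a separate internal helper. Because CRAN does not have CmdStan, a helper like the hypothetical bernoulli_data() below can still be unit-tested on CRAN even when the model itself cannot run. It returns the N and y entries required by the data block of bernoulli.stan.

```r
# Hypothetical internal helper (not part of instantiate): validate the
# observations and build the data list for the data block of bernoulli.stan.
bernoulli_data <- function(y) {
  # NA %in% c(0, 1) is FALSE, so missing values are rejected too.
  if (!is.numeric(y) || !all(y %in% c(0, 1))) {
    stop("y must be a numeric vector of zeroes and ones.", call. = FALSE)
  }
  list(N = length(y), y = as.integer(y))
}

# A list with N = 10 and y as an integer vector.
bernoulli_data(c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0))
```

Inside run_bernoulli_model(), the sampling call would then become model$sample(data = bernoulli_data(y), ...).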
Development
- In your package DESCRIPTION file, list https://mc-stan.org/r-packages/ in the Additional_repositories: field (example in brms). This step is only necessary while cmdstanr is not yet on CRAN.
Additional_repositories:
https://mc-stan.org/r-packag
