Mvgam
{mvgam} R 📦 to fit Dynamic Bayesian Generalized Additive Models for multivariate modeling and forecasting
Install / Use
/learn @nicholasjclark/MvgamREADME
<img src="man/figures/mvgam_logo.png" width = 120 alt="mvgam R package logo"/><img src="https://raw.githubusercontent.com/stan-dev/logos/master/logo_tm.png" align="right" width=120 alt="Stan Logo"/>
mvgam
MultiVariate (Dynamic) Generalized Additive Models
The mvgam 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs)
that can include highly flexible nonlinear predictor effects, latent
variables and multivariate time series models. The package does this by
relying on functionalities from the impressive
<a href="https://paulbuerkner.com/brms/"
target="_blank"><code>brms</code></a> and
<a href="https://cran.r-project.org/package=mgcv"
target="_blank"><code>mgcv</code></a> packages. Parameters are estimated
using the probabilistic programming language
Stan, giving users access to the most advanced
Bayesian inference algorithms available. This allows mvgam to fit a
very wide range of models, including:
- <a href="https://nicholasjclark.github.io/mvgam/articles/trend_formulas.html" target="_blank">Multivariate State-Space Time Series Models</a>
- <a href="https://nicholasjclark.github.io/mvgam/reference/RW.html#ref-examples" target="_blank">Continuous-Time Autoregressive Time Series Models</a>
- <a href="https://nicholasjclark.github.io/mvgam/articles/shared_states.html" target="_blank">Shared Signal Time Series Models</a>
- <a href="https://nicholasjclark.github.io/mvgam/reference/lv_correlations.html" target="_blank">Dynamic Factor Models</a>
- <a href="https://nicholasjclark.github.io/mvgam/articles/nmixtures.html" target="_blank">Hierarchical N-mixture Models</a>
- <a href="https://www.youtube.com/watch?v=2POK_FVwCHk" target="_blank">Hierarchical Generalized Additive Models</a>
- <a href="https://nicholasjclark.github.io/mvgam/reference/jsdgam.html" target="_blank">Joint Species Distribution Models</a>
Installation
You can install the stable package version from CRAN using:
install.packages('mvgam'), or install the latest development version
using: devtools::install_github("nicholasjclark/mvgam"). You will also
need a working version of Stan installed (along with either rstan
and/or cmdstanr). Please refer to installation links for Stan with
rstan <a href="https://mc-stan.org/users/interfaces/rstan"
target="_blank">here</a>, or for Stan with cmdstandr
<a href="https://mc-stan.org/cmdstanr/" target="_blank">here</a>.
Cheatsheet
A simple example
We can explore the package’s primary functions using one of it’s
built-in datasets. Use plot_mvgam_series() to inspect features for
time series from
<a href="https://portal.weecology.org/" target="_blank">the Portal
Project</a>, which represent counts of baited captures for four desert
rodent species over time (see ?portal_data for more details about the
dataset).
data(portal_data)
plot_mvgam_series(
data = portal_data,
y = 'captures',
series = 'all'
)
<img src="man/figures/README-unnamed-chunk-4-1.png" alt="Visualizing multivariate time series in R using mvgam" width="100%" />
plot_mvgam_series(
data = portal_data,
y = 'captures',
series = 1
)
<img src="man/figures/README-unnamed-chunk-4-2.png" alt="Visualizing multivariate time series in R using mvgam" width="100%" />
plot_mvgam_series(
data = portal_data,
y = 'captures',
series = 4
)
<img src="man/figures/README-unnamed-chunk-4-3.png" alt="Visualizing multivariate time series in R using mvgam" width="100%" />
These plots show that the time series are count responses, with missing
data, many zeroes, seasonality and temporal autocorrelation all present.
These features make time series analysis and forecasting very difficult
using conventional software. But mvgam shines in these tasks.
For most forecasting exercises, we’ll want to split the data into training and testing folds:
data_train <- portal_data %>%
dplyr::filter(time <= 60)
data_test <- portal_data %>%
dplyr::filter(time > 60 &
time <= 65)
Formulate an mvgam model; this model fits a State-Space GAM in which
each species has its own intercept, linear association with ndvi_ma12
and potentially nonlinear association with mintemp. These effects are
estimated jointly with a full time series model for the temporal
dynamics (in this case a Vector Autoregressive process). We assume the
outcome follows a Poisson distribution and will condition the model in
Stan using MCMC sampling with Cmdstan:
mod <- mvgam(
# Observation model is empty as we don't have any
# covariates that impact observation error
formula = captures ~ 0,
# Process model contains varying intercepts,
# varying slopes of ndvi_ma12 and varying smooths
# of mintemp for each series.
# Temporal dynamics are modelled with a Vector
# Autoregression (VAR(1))
trend_formula = ~
trend +
s(trend, bs = 're', by = ndvi_ma12) +
s(mintemp, bs = 'bs', by = trend) - 1,
trend_model = VAR(cor = TRUE),
# Obvservations are conditionally Poisson
family = poisson(),
# Condition on the training data
data = data_train,
backend = 'cmdstanr'
)
Using print() returns a quick summary of the object:
mod
#> GAM observation formula:
#> captures ~ 1
#>
#> GAM process formula:
#> ~trend + s(trend, bs = "re", by = ndvi_ma12) + s(mintemp, bs = "bs",
#> by = trend) - 1
#>
#> Family:
#> poisson
#>
#> Link function:
#> log
#>
#> Trend model:
#> VAR(cor = TRUE)
#>
#>
#> N latent factors:
#> 4
#>
#> N series:
#> 4
#>
#> N timepoints:
#> 60
#>
#> Status:
#> Fitted using Stan
#> 4 chains, each with iter = 2000; warmup = 1500; thin = 1
#> Total post-warmup draws = 2000
Split Rhat and Effective Sample Size diagnostics show good convergence of model estimates
mcmc_plot(mod,
type = 'rhat_hist')
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
<img src="man/figures/README-unnamed-chunk-9-1.png" alt="Rhats of parameters estimated with Stan in mvgam" width="100%" />
mcmc_plot(mod,
type = 'neff_hist')
#> `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
<img src="man/figures/README-unnamed-chunk-10-1.png" alt="Effective sample sizes of parameters estimated with Stan in mvgam" width="100%" />
Use conditional_effects() for a quick visualisation of the main terms
in model formulae
conditional_effects(mod,
type = 'link')
<img src="man/figures/README-unnamed-chunk-11-1.png" alt="Plotting GAM effects in mvgam and R" width="100%" /><img src="man/figures/README-unnamed-chunk-11-2.png" alt="Plotting GAM effects in mvgam and R" width="100%" /><img src="man/figures/README-unnamed-chunk-11-3.png" alt="Plotting GAM effects in mvgam and R" width="100%" />
If you have the gratia package installed, it can also be used to plot
partial effects of smooths
require(gratia)
draw(mod,
trend_effects = TRUE)
<img src="man/figures/README-unnamed-chunk-12-1.png" alt="Plotting GAM smooth functions in mvgam using gratia" width="100%" />
Or design more targeted plots using plot_predictions() from the
marginaleffects package
plot_predictions(
mod,
condition = c('ndvi_ma12',
'series',
'series'),
type = 'link'
)
<img src="man/figures/README-unnamed-chunk-13-1.png" alt="Using marginaleffects and mvgam to plot GAM smooth functions in R" width="100%" />
plot_predictions(
mod,
condition = c('mintemp',
'series',
'series'),
type = 'link'
)
<img src="man/figures/README-unnamed-chunk-14-1.png" alt="Using marginaleffects and mvgam to plot GAM smooth functions in R" width="100%" />
We can also view the model’s posterior predictions for the entire series
(testing and training). Forecasts can be scored using a range of proper
scoring rules. See ?score.mvgam_forecast for more details
fcs <- forecast(mod,
newdata = data_test)
plot(fcs, series = 1) +
plot(fcs, series = 2) +
plot(fcs, series = 3) +
plot(fcs, series = 4)
#> Out of sample DRPS:
#> 8.451467
#> Out of sample DRPS:
#> 5.168817
#> Out of sample DRPS:
#> 8.52922325
#> Out of sample DRPS:
#> 3.60317975
<img src="man/figures/README-unnamed-chunk-15-1.png" alt="Plotting forecast distributions using mvgam in R" width="100%" />
For Vector Autoregressions fit in mvgam, we can inspect <a
href="https

