pybuc

pybuc (Python Bayesian Unobserved Components) is a version of R's Bayesian structural time series package, bsts, written by Steven L. Scott. The source paper can be found here or in the papers directory of this repository. While there are plans to expand the feature set of pybuc, currently there is no roadmap for the release of new features. The syntax for using pybuc closely follows statsmodels' UnobservedComponents module.

The current version of pybuc includes the following options for modeling and forecasting a structural time series:

Stochastic or non-stochastic level
Damped level
Stochastic or non-stochastic trend
Damped trend
Multiple stochastic or non-stochastic periodic-lag seasonality
Multiple damped periodic-lag seasonality
Multiple stochastic or non-stochastic "dummy" seasonality
Multiple stochastic or non-stochastic trigonometric seasonality
Regression with static coefficients<sup/>**</sup>

<sup/>**</sup> pybuc estimates regression coefficients differently than bsts. The former uses a standard Gaussian prior. The latter uses a Bernoulli-Gaussian mixture commonly known as the spike-and-slab prior. The main benefit of using a spike-and-slab prior is its promotion of coefficient-sparse solutions, i.e., variable selection, when the number of predictors in the regression component exceeds the number of observed data points.

Fast computation is achieved using Numba, a high performance just-in-time (JIT) compiler for Python.

Installation

pip install pybuc

See pyproject.toml and uv.lock for dependency details. This module depends on NumPy, Numba, Pandas, statsmodels, and Matplotlib. Python 3.10 and above is supported for versions of this package >= 0.55. All other versions support Python 3.9 and above.

Alternatively, you can set up an environment for the project via uv. Steps:

Install uv. See https://github.com/astral-sh/uv for installation instructions.
git clone https://www.github.com/devindg/pybuc.git
cd pybuc
uv sync

Motivation

The Seasonal Autoregressive Integrated Moving Average (SARIMA) model is perhaps the most widely used class of statistical time series models. By design, these models can only operate on covariance-stationary time series. Consequently, if a time series exhibits non-stationarity (e.g., trend and/or seasonality), then the data first have to be stationarized. Transforming a non-stationary series to a stationary one usually requires taking local and/or seasonal time-differences of the data, but sometimes a linear trend to detrend a trend-stationary series is sufficient. Whether to stationarize the data and to what extent differencing is needed are things that need to be determined beforehand.

Once a stationary series is in hand, a SARIMA specification must be identified. Identifying the "right" SARIMA specification can be achieved algorithmically (e.g., see the Python package pmdarima) or through examination of a series' patterns. The latter typically involves statistical tests and visual inspection of a series' autocorrelation (ACF) and partial autocorrelation (PACF) functions. Ultimately, the necessary condition for stationarity requires statistical analysis before a model can be formulated. It also implies that the underlying trend and seasonality, if they exist, are eliminated in the process of generating a stationary series. Consequently, the underlying time components that characterize a series are not of empirical interest.

Another less commonly used class of model is structural time series (STS), also known as unobserved components (UC). Whereas SARIMA models abstract away from an explicit model for trend and seasonality, STS/UC models do not. Thus, it is possible to visualize the underlying components that characterize a time series using STS/UC. Moreover, it is relatively straightforward to test for phenomena like level shifts, also known as structural breaks, by statistical examination of a time series' estimated level component.

STS/UC models also have the flexibility to accommodate multiple stochastic seasonalities. SARIMA models, in contrast, can accommodate multiple seasonalities, but only one seasonality/periodicity can be treated as stochastic. For example, daily data may have day-of-week and week-of-year seasonality. Under a SARIMA model, only one of these seasonalities can be modeled as stochastic. The other seasonality will have to be modeled as deterministic, which amounts to creating and using a set of predictors that capture said seasonality. STS/UC models, on the other hand, can accommodate both seasonalities as stochastic by treating each as distinct, unobserved state variables.

With the above in mind, what follows is a comparison between statsmodels' SARIMAX' module, statsmodels' UnobservedComponents module, and pybuc. The distinction between statsmodels.UnobservedComponents and pybuc is the former is a maximum likelihood estimator (MLE) while the latter is a Bayesian estimator. The following code demonstrates the application of these methods on a data set that exhibits trend and multiplicative seasonality. The STS/UC specification for statsmodels.UnobservedComponents and pybuc includes stochastic level, stochastic trend (trend), and stochastic trigonometric seasonality with periodicity 12 and 6 harmonics.

Usage

Example: univariate time series with level, trend, and multiplicative seasonality

A canonical data set that exhibits trend and seasonality is the airline passenger data used in Box, G.E.P.; Jenkins, G.M.; and Reinsel, G.C. Time Series Analysis, Forecasting and Control. Series G, 1976. See plot below.

plot

This data set gave rise to what is known as the "airline model", which is a SARIMA model with first-order local and seasonal differencing and first-order local and seasonal moving average representations. More compactly, SARIMA(0, 1, 1)(0, 1, 1) without drift.

To demonstrate the performance of the "airline model" on the airline passenger data, the data will be split into a training and test set. The former will include all observations up until the last twelve months of data, and the latter will include the last twelve months of data. See code below for model assessment.

Import libraries and prepare data

from pybuc import buc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.statespace.structural import UnobservedComponents


# Convenience function for computing root mean squared error
def rmse(actual, prediction):
    act, pred = actual.flatten(), prediction.flatten()
    return np.sqrt(np.mean((act - pred) ** 2))


# Import airline passenger data
url = "https://raw.githubusercontent.com/devindg/pybuc/master/examples/data/airline-passengers.csv"
air = pd.read_csv(url, header=0, index_col=0)
air = air.astype(float)
air.index = pd.to_datetime(air.index)
hold_out_size = 12

# Create train and test sets
y_train = air.iloc[:-hold_out_size]
y_test = air.iloc[-hold_out_size:]

SARIMA

''' Fit the airline data using SARIMA(0,1,1)(0,1,1) '''
sarima = SARIMAX(
    y_train,
    order=(0, 1, 1),
    seasonal_order=(0, 1, 1, 12),
    trend=[0]
)
sarima_res = sarima.fit(disp=False)
print(sarima_res.summary())

# Plot in-sample fit against actuals
plt.plot(y_train)
plt.plot(sarima_res.fittedvalues)
plt.title('SARIMA: In-sample')
plt.xticks(rotation=45, ha="right")
plt.show()

# Get and plot forecast
sarima_forecast = sarima_res.get_forecast(hold_out_size).summary_frame(alpha=0.05)
plt.plot(y_test)
plt.plot(sarima_forecast['mean'])
plt.fill_between(sarima_forecast.index,
                 sarima_forecast['mean_ci_lower'],
                 sarima_forecast['mean_ci_upper'], alpha=0.4)
plt.title('SARIMA: Forecast')
plt.legend(['Actual', 'Mean', '95% Prediction Interval'])
plt.show()

# Print RMSE
print(f"SARIMA RMSE: {rmse(y_test.to_numpy(), sarima_forecast['mean'].to_numpy())}")

SARIMA RMSE: 21.09028021383853

The SARIMA(0, 1, 1)(0, 1, 1) forecast plot.

plot

MLE Unobserved Components

''' Fit the airline data using MLE unobserved components '''
mle_uc = UnobservedComponents(
    y_train,
    exog=None,
    irregular=True,
    level=True,
    stochastic_level=True,
    trend=True,
    stochastic_trend=True,
    freq_seasonal=[{'period': 12, 'harmonics': 6}],
    stochastic_freq_seasonal=[True]
)

# Fit the model via maximum likelihood
mle_uc_res = mle_uc.fit(disp=False)
print(mle_uc_res.summary())

# Plot in-sample fit against actuals
plt.plot(y_train)
plt.plot(mle_uc_res.fittedvalues)
plt.title('MLE UC: In-sample')
plt.show()

# Plot time series components
mle_uc_res.plot_components(legend_loc='lower right', figsize=(15, 9), which='smoothed')
plt.show()

# Get and plot forecast
mle_uc_forecast = mle_uc_res.get_forecast(hold_out_size).summary_frame(alpha=0.05)
plt.plot(y_test)
plt.plot(mle_uc_forecast['mean'])
plt.fill_between(mle_uc_forecast.index,
                 mle_uc_forecast['mean_ci_lower'],
                 mle_uc_forecast['mean_ci_upper'], alpha=0.4)
plt.title('MLE UC: Forecast')
plt.legend(['Actual', 'Mean', '95% Prediction Interval'])
plt.show()

# Print RMSE
print(f"MLE UC RMSE: {rmse(y_test.to_numpy(), mle_uc_forecast['mean'].to_numpy())}")

MLE UC RMSE: 17.961873327622694

The MLE Unobserved Components forecast and component plots.

plot

As noted above, a distinguishing feature of STS/UC models is their explicit modeling of trend and seasonality. This is illustrated with the components plot.

Pybuc

Install / Use

README

pybuc

Installation

Motivation

Usage

Example: univariate time series with level, trend, and multiplicative seasonality

Import libraries and prepare data

SARIMA

MLE Unobserved Components