SkillAgentSearch skills...

Pybats

Bayesian time series forecasting and decision analysis

Install / Use

/learn @lavinei/Pybats
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

PyBATS

PyBATS is a package for Bayesian time series modeling and forecasting. It is designed to enable both quick analyses and flexible options to customize the model form, prior, and forecast period. The core of the package is the class Dynamic Generalized Linear Model (dglm). The supported DGLMs are Poisson, Bernoulli, Normal (a DLM), and Binomial. These models are primarily based on <a href='http://www2.stat.duke.edu/~mw/West&HarrisonBook/'>Bayesian Forecasting and Dynamic Models</a>.

Install

PyBATS is hosted on PyPI and can be installed with pip:

$ pip install pybats

The most recent development version is hosted on GitHub. You can download and install from there:

$ git clone git@github.com:lavinei/pybats.git pybats
$ cd pybats
$ sudo python setup.py install

Quick Start

This is the most basic example of Bayesian time series analysis using PyBATS. We'll use a public dataset of the sales of a dietary weight control product, along with the advertising spend. First we load in the data, and take a quick look at the first couples of entries:

import numpy as np
import pandas as pd

from pybats.shared import load_sales_example
from pybats.analysis import *
from pybats.point_forecast import *
from pybats.plot import *

# Load example sales and advertising data. Source: Abraham & Ledolter (1983)
data = load_sales_example()             
print(data.head())
   Sales  Advertising
1     15         12.0
2     16         20.5
3     18         21.0
4     27         15.5
5     21         15.3

The sales are integer valued counts, which we model with a Poisson Dynamic Generalized Linear Model (DGLM). Second, we extract the outcome (Y) and covariates (X) from this dataset. We'll set the forecast horizon k=1 for this example. We could look at multiple forecast horizons by setting k to a larger value. analysis, a core PyBATS function, will automatically:

  • Define the model (a Poisson DGLM)
  • Sequentially update the model coefficients with each new observation $y_t$ (also known as forward filtering)
  • Forecast $k=1$ step ahead at each desired time point

The main parameters that we need to specify are the dates during which the model will forecast. In this case we specify the start and end date with integers, because there are no actual dates associated with this dataset.

Y = data['Sales'].values
X = data['Advertising'].values.reshape(-1,1)

k = 1                                               # Forecast 1 step ahead
forecast_start = 15                                 # Start forecast at time step 15
forecast_end = 35                                   # End forecast at time step 35 (final time step)

By default, analysis will return samples from the forecast distribution as well as the model after the final observation.

mod, samples = analysis(Y, X, family="poisson",
forecast_start=forecast_start,      # First time step to forecast on
forecast_end=forecast_end,          # Final time step to forecast on
k=k,                                # Forecast horizon. If k>1, default is to forecast 1:k steps ahead, marginally
prior_length=6,                     # How many data point to use in defining prior
rho=.5,                             # Random effect extension, which increases the forecast variance (see Berry and West, 2019)
deltrend=0.95,                      # Discount factor on the trend component (the intercept)
delregn=0.95                        # Discount factor on the regression component
)
beginning forecasting

The model has the posterior mean and variance of the coefficients (also known as the state vector) stored as mod.a and mod.C respectively. We can view them in a nicer format with the method mod.get_coef.

print(mod.get_coef())
           Mean  Standard Deviation
Intercept  0.63                0.36
Regn 1     0.08                0.01

Finally, we turn our attention to the forecasts. At each time step within the forecast window, $15 \leq t \leq 35$, the model drew samples from the forecast distribution $k=1$ steps into the future. We will plot the sales, median forecast, and $95%$ credible interval using these samples.

import matplotlib.pyplot as plt

# Take the median as the point forecast
forecast = median(samples)                                  

# Plot the 1-step ahead point forecast plus the 95% credible interval
fig, ax = plt.subplots(1,1, figsize=(8, 6))   
ax = plot_data_forecast(fig, ax, Y[forecast_start:forecast_end + k], forecast, samples,
                        dates=np.arange(forecast_start, forecast_end+1, dtype='int'))
ax = ax_style(ax, ylabel='Sales', xlabel='Time', xlim=[forecast_start, forecast_end],
              legend=['Forecast', 'Sales', 'Credible Interval'])

png

Types of Models

All models in PyBATS are based on DGLMs, which are well described by their name:

  1. Dynamic: The coefficients are changing over time
  2. Generalized: We can choose the distribution of the observations (Normal, Poisson, Bernoulli, or Binomial)
  3. Linear: Forecasts are made by a standard linear combination of coefficients multiplied by predictors

The correct model type depends upon your time series, $y_t$. The most common type of observations are continuous real numbers, which can often be modeled using a normal Dynamic Linear Model (dlm). PyBATS is unique in the current Python ecosystem because it provides dynamic models for non-normally distribution observations. The base models supported by PyBATS are:

  • Normal DLMs (dlm) model continuous real numbers with normally distributed observation errors.
  • Poisson DGLMs (pois_dglm) model positive integers, as in the example above with counts of daily item sales.
  • Bernoulli DGLMs (bern_dglm) model data that can be encoded as $0-1$, or success-failure. An example is a time series of changes in stock price, where positive changes are coded as $1$, and negative changes are coded as $0$.
  • Binomial DGLMs (bin_dglm) model the sum of Bernoulli $0-1$ outcomes. An example is the daily count of responses to a survey, in which $n_t$ people are contacted each day, and $y_t$ people choose to respond.

Model Components

The linear combination in a DGLM is the multiplication (dot product) of the state vector by the regression vector

{% raw %} $$\lambda_t = F_t^{'} \theta_t$$ {% endraw %}

Where:

  • $\lambda_t$ is called the linear predictor
  • $\theta_t$ is called the state vector
  • $F_t$ is called the regression vector

The coefficients in a DGLM are all stored in the state vector, $\theta_t$. The state vector is defined by a set of different components, which are defined up front by the modeler.

Trend Component

PyBATS supports $0$, $1$, and $2$ trend coefficients in a DGLM. $1$ trend term is simply an intercept in the model. If there are $2$ trend terms, then the model contains an intercept and a local slope, which is the rate that the intercept changes over time. Because all coefficients are dynamic, both the intercept and local slope will change over time.

The default setting is to have only an intercept term, which we can see from the model defined in the example above:

mod.ntrend
1

We can access the mean $E\left[ \theta_t \right]$ and variance $V\left[ \theta_t \right]$ of the state vector by the attribute names a and R. We use the trend indices, given by mod.itrend, to view the trend component of the state vector.

mod.a[mod.itrend].round(2), mod.R.diagonal()[mod.itrend].round(2)
(array([[0.63]]), array([0.13]))

We can also access this information using get_coef, while specifying the component we want. Note that get_coef provides the coefficient standard deviations, rather than the variances.

print(mod.get_coef('trend'))
           Mean  Standard Deviation
Intercept  0.63                0.36

This means that the intercept term has a mean of $0.63$ and a standard deviation of $0.36$ at time T, the end of the analysis. The analysis is over either at time forecast_end, or when we hit the final observation in Y, whichever comes first.

To add in a local slope, we can re-run the analysis from above while specifying that ntrend=2.

mod, samples = analysis(Y, X, family="poisson",
ntrend=2,                           # Use an intercept and local slope
forecast_start=forecast_start,      # First time step to forecast on
forecast_end=forecast_end,          # Final time step to forecast on
k=k,                                # Forecast horizon. If k>1, default is to forecast 1:k steps ahead, marginally
prior_length=6,                     # How many data point to use in defining prior
rho=.5,                             # Random effect extension, increases variance of Poisson DGLM (see Berry and West, 2019)
deltrend=0.95,                      # Discount factor on the trend component (intercept)
delregn=0.95                        # Discount factor on the regression component
)
beginning forecasting
mod.ntrend
2

We can plot the forecasts with this new model, and see that the results are quite similar!

# Take the median as the point forecast
forecast = median(samples)                                  

# Plot the 1-step ahead point forecast plus the 95% credible interval
fig, ax = plt.subplots(1,1, figsize=(8, 6))   
ax = plot_data_forecast(fig, ax, Y[forecast_start:forecast_end + k], forecast, samples,
                        dates=np.arange(forecast_start, forecast_end+1, dtype='int'))
ax = ax_style(ax, ylabel='Sales', xlabel='Time', xlim=[forecast_start, forecast_end],
              legend=['Forecast', 'Sales', 'Credible Interval'])

png

Regression Component

The regression component contains all known predictors. In this example, the advertising budget is our only predictor, which

Related Skills

View on GitHub
GitHub Stars121
CategoryDevelopment
Updated3d ago
Forks25

Languages

Jupyter Notebook

Security Score

95/100

Audited on Apr 3, 2026

No findings