BayesianSARIMA#

class pymc_experimental.statespace.models.BayesianSARIMA(order: Tuple[int, int, int], seasonal_order: Tuple[int, int, int, int] | None = None, stationary_initialization: bool = True, filter_type: str = 'standard', state_structure: str = 'fast', measurement_error: bool = False, verbose=True)[source]#

Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors

Parameters:
  • order (tuple(int, int, int)) –

    Order of the ARIMA process. The order has the notation (p, d, q), where p is the number of autoregressive lags, q is the number of moving average components, and d is order of integration – the number of differences needed to render the data stationary.

    If d > 0, the differences are modeled as components of the hidden state, and all available data can be used. This is only possible if state_structure = ‘fast’. For interpretable states, the user must manually difference the data prior to calling the build_statespace_graph method.

  • seasonal_order (tuple(int, int, int, int), optional) –

    Seasonal order of the SARIMA process. The order has the notation (P, D, Q, S), where P is the number of seasonal lags to include, Q is the number of seasonal innovation lags to include, and D is the number of seasonal differences to perform. S is the length of the season.

    Seasonal terms are similar to ARIMA terms, in that they are merely lags of the data or innovations. It is thus possible for the seasonal lags and the ARIMA lags to overlap, for example if P <= p. In this case, an error will be raised.

  • stationary_initialization (bool, default False) –

    If true, the initial state and initial state covariance will not be assigned priors. Instead, their steady state values will be used.

    Warning

    This option is very sensitive to the priors placed on the AR and MA parameters. If the model dynamics for a given sample are not stationary, sampling will fail with a “covariance is not positive semi-definite” error.

  • filter_type (str, default "standard") – The type of Kalman Filter to use. Options are “standard”, “single”, “univariate”, “steady_state”, and “cholesky”. See the docs for kalman filters for more details.

  • state_structure (str, default "fast") –

    How to represent the state-space system. Currently, there are two choices: “fast” or “interpretable”

    • ”fast” corresponds to the state space used by [2], and is called the “Harvey” representation in statsmodels.

      This is also the default representation used by statsmodels.tsa.statespace.SARIMAX. The states combine lags and innovations at different lags to compress the dimension of the state vector to max(p, 1+q). As a result, it is very preformat, but only the first state has a clear interpretation.

    • ”interpretable” maximally expands the state vector, doing zero state compression. As a result, the state has dimension max(1, p) + max(1, q). What is gained by doing this is that every state has an obvious meaning, as either the data, an innovation, or a lag thereof.

  • measurement_error (bool, default True) – If true, a measurement error term is added to the model.

  • verbose (bool, default True) – If true, a message will be logged to the terminal explaining the variable names, dimensions, and supports.

Notes

The ARIMAX model is a univariate time series model that posits the future evolution of a stationary time series will

be a function of its past values, together with exogenous “innovations” and their past history. The model is described by its “order”, a 3-tuple (p, d, q), that are:

  • p: The number of past time steps that directly influence the present value of the time series, called the

    “autoregressive”, or AR, component

  • d: The “integration” order of the time series

  • q: The number of past exogenous innovations that directly influence the present value of the time series,

    called the “moving average”, or MA, component

Given this 3-tuple, the model can be written:

\[(1- \phi_1 B - \cdots - \phi_p B^p) (1-B)^d y_{t} = c + (1 + \theta_1 B + \cdots + \theta_q B^q) \varepsilon_t\]

Where B is the backshift operator, \(By_{t} = y_{t-1}\).

The model assumes that the data are stationary; that is, that they can be described by a time-invariant Gaussian distribution with fixed mean and finite variance. Non-stationary data, those that grow over time, are not suitable for ARIMA modeling without preprocessing. Stationary can be induced in any time series by the sequential application of differences. Given a hypothetical non-stationary process:

\[y_{t} = c + \rho y_{t-1} + \varepsilon_{t}\]

The process:

\[\Delta y_{t} = y_{t} - y_{t-1} = \rho \Delta y_{t-1} + \Delta \varepsilon_t\]

is stationary, as the non-stationary component \(c\) was eliminated by the operation of differencing. This process is said to be “integrated of order 1”, as it requires 1 difference to render stationary. This is the function of the d parameter in the ARIMA order.

Alternatively, the non-stationary components can be directly estimated. In this case, the errors of a preliminary regression are assumed to be ARIMA distributed, so that:

\[\begin{split}\begin{align} y_{t} &= X\beta + \eta_t \\ (1- \phi_1 B - \cdots - \phi_p B^p) (1-B)^d \eta_{t} &= (1 + \theta_1 B + \cdots + \theta_q B^q) \varepsilon_t \end{align}\end{split}\]

Where the design matrix X can include a constant, trends, or exogenous regressors.

ARIMA models can be represented in statespace form, as described in [1]. For more details, see chapters 3.4, 3.6, and 8.4.

Examples

The following example shows how to build an ARMA(1, 1) model – ARIMA(1, 0, 1) – using the BayesianSARIMA class:

import pymc_experimental.statespace as pmss
import pymc as pm

ss_mod = pmss.BayesianSARIMA(order=(1, 0, 1), verbose=True)

with pm.Model(coords=ss_mod.coords) as arma_model:
    state_sigmas = pm.HalfNormal("sigma_state", sigma=1.0, dims=ss_mod.param_dims["sigma_state"])

    rho = pm.Beta("ar_params", alpha=5, beta=1, dims=ss_mod.param_dims["ar_params"])
    theta = pm.Normal("ma_params", mu=0.0, sigma=0.5, dims=ss_mod.param_dims["ma_params"])

    ss_mod.build_statespace_graph(df, mode="JAX")
    idata = pm.sample(nuts_sampler='numpyro')

References

__init__(order: Tuple[int, int, int], seasonal_order: Tuple[int, int, int, int] | None = None, stationary_initialization: bool = True, filter_type: str = 'standard', state_structure: str = 'fast', measurement_error: bool = False, verbose=True)[source]#

Methods

__init__(order[, seasonal_order, ...])

add_default_priors()

Add default priors to the active PyMC model context

add_exogenous(exog)

Add an exogenous process to the statespace model

build_statespace_graph(data[, ...])

Given a parameter vector theta, constructs the full computational graph describing the state space model and the associated log probability of the data.

forecast(idata, start[, periods, end, ...])

Generate forecasts of state space model trajectories into the future.

impulse_response_function(idata[, n_steps, ...])

Generate impulse response functions (IRF) from state space model dynamics.

make_and_register_data(name, shape[, dtype])

Helper function to create a pytensor symbolic variable and register it in the _name_to_data dictionary

make_and_register_variable(name, shape[, dtype])

Helper function to create a pytensor symbolic variable and register it in the _name_to_variable dictionary

make_symbolic_graph()

The purpose of the make_symbolic_graph function is to hide tedious parameter allocations from the user.

sample_conditional_posterior(idata[, ...])

Sample from the conditional posterior; that is, given parameter draws from the posterior distribution, compute Kalman filtered trajectories.

sample_conditional_prior(idata[, random_seed])

Sample from the conditional prior; that is, given parameter draws from the prior distribution, compute Kalman filtered trajectories.

sample_unconditional_posterior(idata[, ...])

Draw unconditional sample trajectories according to state space dynamics, using random samples from the posterior distribution over model parameters.

sample_unconditional_prior(idata[, steps, ...])

Draw unconditional sample trajectories according to state space dynamics, using random samples from the prior distribution over model parameters.

unpack_statespace()

Helper function to quickly obtain all statespace matrices in the standard order.

Attributes

coords

PyMC model coordinates

data_info

Information about Data variables that need to be declared in the PyMC model block.

data_names

Names of data variables expected by the model.

default_priors

Dictionary of parameter names and callable functions to construct default priors for the model

observed_states

A k_endog length list of strings, associated with the model's observed states

param_dims

Dictionary of named dimensions for each model parameter

param_info

Information about parameters needed to declare priors

param_names

Names of model parameters

shock_names

A k_posdef length list of strings, associated with the model's shock processes

state_names

A k_states length list of strings, associated with the model's hidden states