API Reference#

This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC experimental.

pymc_experimental.bart#

class pymc_experimental.bart.BART(name, X, Y, m=50, alpha=0.25, split_prior=None, **kwargs)[source]#

Bayesian Additive Regression Tree distribution.

Distribution representing a sum over trees.

Parameters
  • X (array-like) – The covariate matrix.

  • Y (array-like) – The response vector.

  • m (int) – Number of trees

  • alpha (float) – Controls the prior probability over the depth of the trees. Although it can take values in the interval (0, 1), values in the interval (0, 0.5] are recommended.

  • split_prior (array-like) – Each element of split_prior should be in the [0, 1] interval and the elements should sum to 1; otherwise they will be normalized. Defaults to None, i.e. all covariates have the same prior probability of being selected.
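
A minimal usage sketch, assuming arrays X and Y are already defined; the BART variable typically serves as the mean of the likelihood, and the PGBART step is assigned to it automatically via its competence() method:

>>> import pymc as pm
>>> import pymc_experimental as pmx
>>> with pm.Model() as model:
...     mu = pmx.bart.BART("mu", X=X, Y=Y, m=50)
...     sigma = pm.HalfNormal("sigma", 1.0)
...     y = pm.Normal("y", mu=mu, sigma=sigma, observed=Y)
...     idata = pm.sample()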

classmethod dist(*params, **kwargs)[source]#

Creates a tensor variable corresponding to the cls distribution.

Parameters
  • dist_params (array-like) – The inputs to the RandomVariable Op.

  • shape (int, tuple, Variable, optional) –

    A tuple of sizes for each dimension of the new RV.

    An Ellipsis (…) may be inserted in the last position to short-hand refer to all the dimensions that the RV would get if no shape/size/dims were passed at all.

  • **kwargs – Keyword arguments that will be forwarded to the Aesara RV Op. Most prominently: size or dtype.

Returns

rv – The created random variable tensor.

Return type

TensorVariable

logp(*inputs)[source]#

Calculate log probability.

Parameters

x (numeric, TensorVariable) – Value for which log-probability is calculated.

Return type

TensorVariable

class pymc_experimental.bart.PGBART(*args, **kwargs)[source]#

Particle Gibbs BART sampling step.

Parameters
  • vars (list) – List of value variables for sampler

  • num_particles (int) – Number of particles for the conditional SMC sampler. Defaults to 40

  • batch (int or tuple) – Number of trees fitted per step. Defaults to “auto”, which is 10% of the m trees, both during tuning and after tuning. If a tuple is passed, the first element is the batch size during tuning and the second is the batch size after tuning.

  • model (PyMC Model) – Optional model for sampling step. Defaults to None (taken from context).
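
A hedged sketch of passing the step explicitly instead of relying on automatic assignment; it assumes a model context containing a BART variable mu:

>>> with model:
...     step = pmx.bart.PGBART([mu], num_particles=40)
...     idata = pm.sample(step=step)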

static competence(var, has_grad)[source]#

PGBART is only suitable for BART distributions.

init_particles(tree_id: int) → ndarray[source]#

Initialize particles.

normalize(particles)[source]#

Use the logsumexp trick to get w_t and softmax to get normalized_weights.

w_t is the un-normalized weight per particle; we assign it to the next round of particles so that they all start with the same weight.
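
For intuition, the normalization can be sketched in NumPy; this is illustrative and not the exact class internals:

>>> import numpy as np
>>> log_w = np.array([-1.2, -0.5, -3.0])  # un-normalized log-weights of the particles
>>> log_w_max = log_w.max()
>>> log_sum_w = log_w_max + np.log(np.exp(log_w - log_w_max).sum())  # logsumexp
>>> normalized_weights = np.exp(log_w - log_sum_w)  # softmax of the log-weights
>>> w_t = log_sum_w - np.log(len(log_w))  # average weight shared by the next round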

update_weight(particle, old=False)[source]#

Update the weight of a particle.

Since the prior is used as the proposal, the weights are updated additively by the difference between the new and old log-likelihoods (the log of the likelihood ratio).
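
In log space the likelihood ratio becomes a difference, so the update amounts to something like the following sketch (attribute names are hypothetical):

>>> particle.log_weight += new_log_likelihood - old_log_likelihood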

pymc_experimental.bart.plot_dependence(idata, X, Y=None, kind='pdp', xs_interval='linear', xs_values=None, var_idx=None, var_discrete=None, samples=50, instances=10, random_seed=None, sharey=True, rug=True, smooth=True, indices=None, grid='long', color='C0', color_mean='C0', alpha=0.1, figsize=None, smooth_kwargs=None, ax=None)[source]#

Partial dependence or individual conditional expectation plot.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • X (array-like) – The covariate matrix.

  • Y (array-like) – The response vector.

  • kind (str) – Whether to plot a partial dependence plot (“pdp”) or an individual conditional expectation plot (“ice”). Defaults to “pdp”.

  • xs_interval (str) – Method used to compute the values of X at which the predicted function is evaluated. “linear”: evenly spaced values in the range of X. “quantiles”: the evaluation is done at the specified quantiles of X. “insample”: the evaluation is done at the observed values of X. For discrete variables these options are omitted.

  • xs_values (int or list) – Values of X used to evaluate the predicted function. If xs_interval="linear", the number of points in the evenly spaced grid. If xs_interval="quantiles", the quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".

  • var_idx (list) – List of the indices of the covariate for which to compute the pdp or ice.

  • var_discrete (list) – List of the indices of the covariate treated as discrete.

  • samples (int) – Number of posterior samples used in the predictions. Defaults to 50

  • instances (int) – Number of instances of X to plot. Only relevant for kind="ice" plots.

  • random_seed (int) – Seed used to sample from the posterior. Defaults to None.

  • sharey (bool) – Controls sharing of properties among y-axes. Defaults to True.

  • rug (bool) – Whether to include a rugplot. Defaults to True.

  • smooth (bool) – If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.

  • grid (str or tuple) – How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, one subplot next to the other, or a tuple indicating the number of rows and columns.

  • color (matplotlib valid color) – Color used to plot the pdp or ice. Defaults to “C0”.

  • color_mean (matplotlib valid color) – Color used to plot the mean pdp or ice. Defaults to “C0”.

  • alpha (float) – Transparency level; should be in the interval [0, 1].

  • figsize (tuple) – Figure size. If None it will be defined automatically.

  • smooth_kwargs (dict) – Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.

  • ax (axes) – Matplotlib axes.

Returns

axes

Return type

matplotlib axes
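
A hedged usage sketch, assuming idata was produced by sampling a model with a BART variable fitted to X and Y:

>>> axes = pmx.bart.plot_dependence(idata, X=X, Y=Y, kind="pdp", grid="wide")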

pymc_experimental.bart.plot_variable_importance(idata, X, labels=None, sort_vars=True, figsize=None, samples=100, random_seed=None)[source]#

Estimates variable importance from the BART-posterior.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • X (array-like) – The covariate matrix.

  • labels (list) – List of the names of the covariates. If X is a DataFrame the names of the covariates will be taken from it and this argument will be ignored.

  • sort_vars (bool) – Whether to sort the variables according to their variable importance. Defaults to True.

  • figsize (tuple) – Figure size. If None it will be defined automatically.

  • samples (int) – Number of predictions used to compute correlation for subsets of variables. Defaults to 100

  • random_seed (int) – Seed used to sample from the posterior. Defaults to None.

Returns

  • idxs (indexes of the covariates, from highest to lowest relative importance)

  • axes (matplotlib axes)
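
A hedged usage sketch under the same assumptions as plot_dependence() above:

>>> idxs, axes = pmx.bart.plot_variable_importance(idata, X=X, samples=100)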

pymc_experimental.bart.predict(idata, rng, X, size=None, excluded=None)[source]#

Generate samples from the BART-posterior.

Parameters
  • idata (InferenceData) – InferenceData containing a collection of BART_trees in the sample_stats group.

  • rng (NumPy random generator) – Random number generator used to draw the samples.

  • X (array-like) – A covariate matrix. Use the same matrix used to fit BART for in-sample predictions, or a new one for out-of-sample predictions.

  • size (int or tuple) – Number of samples.

  • excluded (list) – Indexes of the variables to exclude when computing predictions.
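
A hedged usage sketch; X_new stands for a hypothetical out-of-sample covariate matrix:

>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> pred = pmx.bart.predict(idata, rng, X=X_new, size=100)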

pymc_experimental.distributions#

pymc_experimental.distributions.histogram_utils.histogram_approximation(name, dist, *, observed, **h_kwargs)[source]#

Approximate a distribution with a histogram potential.

Parameters
  • name (str) – Name for the Potential

  • dist (aesara.tensor.var.TensorVariable) – The output of pm.Distribution.dist()

  • observed (ArrayLike) – Observed value to construct a histogram. Histogram is computed over 0th axis. Dask is supported.

Returns

Potential

Return type

aesara.tensor.var.TensorVariable

Examples

Discrete variables are reduced to unique repetitions (up to min_count):

>>> import numpy as np
>>> import pymc as pm
>>> import pymc_experimental as pmx
>>> production = np.random.poisson([1, 2, 5], size=(1000, 3))
>>> with pm.Model(coords=dict(plant=range(3))):
...     lam = pm.Exponential("lam", 1.0, dims="plant")
...     pot = pmx.distributions.histogram_approximation(
...         "pot", pm.Poisson.dist(lam), observed=production, min_count=2
...     )

Continuous variables are discretized into n_quantiles:

>>> measurements = np.random.normal([1, 2, 3], [0.1, 0.4, 0.2], size=(10000, 3))
>>> with pm.Model(coords=dict(tests=range(3))):
...     m = pm.Normal("m", dims="tests")
...     s = pm.LogNormal("s", dims="tests")
...     pot = pmx.distributions.histogram_approximation(
...         "pot", pm.Normal.dist(m, s),
...         observed=measurements, n_quantiles=50
...     )

For special cases like zero inflation in continuous variables there is a zero_inflation flag, which adds a separate bin for zeros:

>>> measurements = abs(measurements)
>>> measurements[100:] = 0
>>> with pm.Model(coords=dict(tests=range(3))):
...     m = pm.Normal("m", dims="tests")
...     s = pm.LogNormal("s", dims="tests")
...     pot = pmx.distributions.histogram_approximation(
...         "pot", pm.Normal.dist(m, s),
...         observed=measurements, n_quantiles=50, zero_inflation=True
...     )

pymc_experimental.utils#

pymc_experimental.utils.spline.bspline_interpolation(x, *, n=None, eval_points=None, degree=3, sparse=True)[source]#

Interpolate a sparse grid to a dense grid using B-splines.

Parameters
  • x (Variable) – Input Variable to interpolate. The 0th coordinate is assumed to be mapped regularly onto the [0, 1] interval.

  • n (int (optional)) – Resolution of interpolation

  • eval_points (vector (optional)) – Custom eval points in [0, 1] interval (or scaled properly using min/max scaling)

  • degree (int, optional) – BSpline degree, by default 3

  • sparse (bool, optional) – Use sparse operation, by default True

Returns

The interpolated variable; interpolation is across the 0th axis.

Return type

Variable

Examples

>>> import pymc as pm
>>> import numpy as np
>>> from pymc_experimental.utils.spline import bspline_interpolation
>>> half_months = np.linspace(0, 365, 12*2)
>>> with pm.Model(coords=dict(knots_time=half_months, time=np.arange(365))) as model:
...     kernel = pm.gp.cov.ExpQuad(1, ls=365/12)
...     # ready to define gp (a latent process over parameters)
...     gp = pm.gp.gp.Latent(
...         cov_func=kernel
...     )
...     y_knots = gp.prior("y_knots", half_months[:, None], dims="knots_time")
...     y = pm.Deterministic(
...         "y",
...         bspline_interpolation(y_knots, n=365, degree=3),
...         dims="time"
...     )
...     trace = pm.sample_prior_predictive(1)

Notes

Adapted from BayesAlpha, where it was written by @aseyboldt.