API Reference#

This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC-BART.

pymc_bart#

class pymc_bart.BART(name: str, X: Union[ndarray[Any, dtype[float64]], TensorVariable], Y: Union[ndarray[Any, dtype[float64]], TensorVariable], m: int = 50, alpha: float = 0.95, beta: float = 2.0, response: str = 'constant', split_prior: Optional[ndarray[Any, dtype[float64]]] = None, split_rules: Optional[List[SplitRule]] = None, separate_trees: Optional[bool] = False, **kwargs)#

Bayesian Additive Regression Tree distribution.

Distribution representing a sum over trees

X : TensorLike

The covariate matrix.

Y : TensorLike

The response vector.

m : int

Number of trees.

response : str

How the leaf_node values are computed. Available options are "constant", "linear" or "mix". Defaults to "constant". Options "linear" and "mix" are still experimental.

alpha : float

Controls the prior probability over the depth of the trees. Should be in the (0, 1) interval.

beta : float

Controls the prior probability over the number of leaves of the trees. Should be positive.

split_prior : Optional[List[float]], default None

List of positive numbers, one per column in the input data. Defaults to None, in which case all covariates have the same prior probability of being selected.

split_rules : Optional[List[SplitRule]], default None

List of SplitRule objects, one per column in the input data. Allows using different split rules for different columns. The default is ContinuousSplitRule. Other options are OneHotSplitRule and SubsetSplitRule, both meant for categorical variables.

shape : Optional[Tuple], default None

Specify the output shape. If shape differs from (len(X),) (the default), a separate tree is trained for each value along the extra dimensions.

separate_trees : Optional[bool], default False

When training multiple trees (by setting a shape parameter), the default behavior is to learn a joint tree structure and only have different leaf values for each. This flag forces a fully separate tree structure to be trained instead. This is unnecessary in many cases and is considerably slower, multiplying run-time roughly by the number of dimensions.

The parameters alpha and beta parametrize the probability that a node at depth \(d\) (\(d = 0, 1, 2, \ldots\)) is non-terminal, given by \(\alpha(1 + d)^{-\beta}\). The default values are \(\alpha = 0.95\) and \(\beta = 2\). For example, with the defaults a root node (\(d = 0\)) is non-terminal with probability 0.95, while a node at depth 2 is non-terminal with probability \(0.95 \times 3^{-2} \approx 0.11\), so deep trees are strongly penalized.

This is the prior recommended by Chipman et al. in BART: Bayesian additive regression trees (link).
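A minimal usage sketch (the synthetic data, the variable name "mu", and the Normal likelihood below are illustrative choices, not part of the API):

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                             # covariate matrix
Y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)   # response vector

with pm.Model() as model:
    # Sum-of-trees prior over the latent mean function
    mu = pmb.BART("mu", X, Y, m=50)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=Y)
    # pm.sample() assigns the PGBART step to `mu` automatically
    idata = pm.sample()
```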

classmethod dist(*params, **kwargs)#

Creates a tensor variable corresponding to the cls distribution.

dist_params : array-like

The inputs to the RandomVariable Op.

shape : int, tuple, Variable, optional

A tuple of sizes for each dimension of the new RV.

**kwargs

Keyword arguments that will be forwarded to the PyTensor RV Op. Most prominently: size or dtype.

rv : TensorVariable

The created random variable tensor.

logp(x, *inputs)#

Calculate log probability.

x: numeric, TensorVariable

Value for which log-probability is calculated.

TensorVariable

class pymc_bart.ContinuousSplitRule#

Standard continuous split rule: pick a pivot value and split depending on whether the variable is smaller or greater than the chosen value.

class pymc_bart.OneHotSplitRule#

Choose a single categorical value and branch on whether the variable takes that value or not.

class pymc_bart.PGBART(*args, **kwargs)#

Particle Gibbs BART sampling step.

vars: list

List of value variables for sampler

num_particles : int

Number of particles. Defaults to 10.

batch : tuple

Number of trees fitted per step. The first element is the batch size during tuning and the second the batch size after tuning. Defaults to (0.1, 0.1), meaning 10% of the m trees during tuning and after tuning.

model: PyMC Model

Optional model for sampling step. Defaults to None (taken from context).
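In practice pm.sample() selects this step automatically for BART variables; a sketch of assigning it by hand (reusing the model and mu from the BART example above, with illustrative settings) could look like:

```python
import pymc as pm
import pymc_bart as pmb

with model:  # the model containing the BART variable `mu` defined earlier
    step = pmb.PGBART([mu], num_particles=20, batch=(0.1, 0.1))
    idata = pm.sample(step=step)
```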

astep(_)#

Perform a single sample step in a raveled and concatenated parameter space.

static competence(var, has_grad)#

PGBART is only suitable for BART distributions.

get_particle_tree(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) → Tuple[ParticleTree, Tree]#

Sample a new particle and its associated tree.

init_particles(tree_id: int, odim: int) → List[ParticleTree]#

Initialize particles.

normalize(particles: List[ParticleTree]) → float#

Use softmax to get normalized_weights.

resample(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) → List[ParticleTree]#

Use systematic resampling for all but the first particle.

Ensure particles are copied only if needed.

stats_dtypes: list[dict[str, type]] = [{'variable_inclusion': <class 'object'>, 'tune': <class 'bool'>}]#

A list containing at most one dictionary that maps stat names to dtypes.

This attribute is deprecated. Use stats_dtypes_shapes instead.

systematic(normalized_weights: ndarray[Any, dtype[float64]]) → ndarray[Any, dtype[int64]]#

Systematic resampling.

Return indices in the range 0, …, len(normalized_weights)

Note: adapted from nchopin/particles

update_weight(particle: ParticleTree, odim: int) → None#

Update the weight of a particle.

class pymc_bart.SubsetSplitRule#

Choose a random subset of the categorical values and branch on belonging to that set. This is the approach taken by Sameer K. Deshpande in flexBART: Flexible Bayesian regression trees with categorical predictors, arXiv (link).
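A sketch of per-column split rules (the data layout is hypothetical; the rules are passed one per column, here as the rule classes themselves):

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

rng = np.random.default_rng(1)
# Hypothetical layout: columns 0 and 1 are continuous, column 2 is categorical (codes 0-3).
X = np.column_stack(
    [rng.normal(size=200), rng.normal(size=200), rng.integers(0, 4, size=200)]
).astype(float)
Y = X[:, 0] + rng.normal(scale=0.1, size=200)

rules = [pmb.ContinuousSplitRule, pmb.ContinuousSplitRule, pmb.SubsetSplitRule]

with pm.Model():
    mu = pmb.BART("mu", X, Y, split_rules=rules)
```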

pymc_bart.plot_convergence(idata: InferenceData, var_name: Optional[str] = None, kind: str = 'ecdf', figsize: Optional[Tuple[float, float]] = None, ax=None) → List[Axes]#

Plot convergence diagnostics.

idata : InferenceData

InferenceData object containing the posterior samples.

var_name : Optional[str]

Name of the BART variable to plot. Defaults to None.

kind : str

Type of plot to display. Options are "ecdf" (default) and "kde".

figsize : Optional[Tuple[float, float]], by default None

Figure size. Defaults to None.

ax : matplotlib axes

Axes on which to plot. Defaults to None.

List[ax] : matplotlib axes
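A usage sketch, assuming idata comes from sampling a model with a BART variable named "mu" (as in the earlier example):

```python
import pymc_bart as pmb

# ECDF-based convergence diagnostics for the BART variable
axes = pmb.plot_convergence(idata, var_name="mu", kind="ecdf")
```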

pymc_bart.plot_ice(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, centered: Optional[bool] = True, samples: int = 100, instances: int = 30, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) → List[Axes]#

Individual conditional expectation plot.

bartrv : BART Random Variable

BART variable once the model that includes it has been fitted.

X : npt.NDArray[np.float_]

The covariate matrix.

Y : Optional[npt.NDArray[np.float_]], by default None

The response vector.

var_idx : Optional[List[int]], by default None

List of the indices of the covariates for which to compute the pdp or ice.

var_discrete : Optional[List[int]], by default None

List of the indices of the covariates treated as discrete.

func : Optional[Callable], by default None

Arbitrary function to apply to the predictions. Defaults to the identity function.

centered : bool

If True the result is centered around the partial response evaluated at the lowest value in xs_interval. Defaults to True.

samples : int

Number of posterior samples used in the predictions. Defaults to 100.

instances : int

Number of instances of X to plot. Defaults to 30.

random_seed : Optional[int], by default None

Seed used to sample from the posterior. Defaults to None.

sharey : bool

Controls sharing of properties among y-axes. Defaults to True.

smooth : bool

If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.

grid : str or tuple

How to arrange the subplots. Defaults to "long", one subplot below the other. Other options are "wide", subplots placed side by side, or a tuple indicating the number of rows and columns.

color : matplotlib valid color

Color used to plot the pdp or ice. Defaults to "C0".

color_mean : matplotlib valid color

Color used to plot the mean pdp or ice. Defaults to "C0".

alpha : float

Transparency level, should be in the interval [0, 1].

figsize : tuple

Figure size. If None it will be defined automatically.

smooth_kwargs : dict

Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.

ax : axes

Matplotlib axes.

axes: matplotlib axes
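A usage sketch, reusing mu, X and Y from the fitted model in the BART example above:

```python
import pymc_bart as pmb

# Centered ICE curves for all covariates, 30 instances per covariate
axes = pmb.plot_ice(mu, X=X, Y=Y, instances=30, grid="long")
```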

pymc_bart.plot_pdp(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, xs_interval: str = 'quantiles', xs_values: Optional[Union[int, List[float]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, samples: int = 200, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) → List[Axes]#

Partial dependence plot.

bartrv : BART Random Variable

BART variable once the model that includes it has been fitted.

X : npt.NDArray[np.float_]

The covariate matrix.

Y : Optional[npt.NDArray[np.float_]], by default None

The response vector.

xs_interval : str

Method used to compute the values of X used to evaluate the predicted function. "linear", evenly spaced values in the range of X. "quantiles", the evaluation is done at the specified quantiles of X. "insample", the evaluation is done at the values of X. For discrete variables these options are omitted.

xs_values : Optional[Union[int, List[float]]], by default None

Values of X used to evaluate the predicted function. If xs_interval="linear", the number of points in the evenly spaced grid. If xs_interval="quantiles", the quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".

var_idx : Optional[List[int]], by default None

List of the indices of the covariates for which to compute the pdp or ice.

var_discrete : Optional[List[int]], by default None

List of the indices of the covariates treated as discrete.

func : Optional[Callable], by default None

Arbitrary function to apply to the predictions. Defaults to the identity function.

samples : int

Number of posterior samples used in the predictions. Defaults to 200.

random_seed : Optional[int], by default None

Seed used to sample from the posterior. Defaults to None.

sharey : bool

Controls sharing of properties among y-axes. Defaults to True.

smooth : bool

If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.

grid : str or tuple

How to arrange the subplots. Defaults to "long", one subplot below the other. Other options are "wide", subplots placed side by side, or a tuple indicating the number of rows and columns.

color : matplotlib valid color

Color used to plot the pdp or ice. Defaults to "C0".

color_mean : matplotlib valid color

Color used to plot the mean pdp or ice. Defaults to "C0".

alpha : float

Transparency level, should be in the interval [0, 1].

figsize : tuple

Figure size. If None it will be defined automatically.

smooth_kwargs : dict

Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.

ax : axes

Matplotlib axes.

axes: matplotlib axes
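A usage sketch, again reusing mu, X and Y from the fitted model above:

```python
import pymc_bart as pmb

# Partial dependence evaluated at quantiles of X, subplots side by side
axes = pmb.plot_pdp(mu, X=X, Y=Y, xs_interval="quantiles", grid="wide")
```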

pymc_bart.plot_variable_importance(idata: InferenceData, bartrv: Variable, X: ndarray[Any, dtype[float64]], labels: Optional[List[str]] = None, method: str = 'VI', figsize: Optional[Tuple[float, float]] = None, xlabel_angle: float = 0, samples: int = 100, random_seed: Optional[int] = None, ax: Optional[Axes] = None) → Tuple[List[int], Union[List[Axes], Any]]#

Estimate variable importance from the BART posterior.

idata: InferenceData

InferenceData containing a collection of BART_trees in the sample_stats group.

bartrv : BART Random Variable

BART variable once the model that includes it has been fitted.

X : npt.NDArray[np.float_]

The covariate matrix.

labels : Optional[List[str]]

List of the names of the covariates. If X is a DataFrame the names of the covariates will be taken from it and this argument will be ignored.

method : str

Method used to rank variables. Available options are "VI" (default) and "backward". The R squared will be computed following this ranking. "VI" counts how many times each variable is included in the posterior distribution of trees. "backward" uses a backward search based on the R squared. VI requires less computation time.

figsize : tuple

Figure size. If None it will be defined automatically.

xlabel_angle : float

Rotation angle of the x-axis labels. Defaults to 0. Use values like 45 for long labels and/or many variables.

samples : int

Number of predictions used to compute correlation for subsets of variables. Defaults to 100.

random_seed : Optional[int]

Seed used to sample from the posterior. Defaults to None.

ax : axes

Matplotlib axes.

idxs : indices of the covariates, ordered from higher to lower relative importance

axes : matplotlib axes
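A usage sketch, assuming idata and mu come from the fitted model above (the labels are illustrative):

```python
import pymc_bart as pmb

# Rank covariates by how often they appear in the posterior trees
idxs, axes = pmb.plot_variable_importance(
    idata, mu, X, labels=["x0", "x1", "x2"], method="VI"
)
```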