API Reference

This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC-BART.

`pymc_bart`

class pymc_bart.BART(name: str, X: Union[ndarray[Any, dtype[float64]], TensorVariable], Y: Union[ndarray[Any, dtype[float64]], TensorVariable], m: int = 50, alpha: float = 0.95, beta: float = 2.0, response: str = 'constant', split_prior: Optional[ndarray[Any, dtype[float64]]] = None, split_rules: Optional[List[SplitRule]] = None, separate_trees: Optional[bool] = False, **kwargs)

Bayesian Additive Regression Tree distribution.

Distribution representing a sum over trees.

X: TensorLike: The covariate matrix.
Y: TensorLike: The response vector.
m: int: Number of trees.
response: str: How the leaf_node values are computed. Available options are “constant”, “linear” or “mix”. Defaults to “constant”. The “linear” and “mix” options are still experimental.
alpha: float: Controls the prior probability over the depth of the trees. Should be in the (0, 1) interval.
beta: float: Controls the prior probability over the number of leaves of the trees. Should be positive.
split_prior: Optional[List[float]], default None: List of positive numbers, one per column in input data. Defaults to None, in which case all covariates have the same prior probability of being selected.
split_rules: Optional[List[SplitRule]], default None: List of SplitRule objects, one per column in input data. Allows using different split rules for different columns. Default is ContinuousSplitRule. Other options are OneHotSplitRule and SubsetSplitRule, both meant for categorical variables.
shape: Optional[Tuple], default None: Specifies the output shape. If shape is different from (len(X),) (the default), a separate tree is trained for each value in the other dimensions.
separate_trees: Optional[bool], default False: When training multiple trees (by setting a shape parameter), the default behavior is to learn a joint tree structure and only have different leaf values for each. This flag forces a fully separate tree structure to be trained instead. This is unnecessary in many cases and is considerably slower, multiplying run-time roughly by the number of dimensions.

The parameters alpha and beta parametrize the probability that a node at depth \(d \: (= 0, 1, 2,...)\) is non-terminal, given by \(\alpha(1 + d)^{-\beta}\). The default values are \(\alpha = 0.95\) and \(\beta = 2\).

This is the prior recommended by Chipman et al. in BART: Bayesian additive regression trees.
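To make the depth prior concrete, the sketch below evaluates \(\alpha(1 + d)^{-\beta}\) for the first few depths. `prob_nonterminal` is a hypothetical helper, not part of pymc_bart. It only restates the formula above. With the defaults, the split probability drops from 0.95 at the root to 0.2375 at depth 1, and this drop is what keeps individual trees shallow.

```python
def prob_nonterminal(depth: int, alpha: float = 0.95, beta: float = 2.0) -> float:
    """Prior probability that a node at `depth` is non-terminal (is split).

    Hypothetical helper, not part of pymc_bart: it only evaluates
    alpha * (1 + depth) ** (-beta) from the BART tree prior.
    """
    if depth < 0:
        raise ValueError("depth must be a non-negative integer")
    return alpha * (1 + depth) ** (-beta)


# Deeper nodes become rapidly less likely to be split.
for d in range(4):
    print(d, prob_nonterminal(d))
```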

classmethod dist(*params, **kwargs)

Creates a tensor variable corresponding to the cls distribution.

dist_params: array-like: The inputs to the RandomVariable Op.
shape: int, tuple, Variable, optional: A tuple of sizes for each dimension of the new RV.
**kwargs: Keyword arguments that will be forwarded to the PyTensor RV Op. Most prominently: size or dtype.

Returns: rv: TensorVariable: The created random variable tensor.

logp(x, *inputs)

Calculate log probability.

x: numeric, TensorVariable: Value for which log-probability is calculated.

Returns: TensorVariable

class pymc_bart.ContinuousSplitRule: Standard continuous split rule: pick a pivot value and split depending on whether the variable is smaller or greater than the picked value.

class pymc_bart.OneHotSplitRule: Choose a single categorical value and branch on whether the variable takes that value or not.
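The branching logic of the two rules above can be sketched on a single column as follows. `continuous_split` and `one_hot_split` are hypothetical helpers, not the pymc_bart classes. Each one returns the row indices sent to the left and right branches. Sending values equal to the pivot to the left branch is an assumption.

```python
from typing import Hashable, List, Sequence, Tuple


def continuous_split(values: Sequence[float], pivot: float) -> Tuple[List[int], List[int]]:
    """ContinuousSplitRule-style sketch: compare each value with a pivot.

    Assumption: values equal to the pivot go to the left branch.
    """
    left = [i for i, v in enumerate(values) if v <= pivot]
    right = [i for i, v in enumerate(values) if v > pivot]
    return left, right


def one_hot_split(values: Sequence[Hashable], category: Hashable) -> Tuple[List[int], List[int]]:
    """OneHotSplitRule-style sketch: one category versus all the others."""
    left = [i for i, v in enumerate(values) if v == category]
    right = [i for i, v in enumerate(values) if v != category]
    return left, right


print(continuous_split([1.0, 3.0, 2.0, 5.0], pivot=2.0))  # ([0, 2], [1, 3])
print(one_hot_split(["a", "b", "c", "b"], category="b"))  # ([1, 3], [0, 2])
```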

class pymc_bart.PGBART(*args, **kwargs)

Particle Gibbs BART sampling step.

vars: list: List of value variables for the sampler.
num_particles: int: Number of particles. Defaults to 10.
batch: tuple: Number of trees fitted per step. The first element is the batch size during tuning and the second the batch size after tuning. Defaults to (0.1, 0.1), meaning 10% of the m trees both during and after tuning.
model: PyMC Model: Optional model for sampling step. Defaults to None (taken from context).
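With the default m = 50 trees and batch = (0.1, 0.1), about 5 trees are updated per step. The sketch below shows that arithmetic. `trees_per_step` is a hypothetical helper. Truncating to an integer and always updating at least one tree are assumptions, not documented behavior.

```python
from typing import Tuple


def trees_per_step(m: int, batch: Tuple[float, float] = (0.1, 0.1), tuning: bool = True) -> int:
    """Hypothetical helper: convert a batch fraction into a number of trees.

    Assumption: the fraction of the m trees is truncated to an integer,
    and at least one tree is updated per step.
    """
    fraction = batch[0] if tuning else batch[1]
    return max(1, int(m * fraction))


print(trees_per_step(50))                                  # 5 trees while tuning
print(trees_per_step(50, batch=(0.1, 0.2), tuning=False))  # 10 trees after tuning
print(trees_per_step(5))                                   # 1 (never zero)
```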

astep(_): Perform a single sample step in a raveled and concatenated parameter space.

static competence(var, has_grad): PGBART is only suitable for BART distributions.

get_particle_tree(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) → Tuple[ParticleTree, Tree]: Sample a new particle and its associated tree.

init_particles(tree_id: int, odim: int) → List[ParticleTree]: Initialize particles.

normalize(particles: List[ParticleTree]) → float: Use softmax to get normalized_weights.
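A numerically stable softmax turns per-particle log-weights into weights that sum to one. The sketch below assumes each particle carries a scalar log-weight. `softmax_weights` is a hypothetical name, and the library's own implementation may differ in details.

```python
import math
from typing import List


def softmax_weights(log_weights: List[float]) -> List[float]:
    """Turn particle log-weights into normalized weights with a stable softmax."""
    max_lw = max(log_weights)
    # Subtracting the maximum avoids overflow in exp() for large log-weights.
    unnormalized = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


print(softmax_weights([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```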

resample(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) → List[ParticleTree]

Use systematic resampling for all but the first particle.

Ensure particles are copied only if needed.

stats_dtypes: list[dict[str, type]] = [{'variable_inclusion': <class 'object'>, 'tune': <class 'bool'>}]

A list containing at most one dictionary that maps stat names to dtypes.

This attribute is deprecated. Use stats_dtypes_shapes instead.

systematic(normalized_weights: ndarray[Any, dtype[float64]]) → ndarray[Any, dtype[int64]]

Systematic resampling.

Returns indices in the range 0, …, len(normalized_weights) - 1.

Note: adapted from nchopin/particles
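The sketch below implements the classic systematic-resampling scheme in pure Python. A single uniform draw is shared by n evenly spaced positions, and each position is then mapped to a particle by an inverse-CDF lookup over the cumulative weights. It is an illustration, not the pymc_bart source. In PGBART, resample applies the resulting indices to all but the first particle.

```python
import random
from itertools import accumulate
from typing import List, Optional


def systematic(normalized_weights: List[float], rng: Optional[random.Random] = None) -> List[int]:
    """Systematic resampling: one uniform draw, n evenly spaced positions.

    Returns n indices in 0, ..., n - 1, chosen in proportion to the weights.
    """
    rng = rng if rng is not None else random.Random()
    n = len(normalized_weights)
    u = rng.random()  # single U(0, 1) draw shared by all positions
    positions = [(u + i) / n for i in range(n)]
    cumulative = list(accumulate(normalized_weights))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    indices = []
    j = 0
    # Inverse-CDF lookup: positions are sorted, so one forward pass suffices.
    for p in positions:
        while p > cumulative[j]:
            j += 1
        indices.append(j)
    return indices


print(systematic([0.1, 0.2, 0.3, 0.4], rng=random.Random(0)))
```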

update_weight(particle: ParticleTree, odim: int) → None: Update the weight of a particle.

class pymc_bart.SubsetSplitRule: Choose a random subset of the categorical values and branch on whether the variable belongs to that subset. This is the approach taken by Sameer K. Deshpande in flexBART: Flexible Bayesian regression trees with categorical predictors (arXiv).
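The subset rule can be sketched as follows: draw a random subset of the observed categories, then send each row left or right depending on whether its category is in that subset. `subset_split` is a hypothetical helper, not the pymc_bart class. Restricting the draw to non-empty proper subsets, so that both branches receive a category, is an assumption.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple


def subset_split(
    values: Sequence[Hashable], rng: random.Random
) -> Tuple[Set[Hashable], List[int], List[int]]:
    """Hypothetical SubsetSplitRule-style split on one categorical column.

    Assumption: the subset is a random non-empty proper subset of the
    observed categories, so both branches receive at least one category.
    """
    categories = list(dict.fromkeys(values))  # unique categories, first-seen order
    if len(categories) < 2:
        raise ValueError("need at least two categories to split")
    size = rng.randint(1, len(categories) - 1)  # inclusive bounds
    chosen = set(rng.sample(categories, size))
    left = [i for i, v in enumerate(values) if v in chosen]
    right = [i for i, v in enumerate(values) if v not in chosen]
    return chosen, left, right


subset, left, right = subset_split([0, 1, 2, 0, 1, 2], rng=random.Random(1))
print(subset, left, right)
```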

pymc_bart.plot_convergence(idata: InferenceData, var_name: Optional[str] = None, kind: str = 'ecdf', figsize: Optional[Tuple[float, float]] = None, ax=None) → List[Axes]

Plot convergence diagnostics.

idata: InferenceData: InferenceData object containing the posterior samples.
var_name: Optional[str]: Name of the BART variable to plot. Defaults to None.
kind: str: Type of plot to display. Options are “ecdf” (default) and “kde”.
figsize: Optional[Tuple[float, float]], default None: Figure size. Defaults to None.
ax: matplotlib axes: Axes on which to plot. Defaults to None.

Returns: List[Axes]: Matplotlib axes.

pymc_bart.plot_ice(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, centered: Optional[bool] = True, samples: int = 100, instances: int = 30, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) → List[Axes]

Individual conditional expectation plot.

bartrv: BART Random Variable: BART variable once the model that includes it has been fitted.
X: npt.NDArray[np.float_]: The covariate matrix.
Y: Optional[npt.NDArray[np.float_]], default None: The response vector.
var_idx: Optional[List[int]], default None: List of the indices of the covariates for which to compute the pdp or ice.
var_discrete: Optional[List[int]], default None: List of the indices of the covariates treated as discrete.
func: Optional[Callable], default None: Arbitrary function to apply to the predictions. Defaults to the identity function.
centered: bool: If True, the result is centered around the partial response evaluated at the lowest value in xs_interval. Defaults to True.
samples: int: Number of posterior samples used in the predictions. Defaults to 100.
instances: int: Number of instances of X to plot. Defaults to 30.
random_seed: Optional[int], default None: Seed used to sample from the posterior. Defaults to None.
sharey: bool: Controls sharing of properties among y-axes. Defaults to True.
smooth: bool: If True, the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.
grid: str or tuple: How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, subplots side by side, or a tuple indicating the number of rows and columns.
color: matplotlib valid color: Color used to plot the pdp or ice. Defaults to “C0”.
color_mean: matplotlib valid color: Color used to plot the mean pdp or ice. Defaults to “C0”.
alpha: float: Transparency level, should be in the interval [0, 1].
figsize: tuple: Figure size. If None, it will be defined automatically.
smooth_kwargs: dict: Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.
ax: axes: Matplotlib axes.

Returns: axes: Matplotlib axes.
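The effect of centered=True can be sketched as follows: from each instance's curve, subtract that curve's value at the lowest evaluated x. Every curve then starts at zero, which makes differences in shape easier to compare. `center_curves` is a hypothetical helper, not part of pymc_bart.

```python
from typing import List, Sequence


def center_curves(curves: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center each ICE curve at its value for the lowest evaluated x.

    Assumption: each curve lists predictions ordered by increasing x, so
    the first entry corresponds to the lowest value of the grid.
    """
    return [[y - curve[0] for y in curve] for curve in curves]


print(center_curves([[2.0, 3.0, 5.0], [1.0, 1.0, 4.0]]))  # [[0.0, 1.0, 3.0], [0.0, 0.0, 3.0]]
```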

pymc_bart.plot_pdp(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, xs_interval: str = 'quantiles', xs_values: Optional[Union[int, List[float]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, samples: int = 200, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) → List[Axes]

Partial dependence plot.

bartrv: BART Random Variable: BART variable once the model that includes it has been fitted.
X: npt.NDArray[np.float_]: The covariate matrix.
Y: Optional[npt.NDArray[np.float_]], default None: The response vector.
xs_interval: str: Method used to compute the values of X used to evaluate the predicted function. “linear”: evenly spaced values in the range of X. “quantiles”: the evaluation is done at the specified quantiles of X. “insample”: the evaluation is done at the values of X. For discrete variables these options are omitted. Defaults to “quantiles”.
xs_values: Optional[Union[int, List[float]]], default None: Values of X used to evaluate the predicted function. If xs_interval="linear", the number of points in the evenly spaced grid. If xs_interval="quantiles", the quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".
var_idx: Optional[List[int]], default None: List of the indices of the covariates for which to compute the pdp or ice.
var_discrete: Optional[List[int]], default None: List of the indices of the covariates treated as discrete.
func: Optional[Callable], default None: Arbitrary function to apply to the predictions. Defaults to the identity function.
samples: int: Number of posterior samples used in the predictions. Defaults to 200.
random_seed: Optional[int], default None: Seed used to sample from the posterior. Defaults to None.
sharey: bool: Controls sharing of properties among y-axes. Defaults to True.
smooth: bool: If True, the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.
grid: str or tuple: How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, subplots side by side, or a tuple indicating the number of rows and columns.
color: matplotlib valid color: Color used to plot the pdp or ice. Defaults to “C0”.
color_mean: matplotlib valid color: Color used to plot the mean pdp or ice. Defaults to “C0”.
alpha: float: Transparency level, should be in the interval [0, 1].
figsize: tuple: Figure size. If None, it will be defined automatically.
smooth_kwargs: dict: Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.
ax: axes: Matplotlib axes.

Returns: axes: Matplotlib axes.
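To make xs_interval and xs_values concrete, the sketch below builds the evaluation points for one covariate under the “linear” and “quantiles” options. The “insample” option would simply use the observed values. The helper names are hypothetical. The interpolation rule used for quantiles, which is linear interpolation between sorted values, is an assumption, and the library may use a different one.

```python
from typing import List, Sequence


def linear_grid(column: Sequence[float], num_points: int) -> List[float]:
    """Evenly spaced values spanning the range of one covariate ("linear")."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    lo, hi = min(column), max(column)
    step = (hi - lo) / (num_points - 1)
    return [lo + step * i for i in range(num_points)]


def quantile_grid(column: Sequence[float], quantiles: Sequence[float]) -> List[float]:
    """Quantiles of one covariate ("quantiles").

    Assumption: linear interpolation between sorted values.
    """
    xs = sorted(column)
    out = []
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError("quantiles must be between 0 and 1 inclusive")
        pos = q * (len(xs) - 1)
        below = int(pos)  # floor, since pos >= 0
        above = min(below + 1, len(xs) - 1)
        out.append(xs[below] + (xs[above] - xs[below]) * (pos - below))
    return out


x = [4.0, 1.0, 3.0, 5.0, 2.0]
print(linear_grid(x, 3))                  # [1.0, 3.0, 5.0]
print(quantile_grid(x, [0.0, 0.5, 1.0]))  # [1.0, 3.0, 5.0]
```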

pymc_bart.plot_variable_importance(idata: InferenceData, bartrv: Variable, X: ndarray[Any, dtype[float64]], labels: Optional[List[str]] = None, method: str = 'VI', figsize: Optional[Tuple[float, float]] = None, xlabel_angle: float = 0, samples: int = 100, random_seed: Optional[int] = None, ax: Optional[Axes] = None) → Tuple[List[int], Union[List[Axes], Any]]

Estimates variable importance from the BART posterior.

idata: InferenceData: InferenceData containing a collection of BART_trees in the sample_stats group.
bartrv: BART Random Variable: BART variable once the model that includes it has been fitted.
X: npt.NDArray[np.float_]: The covariate matrix.
labels: Optional[List[str]]: List of the names of the covariates. If X is a DataFrame, the names of the covariates will be taken from it and this argument will be ignored.
method: str: Method used to rank variables. Available options are “VI” (default) and “backward”. The R squared will be computed following this ranking. “VI” counts how many times each variable is included in the posterior distribution of trees. “backward” uses a backward search based on the R squared. “VI” requires less computation time.
figsize: tuple: Figure size. If None, it will be defined automatically.
xlabel_angle: float: Rotation angle of the x-axis labels. Defaults to 0. Use values like 45 for long labels and/or many variables.
samples: int: Number of predictions used to compute correlation for subsets of variables. Defaults to 100.
random_seed: Optional[int]: Random seed used to sample from the posterior. Defaults to None.
ax: axes: Matplotlib axes.

Returns: idxs, the indexes of the covariates from higher to lower relative importance, and axes, the matplotlib axes.
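The counting idea behind method="VI" can be sketched as follows: count how many times each covariate appears in the splitting nodes of the posterior trees, then sort the covariates by that count. This is a simplified illustration with a hypothetical input format, where each tree is given as a list of its split-variable indices. It omits the R squared computation that the plot also reports.

```python
from typing import List, Sequence, Tuple


def vi_ranking(
    split_vars_per_tree: Sequence[Sequence[int]], n_covariates: int
) -> Tuple[List[int], List[int]]:
    """Count how often each covariate is used to split, then rank them.

    Hypothetical input: for every posterior tree, the list of covariate
    indices used in its splitting nodes.
    """
    counts = [0] * n_covariates
    for split_vars in split_vars_per_tree:
        for var in split_vars:
            counts[var] += 1
    # Indexes of the covariates from higher to lower inclusion count.
    idxs = sorted(range(n_covariates), key=lambda i: counts[i], reverse=True)
    return counts, idxs


counts, idxs = vi_ranking([[0, 2], [2], [2, 1, 0]], n_covariates=3)
print(counts, idxs)  # [2, 1, 3] [2, 0, 1]
```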
