API Reference
This reference provides detailed documentation for all modules, classes, and methods in the current release of PyMC-BART.
pymc_bart
- class pymc_bart.BART(name: str, X: Union[ndarray[Any, dtype[float64]], TensorVariable], Y: Union[ndarray[Any, dtype[float64]], TensorVariable], m: int = 50, alpha: float = 0.95, beta: float = 2.0, response: str = 'constant', split_prior: Optional[ndarray[Any, dtype[float64]]] = None, split_rules: Optional[List[SplitRule]] = None, separate_trees: Optional[bool] = False, **kwargs)#
Bayesian Additive Regression Tree distribution.
Distribution representing a sum over trees
- XTensorLike
The covariate matrix.
- YTensorLike
The response vector.
- mint
Number of trees.
- responsestr
How the leaf_node values are computed. Available options are constant, linear or mix. Defaults to constant. Options linear and mix are still experimental.
- alphafloat
Controls the prior probability over the depth of the trees. Should be in the (0, 1) interval.
- betafloat
Controls the prior probability over the number of leaves of the trees. Should be positive.
- split_priorOptional[List[float]], default None.
List of positive numbers, one per column in input data. Defaults to None, all covariates have the same prior probability to be selected.
- split_rulesOptional[List[SplitRule]], default None
List of SplitRule objects, one per column in input data. Allows using different split rules for different columns. Default is ContinuousSplitRule. Other options are OneHotSplitRule and SubsetSplitRule, both meant for categorical variables.
- shape:Optional[Tuple], default None
Specify the output shape. If shape is different from (len(X)) (the default), train a separate tree for each value in other dimensions.
- separate_treesOptional[bool], default False
When training multiple trees (by setting a shape parameter), the default behavior is to learn a joint tree structure and only have different leaf values for each. This flag forces a fully separate tree structure to be trained instead. This is unnecessary in many cases and is considerably slower, multiplying run-time roughly by number of dimensions.
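To picture what split_prior expresses, the sketch below turns a list of positive per-column weights into selection probabilities and draws one column index. It assumes the weights are simply normalized to sum to one; the helper names normalize_split_prior and pick_covariate are hypothetical and are not part of the pymc_bart API.

```python
import random


def normalize_split_prior(split_prior):
    """Turn positive per-column weights into selection probabilities."""
    if any(w <= 0 for w in split_prior):
        raise ValueError("split_prior entries must be positive")
    total = sum(split_prior)
    return [w / total for w in split_prior]


def pick_covariate(split_prior, rng):
    """Draw one column index with probability proportional to its weight."""
    probs = normalize_split_prior(split_prior)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)  # seeded only to make the illustration repeatable
probs = normalize_split_prior([1.0, 1.0, 2.0])
idx = pick_covariate([1.0, 1.0, 2.0], rng)
```

With equal weights every column is equally likely, which matches the behavior described for split_prior=None.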
The parameters alpha and beta parametrize the probability that a node at depth \(d \: (= 0, 1, 2,...)\) is non-terminal, given by \(\alpha(1 + d)^{-\beta}\). The default values are \(\alpha = 0.95\) and \(\beta = 2\). This is the prior recommended by Chipman et al. BART: Bayesian additive regression trees, link
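As a quick numeric illustration of the formula above, the following sketch evaluates \(\alpha(1 + d)^{-\beta}\) for the first few depths. The helper name prob_nonterminal is hypothetical and is not part of the pymc_bart API.

```python
def prob_nonterminal(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at the given depth is non-terminal."""
    return alpha * (1 + depth) ** (-beta)


# Probabilities for depths 0, 1, 2 and 3 with the default alpha and beta.
probs = [prob_nonterminal(d) for d in range(4)]
```

The probability drops quickly with depth, so the default prior favors shallow trees.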
- classmethod dist(*params, **kwargs)#
Creates a tensor variable corresponding to the cls distribution.
- dist_paramsarray-like
The inputs to the RandomVariable Op.
- shapeint, tuple, Variable, optional
A tuple of sizes for each dimension of the new RV.
- **kwargs
Keyword arguments that will be forwarded to the PyTensor RV Op. Most prominently: size or dtype.
- rvTensorVariable
The created random variable tensor.
- logp(x, *inputs)#
Calculate log probability.
- x: numeric, TensorVariable
Value for which log-probability is calculated.
TensorVariable
- class pymc_bart.ContinuousSplitRule#
Standard continuous split rule: pick a pivot value and split depending on whether the variable is smaller or greater than the value picked.
- class pymc_bart.OneHotSplitRule#
Choose a single categorical value and branch on whether the variable equals that value or not.
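The two rules described so far can be pictured with plain Python lists. This is only a sketch of the idea, not the library's implementation: the function names are hypothetical, and sending values equal to the pivot to the left branch is an assumption.

```python
def continuous_split(values, pivot):
    """Row indices going left (value <= pivot) and right (value > pivot)."""
    left = [i for i, v in enumerate(values) if v <= pivot]
    right = [i for i, v in enumerate(values) if v > pivot]
    return left, right


def one_hot_split(values, category):
    """Row indices that equal the chosen category, and all the others."""
    left = [i for i, v in enumerate(values) if v == category]
    right = [i for i, v in enumerate(values) if v != category]
    return left, right


cont_left, cont_right = continuous_split([0.5, 2.0, 1.5, 3.0], pivot=1.5)
cat_left, cat_right = one_hot_split([0, 1, 2, 1], category=1)
```

The continuous rule relies on an ordering of the values, while the one-hot rule only tests equality, which is why the latter suits unordered categories.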
- class pymc_bart.PGBART(*args, **kwargs)#
Particle Gibbs BART sampling step.
- vars: list
List of value variables for sampler
- num_particlesint
Number of particles. Defaults to 10.
- batchtuple
Number of trees fitted per step. The first element is the batch size during tuning and the second the batch size after tuning. Defaults to (0.1, 0.1), meaning 10% of the m trees during tuning and after tuning.
- model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).
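To make the batch fractions concrete, the sketch below converts them into a number of trees per step. The rounding rule (truncate, but update at least one tree) is an assumption, and trees_per_step is a hypothetical helper, not a pymc_bart function.

```python
def trees_per_step(m, batch=(0.1, 0.1)):
    """Number of trees updated per step, during and after tuning."""
    # Each fraction is read as a share of the m trees; never go below one tree.
    return tuple(max(1, int(m * frac)) for frac in batch)


during_tuning, after_tuning = trees_per_step(50)
```

With the default m = 50 and batch = (0.1, 0.1), this gives 5 trees per step in both phases.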
- astep(_)#
Perform a single sample step in a raveled and concatenated parameter space.
- static competence(var, has_grad)#
PGBART is only suitable for BART distributions.
- get_particle_tree(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) Tuple[ParticleTree, Tree] #
Sample a new particle and its associated tree.
- init_particles(tree_id: int, odim: int) List[ParticleTree] #
Initialize particles.
- normalize(particles: List[ParticleTree]) float #
Use softmax to get normalized_weights.
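For reference, a softmax over log-weights can be sketched as follows. Subtracting the maximum before exponentiating is a common numerical-stability trick and is an assumption here, not a statement about how the method is implemented.

```python
import math


def softmax(log_weights):
    """Normalized weights from log-weights, shifted by the max for stability."""
    shift = max(log_weights)
    exps = [math.exp(lw - shift) for lw in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


# Very negative log-weights would underflow without the shift.
weights = softmax([-1000.0, -1001.0, -1002.0])
```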
- resample(particles: List[ParticleTree], normalized_weights: ndarray[Any, dtype[float64]]) List[ParticleTree] #
Use systematic resampling for all but the first particle.
Ensure particles are copied only if needed.
- stats_dtypes: list[dict[str, type]] = [{'variable_inclusion': <class 'object'>, 'tune': <class 'bool'>}]#
A list containing <=1 dictionary that maps stat names to dtypes.
This attribute is deprecated. Use stats_dtypes_shapes instead.
- systematic(normalized_weights: ndarray[Any, dtype[float64]]) ndarray[Any, dtype[int64]] #
Systematic resampling.
Return indices in the range 0, …, len(normalized_weights) - 1.
Note: adapted from nchopin/particles
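A minimal pure-Python sketch of systematic resampling is shown below. It is not the library's code: the optional u argument is a hypothetical addition that fixes the single random offset so the example is reproducible.

```python
import random


def systematic(normalized_weights, u=None):
    """Return one index per particle using a single shared random offset."""
    n = len(normalized_weights)
    if u is None:
        u = random.random()
    indices = []
    j = 0
    cumulative = normalized_weights[0]
    for i in range(n):
        # Evenly spaced positions in [0, 1), all shifted by the same offset u.
        position = (u + i) / n
        # Advance to the first particle whose cumulative weight covers position.
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += normalized_weights[j]
        indices.append(j)
    return indices


indices = systematic([0.1, 0.2, 0.7], u=0.5)
```

Because all positions share one offset, a particle with weight w is selected either floor(n * w) or ceil(n * w) times, which keeps the resampling variance low.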
- update_weight(particle: ParticleTree, odim: int) None #
Update the weight of a particle.
- class pymc_bart.SubsetSplitRule#
Choose a random subset of the categorical values and branch on belonging to that set. This is the approach taken by Sameer K. Deshpande. flexBART: Flexible Bayesian regression trees with categorical predictors. arXiv, link
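The subset idea can be sketched with the standard library as below. The function name subset_split is hypothetical, and restricting the draw to non-empty proper subsets (so that both branches can receive data) is an assumption rather than a documented detail.

```python
import random


def subset_split(values, rng):
    """Branch on membership in a random subset of the observed categories."""
    categories = sorted(set(values))
    if len(categories) < 2:
        raise ValueError("need at least two categories to split")
    # Draw a subset that is neither empty nor the full set of categories.
    size = rng.randint(1, len(categories) - 1)
    subset = set(rng.sample(categories, size))
    left = [i for i, v in enumerate(values) if v in subset]
    right = [i for i, v in enumerate(values) if v not in subset]
    return subset, left, right


values = ["a", "b", "c", "a", "c"]
subset, left, right = subset_split(values, random.Random(1))
```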
- pymc_bart.plot_convergence(idata: InferenceData, var_name: Optional[str] = None, kind: str = 'ecdf', figsize: Optional[Tuple[float, float]] = None, ax=None) List[Axes] #
Plot convergence diagnostics.
- idataInferenceData
InferenceData object containing the posterior samples.
- var_nameOptional[str]
Name of the BART variable to plot. Defaults to None.
- kindstr
Type of plot to display. Options are “ecdf” (default) and “kde”.
- figsizeOptional[Tuple[float, float]], by default None.
Figure size. Defaults to None.
- axmatplotlib axes
Axes on which to plot. Defaults to None.
List[ax] : matplotlib axes
- pymc_bart.plot_ice(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, centered: Optional[bool] = True, samples: int = 100, instances: int = 30, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) List[Axes] #
Individual conditional expectation plot.
- bartrvBART Random Variable
BART variable once the model that includes it has been fitted.
- Xnpt.NDArray[np.float_]
The covariate matrix.
- YOptional[npt.NDArray[np.float_]], by default None.
The response vector.
- var_idxOptional[List[int]], by default None.
List of the indices of the covariates for which to compute the pdp or ice.
- var_discreteOptional[List[int]], by default None.
List of the indices of the covariates treated as discrete.
- funcOptional[Callable], by default None.
Arbitrary function to apply to the predictions. Defaults to the identity function.
- centeredbool
If True the result is centered around the partial response evaluated at the lowest value in xs_interval. Defaults to True.
- samplesint
Number of posterior samples used in the predictions. Defaults to 100.
- instancesint
Number of instances of X to plot. Defaults to 30.
- random_seedOptional[int], by default None.
Seed used to sample from the posterior. Defaults to None.
- shareybool
Controls sharing of properties among y-axes. Defaults to True.
- smoothbool
If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.
- gridstr or tuple
How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, one subplot next to the other, or a tuple indicating the number of rows and columns.
- colormatplotlib valid color
Color used to plot the pdp or ice. Defaults to “C0”.
- color_meanmatplotlib valid color
Color used to plot the mean pdp or ice. Defaults to “C0”.
- alphafloat
Transparency level, should be in the interval [0, 1].
- figsizetuple
Figure size. If None it will be defined automatically.
- smooth_kwargsdict
Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.
- axaxes
Matplotlib axes.
axes: matplotlib axes
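The effect of centered=True can be pictured with a small sketch. It assumes each curve is a list of predictions evaluated on an ascending grid, so the first entry corresponds to the lowest grid value; center_curves is a hypothetical helper, not part of pymc_bart.

```python
def center_curves(curves):
    """Shift every curve so that it equals zero at the lowest grid value."""
    return [[value - curve[0] for value in curve] for curve in curves]


# Two toy ICE curves evaluated at three grid points each.
centered = center_curves([[1.0, 2.0, 4.0], [3.0, 3.5, 3.0]])
```

After centering, all curves start at zero, which makes differences in shape easier to compare than differences in level.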
- pymc_bart.plot_pdp(bartrv: Variable, X: ndarray[Any, dtype[float64]], Y: Optional[ndarray[Any, dtype[float64]]] = None, xs_interval: str = 'quantiles', xs_values: Optional[Union[int, List[float]]] = None, var_idx: Optional[List[int]] = None, var_discrete: Optional[List[int]] = None, func: Optional[Callable] = None, samples: int = 200, random_seed: Optional[int] = None, sharey: bool = True, smooth: bool = True, grid: str = 'long', color='C0', color_mean: str = 'C0', alpha: float = 0.1, figsize: Optional[Tuple[float, float]] = None, smooth_kwargs: Optional[Dict[str, Any]] = None, ax: Optional[Axes] = None) List[Axes] #
Partial dependence plot.
- bartrvBART Random Variable
BART variable once the model that includes it has been fitted.
- Xnpt.NDArray[np.float_]
The covariate matrix.
- YOptional[npt.NDArray[np.float_]], by default None.
The response vector.
- xs_intervalstr
Method used to compute the values of X used to evaluate the predicted function. “linear”, evenly spaced values in the range of X. “quantiles”, the evaluation is done at the specified quantiles of X. “insample”, the evaluation is done at the values of X. For discrete variables these options are omitted.
- xs_valuesOptional[Union[int, List[float]]], by default None.
Values of X used to evaluate the predicted function. If xs_interval="linear", number of points in the evenly spaced grid. If xs_interval="quantiles", quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Ignored when xs_interval="insample".
- var_idxOptional[List[int]], by default None.
List of the indices of the covariates for which to compute the pdp or ice.
- var_discreteOptional[List[int]], by default None.
List of the indices of the covariates treated as discrete.
- funcOptional[Callable], by default None.
Arbitrary function to apply to the predictions. Defaults to the identity function.
- samplesint
Number of posterior samples used in the predictions. Defaults to 200.
- random_seedOptional[int], by default None.
Seed used to sample from the posterior. Defaults to None.
- shareybool
Controls sharing of properties among y-axes. Defaults to True.
- smoothbool
If True the result will be smoothed by first computing a linear interpolation of the data over a regular grid and then applying the Savitzky-Golay filter to the interpolated data. Defaults to True.
- gridstr or tuple
How to arrange the subplots. Defaults to “long”, one subplot below the other. Other options are “wide”, one subplot next to the other, or a tuple indicating the number of rows and columns.
- colormatplotlib valid color
Color used to plot the pdp or ice. Defaults to “C0”.
- color_meanmatplotlib valid color
Color used to plot the mean pdp or ice. Defaults to “C0”.
- alphafloat
Transparency level, should be in the interval [0, 1].
- figsizetuple
Figure size. If None it will be defined automatically.
- smooth_kwargsdict
Additional keywords modifying the Savitzky-Golay filter. See scipy.signal.savgol_filter() for details.
- axaxes
Matplotlib axes.
axes: matplotlib axes
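The three xs_interval options can be illustrated for a single covariate with the sketch below. It is not the library's code: evaluation_grid and _quantile are hypothetical helpers, the fallback values used when xs_values is None are assumptions, and the quantiles use simple linear interpolation between order statistics.

```python
def _quantile(sorted_x, q):
    """Quantile by linear interpolation between neighboring order statistics."""
    if not 0 <= q <= 1:
        raise ValueError("quantiles must be between 0 and 1 inclusive")
    position = q * (len(sorted_x) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_x) - 1)
    return sorted_x[low] + (sorted_x[high] - sorted_x[low]) * (position - low)


def evaluation_grid(x, xs_interval="quantiles", xs_values=None):
    """Values of one covariate at which the predicted function is evaluated."""
    xs = sorted(x)
    if xs_interval == "insample":
        # Use the observed values themselves (sorted here for plotting).
        return xs
    if xs_interval == "linear":
        n = 10 if xs_values is None else xs_values  # assumed fallback
        if n == 1:
            return [xs[0]]
        step = (xs[-1] - xs[0]) / (n - 1)
        return [xs[0] + step * i for i in range(n)]
    if xs_interval == "quantiles":
        qs = [0.05, 0.5, 0.95] if xs_values is None else xs_values  # assumed fallback
        if isinstance(qs, (int, float)):
            qs = [qs]
        return [_quantile(xs, q) for q in qs]
    raise ValueError(f"unknown xs_interval: {xs_interval!r}")


x = [3.0, 0.0, 4.0, 1.0, 2.0]
linear_grid = evaluation_grid(x, "linear", 3)
quantile_grid = evaluation_grid(x, "quantiles", [0.0, 0.5, 1.0])
insample_grid = evaluation_grid(x, "insample")
```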
- pymc_bart.plot_variable_importance(idata: InferenceData, bartrv: Variable, X: ndarray[Any, dtype[float64]], labels: Optional[List[str]] = None, method: str = 'VI', figsize: Optional[Tuple[float, float]] = None, xlabel_angle: float = 0, samples: int = 100, random_seed: Optional[int] = None, ax: Optional[Axes] = None) Tuple[List[int], Union[List[Axes], Any]] #
Estimates variable importance from the BART-posterior.
- idata: InferenceData
InferenceData containing a collection of BART_trees in the sample_stats group.
- bartrvBART Random Variable
BART variable once the model that includes it has been fitted.
- Xnpt.NDArray[np.float_]
The covariate matrix.
- labelsOptional[List[str]]
List of the names of the covariates. If X is a DataFrame the names of the covariates will be taken from it and this argument will be ignored.
- methodstr
Method used to rank variables. Available options are “VI” (default) and “backward”. The R squared will be computed following this ranking. “VI” counts how many times each variable is included in the posterior distribution of trees. “backward” uses a backward search based on the R squared. “VI” requires less computation time.
- figsizetuple
Figure size. If None it will be defined automatically.
- xlabel_anglefloat
Rotation angle of the x-axis labels. Defaults to 0. Use values like 45 for long labels and/or many variables.
- samplesint
Number of predictions used to compute correlation for subsets of variables. Defaults to 100.
- random_seedOptional[int]
Seed used to sample from the posterior. Defaults to None.
- axaxes
Matplotlib axes.
idxs: indexes of the covariates from higher to lower relative importance
axes: matplotlib axes
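The “VI” ranking described above amounts to counting inclusions and sorting. The sketch below shows that idea with made-up counts; rank_by_inclusion is a hypothetical helper and the numbers are illustrative only, not results from a fitted model.

```python
def rank_by_inclusion(inclusion_counts):
    """Rank covariates by how often they appear in the sampled trees."""
    total = sum(inclusion_counts)
    relative = [count / total for count in inclusion_counts]
    # Indexes of the covariates from higher to lower relative importance.
    idxs = sorted(range(len(relative)), key=lambda i: relative[i], reverse=True)
    return idxs, relative


# Illustrative inclusion counts for four covariates.
idxs, relative = rank_by_inclusion([12, 40, 8, 20])
```

The returned idxs follow the same convention as the idxs value documented above, from higher to lower relative importance.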