R2D2M2CP
- pymc_experimental.distributions.R2D2M2CP(name, output_sigma, input_sigma, *, dims, r2, variables_importance=None, variance_explained=None, importance_concentration=None, r2_std=None, positive_probs=0.5, positive_probs_std=None, centered=False)
R2D2M2CP Prior.
- Parameters:
name (str) – Name for the distribution
output_sigma (tensor) – Output standard deviation
input_sigma (tensor) – Input standard deviation
dims (Union[str, Sequence[str]]) – Dims for the distribution
r2 (tensor) – \(R^2\) estimate
variables_importance (tensor, optional) – Optional positive estimate of each variable's relative importance, by default None
variance_explained (tensor, optional) – Alternative to variables_importance: a point estimate of the variance explained by each variable; should sum to one, by default None
importance_concentration (tensor, optional) – Confidence (concentration) around the variance_explained or variables_importance estimate
r2_std (tensor, optional) – Optional uncertainty over \(R^2\), by default None
positive_probs (tensor, optional) – Optional probability that each variable's contribution is positive, by default 0.5
positive_probs_std (tensor, optional) – Optional uncertainty over the effect direction probability, by default None
centered (bool, optional) – Centered or non-centered parametrization of the distribution, by default non-centered. It is advised to try both
- Returns:
The output variance (sigma squared) is split into residual variance and explained variance.
- Return type:
residual_sigma, coefficients
- Raises:
TypeError – If parametrization is wrong.
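The return value reflects the decomposition named above: the total output variance splits into an explained part (governed by r2) and a residual part. Below is a minimal numpy sketch of that arithmetic with assumed point values; the actual prior treats these quantities as random variables, not fixed numbers.

```python
import numpy as np

# Illustrative point values; R2D2M2CP itself returns random variables
# (residual_sigma, coefficients), not these numbers.
output_sigma = 2.0  # total output standard deviation
r2 = 0.8            # expected proportion of variance explained

total_variance = output_sigma**2
explained_variance = r2 * total_variance
residual_variance = (1.0 - r2) * total_variance

# The split is exact: explained + residual recovers the total variance.
assert np.isclose(explained_variance + residual_variance, total_variance)

# Residual standard deviation implied by this split.
residual_sigma = output_sigma * np.sqrt(1.0 - r2)
print(residual_sigma)
```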
Notes
The R2D2M2CP prior is a modification of the R2D2M2 prior. R2D2M2 is taken from https://arxiv.org/abs/2208.07132; CP (Correlation Probability) is proposed and implemented by Max Kochurov (@ferrine).
Examples
The arguments are explained using a synthetic example.
Warning
To use the prior in a linear regression:
- make sure \(X\) is centered around zero
- the intercept represents the prior predictive mean when \(X\) is centered
- setting named dims is required
import pymc_experimental as pmx
import pymc as pm
import numpy as np

X = np.random.randn(10, 3)
b = np.random.randn(3)
y = X @ b + np.random.randn(10) * 0.04 + 5
with pm.Model(coords=dict(variables=["a", "b", "c"])) as model:
    eps, beta = pmx.distributions.R2D2M2CP(
        "beta",
        y.std(),
        X.std(0),
        dims="variables",
        # NOTE: global shrinkage
        r2=0.8,
        # NOTE: if you are unsure about r2
        r2_std=0.2,
        # NOTE: if you know where a variable should go
        # if you do not know, leave as 0.5
        positive_probs=[0.8, 0.5, 0.1],
        # NOTE: if you have different opinions about
        # where a variable should go.
        # NOTE: if you put 0.5 previously,
        # just put 0.1 there, but other
        # sigmas should work fine too
        positive_probs_std=[0.3, 0.1, 0.2],
        # NOTE: variable importances are relative to each other,
        # but larger numbers put "more" weight in the relation
        # use
        # * 1-10 for small confidence
        # * 10-30 for moderate confidence
        # * 30+ for high confidence
        # EXAMPLE:
        # "a" - is likely to be useful
        # "b" - no idea if it is useful
        # "c" - a must have in the relation
        variables_importance=[10, 1, 34],
        # NOTE: try both
        centered=True,
    )
    # intercept prior centering should be around prior predictive mean
    intercept = y.mean()
    # regressors should be centered around zero
    Xc = X - X.mean(0)
    obs = pm.Normal("obs", intercept + Xc @ beta, eps, observed=y)
Special cases arise from specific choices of arguments. Here the prior distribution of beta is
Normal(0, y.std() * r2 ** .5)
with pm.Model(coords=dict(variables=["a", "b", "c"])) as model:
    eps, beta = pmx.distributions.R2D2M2CP(
        "beta",
        y.std(),
        X.std(0),
        dims="variables",
        # NOTE: global shrinkage
        r2=0.8,
        # NOTE: if you are unsure about r2
        r2_std=0.2,
        centered=False,
    )
    # intercept prior centering should be around prior predictive mean
    intercept = y.mean()
    # regressors should be centered around zero
    Xc = X - X.mean(0)
    obs = pm.Normal("obs", intercept + Xc @ beta, eps, observed=y)
It is fine to leave some of the _std arguments unspecified. You can also specify only positive_probs, in which case all variables are assumed to explain the same amount of variance (same importance).

with pm.Model(coords=dict(variables=["a", "b", "c"])) as model:
    eps, beta = pmx.distributions.R2D2M2CP(
        "beta",
        y.std(),
        X.std(0),
        dims="variables",
        # NOTE: global shrinkage
        r2=0.8,
        # NOTE: if you are unsure about r2
        r2_std=0.2,
        # NOTE: if you know where a variable should go
        # if you do not know, leave as 0.5
        positive_probs=[0.8, 0.5, 0.1],
        # NOTE: try both
        centered=True,
    )
    intercept = y.mean()
    obs = pm.Normal("obs", intercept + X @ beta, eps, observed=y)
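The variables_importance and importance_concentration arguments used in the first example can be understood as shaping a Dirichlet distribution over per-variable shares of the explained variance. Below is a minimal numpy sketch of that intuition; the exact internal parametrization used by R2D2M2CP may differ, and the values are hypothetical, mirroring the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values mirroring the first example above.
variables_importance = np.array([10.0, 1.0, 34.0])
importance_concentration = 30.0

# Normalized importances give the mean share of explained variance
# attributed to each variable.
mean_share = variables_importance / variables_importance.sum()

# A Dirichlet with these scaled concentrations draws variance shares;
# a larger importance_concentration means less spread around mean_share.
shares = rng.dirichlet(mean_share * importance_concentration, size=10_000)

# Every draw sums to one, matching the "should sum to one"
# requirement stated for variance_explained.
print(shares.sum(axis=1)[:3])
print(shares.mean(axis=0))  # close to mean_share
```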
Notes
To reference the R2D2M2CP implementation, you can use the following BibTeX entry:
@misc{pymc-experimental-r2d2m2cp,
  title = {pymc-devs/pymc-experimental: {P}ull {R}equest 137, {R2D2M2CP}},
  url = {https://github.com/pymc-devs/pymc-experimental/pull/137},
  author = {Max Kochurov},
  howpublished = {GitHub},
  year = {2023}
}