Prior and Posterior Predictive Checks#
Posterior predictive checks (PPCs) are a great way to validate a model. The idea is to generate data from the model using parameters from draws from the posterior.
Elaborating slightly, one can say that PPCs analyze the degree to which data generated from the model deviate from data generated from the true distribution. So, often you will want to know if, for example, your posterior distribution is approximating your underlying distribution. The visualization aspect of this model evaluation method is also great for a ‘sense check’ or explaining your model to others and getting criticism.
Prior predictive checks are also a crucial part of the Bayesian modeling workflow. Basically, they have two main benefits:
They allow you to check whether you are indeed incorporating scientific knowledge into your model – in short, they help you check how credible your assumptions before seeing the data are.
They can help sampling considerably, especially for generalized linear models, where the outcome space and the parameter space diverge because of the link function.
Here, we will implement a general routine to draw samples from the observed nodes of a model. The models are basic but they will be a steppingstone for creating your own routines. If you want to see how to do prior and posterior predictive checks in a more complex, multidimensional model, you can check this notebook. Now, let’s sample!
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from scipy.special import expit as logistic
import pymc as pm
print(f"Running on PyMC v{pm.__version__}")
Running on PyMC v5.15.1+68.gc0b060b98.dirty
az.style.use("arviz-darkgrid")
RANDOM_SEED = 58
rng = np.random.default_rng(RANDOM_SEED)
def standardize(series):
"""Standardize a pandas series"""
return (series - series.mean()) / series.std()
Lets generate a very simple linear regression model. On purpose, I’ll simulate data that don’t come from a standard Normal (you’ll see why later):
N = 100
true_a, true_b, predictor = 0.5, 3.0, rng.normal(loc=2, scale=6, size=N)
true_mu = true_a + true_b * predictor
true_sd = 2.0
outcome = rng.normal(loc=true_mu, scale=true_sd, size=N)
f"{predictor.mean():.2f}, {predictor.std():.2f}, {outcome.mean():.2f}, {outcome.std():.2f}"
'1.59, 5.69, 4.97, 17.54'
As you can see, variation in our predictor and outcome are quite high – which is often the case with real data. And sometimes, the sampler won’t like this – and you don’t want to make the sampler angry when you’re a Bayesian… So, let’s do what you’ll often have to do with real data: standardize! This way, our predictor and outcome will have a mean of 0 and std of 1, and the sampler will be much, much happier:
predictor_scaled = standardize(predictor)
outcome_scaled = standardize(outcome)
f"{predictor_scaled.mean():.2f}, {predictor_scaled.std():.2f}, {outcome_scaled.mean():.2f}, {outcome_scaled.std():.2f}"
'0.00, 1.00, -0.00, 1.00'
And now, let’s write the model with conventional flat priors and sample prior predictive samples:
with pm.Model() as model_1:
a = pm.Normal("a", 0.0, 10.0)
b = pm.Normal("b", 0.0, 10.0)
mu = a + b * predictor_scaled
sigma = pm.Exponential("sigma", 1.0)
pm.Normal("obs", mu=mu, sigma=sigma, observed=outcome_scaled)
idata = pm.sample_prior_predictive(draws=50, random_seed=rng)
Sampling: [a, b, obs, sigma]
What do these priors mean? It’s always hard to tell on paper – the best is to plot their implication on the outcome scale, like that:
_, ax = plt.subplots()
x = xr.DataArray(np.linspace(-2, 2, 50), dims=["plot_dim"])
prior = idata.prior
y = prior["a"] + prior["b"] * x
ax.plot(x, y.stack(sample=("chain", "draw")), c="k", alpha=0.4)
ax.set_xlabel("Predictor (stdz)")
ax.set_ylabel("Mean Outcome (stdz)")
ax.set_title("Prior predictive checks -- Flat priors");
These priors allow for absurdly strong relationships between the outcome and predictor. Of course, the choice of prior always depends on your model and data, but look at the scale of the y axis: the outcome can go from -40 to +40 standard deviations (remember, the data are standardized). I hope you will agree this is way too permissive – we can do better! Let’s use weakly informative priors and see what they yield. In a real case study, this is the part where you incorporate scientific knowledge into your model:
with pm.Model() as model_1:
a = pm.Normal("a", 0.0, 0.5)
b = pm.Normal("b", 0.0, 1.0)
mu = a + b * predictor_scaled
sigma = pm.Exponential("sigma", 1.0)
pm.Normal("obs", mu=mu, sigma=sigma, observed=outcome_scaled)
idata = pm.sample_prior_predictive(draws=50, random_seed=rng)
Sampling: [a, b, obs, sigma]
_, ax = plt.subplots()
x = xr.DataArray(np.linspace(-2, 2, 50), dims=["plot_dim"])
prior = idata.prior
y = prior["a"] + prior["b"] * x
ax.plot(x, y.stack(sample=("chain", "draw")), c="k", alpha=0.4)
ax.set_xlabel("Predictor (stdz)")
ax.set_ylabel("Mean Outcome (stdz)")
ax.set_title("Prior predictive checks -- Weakly regularizing priors");
Well that’s way better! There are still very strong relationships, but at least now the outcome stays in the realm of possibilities. Now, it’s time to party – if by “party” you mean “run the model”, of course.
with model_1:
idata.extend(pm.sample(1000, tune=2000, random_seed=rng))
az.plot_trace(idata);
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a, b, sigma]
Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 14 seconds.
Everything ran smoothly, but it’s often difficult to understand what the parameters’ values mean when analyzing a trace plot or table summary – even more so here, as the parameters live in the standardized space. A useful thing to understand your models is… you guessed it: posterior predictive checks! We’ll use PyMC’s dedicated function to sample data from the posterior. This function will randomly draw 4000 samples of parameters from the trace. Then, for each sample, it will draw 100 random numbers from a normal distribution specified by the values of mu
and sigma
in that sample:
with model_1:
pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=rng)
Sampling: [obs]
Now, the posterior_predictive group in idata
contains 4000 generated data sets (containing 100 samples each), each using a different parameter setting from the posterior:
idata.posterior_predictive
<xarray.Dataset> Size: 3MB Dimensions: (chain: 4, draw: 1000, obs_dim_2: 100) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999 * obs_dim_2 (obs_dim_2) int64 800B 0 1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99 Data variables: obs (chain, draw, obs_dim_2) float64 3MB -0.5997 0.312 ... 0.4695 Attributes: created_at: 2024-06-25T12:59:45.204631 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608
One common way to visualize is to look if the model can reproduce the patterns observed in the real data. ArviZ has a really neat function to do that out of the box:
az.plot_ppc(idata, num_pp_samples=100);
It looks like the model is pretty good at retrodicting the data. In addition to this generic function, it’s always nice to make a plot tailored to your use-case. Here, it would be interesting to plot the predicted relationship between the predictor and the outcome. This is quite easy, now that we already sampled posterior predictive samples – we just have to push the parameters through the model:
post = idata.posterior
mu_pp = post["a"] + post["b"] * xr.DataArray(predictor_scaled, dims=["obs_id"])
_, ax = plt.subplots()
ax.plot(
predictor_scaled, mu_pp.mean(("chain", "draw")), label="Mean outcome", color="C1", alpha=0.6
)
ax.scatter(predictor_scaled, idata.observed_data["obs"])
az.plot_hdi(predictor_scaled, idata.posterior_predictive["obs"])
ax.set_xlabel("Predictor (stdz)")
ax.set_ylabel("Outcome (stdz)");
We have a lot of data, so the uncertainty around the mean of the outcome is pretty narrow; but the uncertainty surrounding the outcome in general seems quite in line with the observed data.
Comparison between PPC and other model evaluation methods.#
An excellent introduction to this was given in the Edward documentation:
PPCs are an excellent tool for revising models, simplifying or expanding the current model as one examines how well it fits the data. They are inspired by prior checks and classical hypothesis testing, under the philosophy that models should be criticized under the frequentist perspective of large sample assessment.
PPCs can also be applied to tasks such as hypothesis testing, model comparison, model selection, and model averaging. It’s important to note that while they can be applied as a form of Bayesian hypothesis testing, hypothesis testing is generally not recommended: binary decision making from a single test is not as common a use case as one might believe. We recommend performing many PPCs to get a holistic understanding of the model fit.
Prediction#
The same pattern can be used for prediction. Here, we are building a logistic regression model:
N = 400
true_intercept = 0.2
true_slope = 1.0
predictors = rng.normal(size=N)
true_p = logistic(true_intercept + true_slope * predictors)
outcomes = rng.binomial(1, true_p)
outcomes[:10]
array([1, 1, 1, 0, 1, 0, 0, 1, 1, 0])
with pm.Model() as model_2:
betas = pm.Normal("betas", mu=0.0, sigma=np.array([0.5, 1.0]), shape=2)
# set predictors as shared variable to change them for PPCs:
pred = pm.MutableData("pred", predictors, dims="obs_id")
p = pm.Deterministic("p", pm.math.invlogit(betas[0] + betas[1] * pred), dims="obs_id")
outcome = pm.Bernoulli("outcome", p=p, observed=outcomes, dims="obs_id")
idata_2 = pm.sample(1000, tune=2000, return_inferencedata=True, random_seed=rng)
az.summary(idata_2, var_names=["betas"], round_to=2)
/home/ricardo/Documents/Projects/pymc/pymc/data.py:304: FutureWarning: MutableData is deprecated. All Data variables are now mutable. Use Data instead.
warnings.warn(
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [betas]
Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 6 seconds.
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
betas[0] | 0.23 | 0.11 | 0.03 | 0.44 | 0.0 | 0.0 | 3211.49 | 3013.30 | 1.0 |
betas[1] | 1.03 | 0.13 | 0.78 | 1.29 | 0.0 | 0.0 | 3673.85 | 2720.49 | 1.0 |
Now, let’s simulate out-of-sample data to see how the model predicts them. We’ll give the new predictors to the model and it’ll then tell us what it thinks the outcomes are, based on what it learned in the training round. We’ll then compare the model’s predictions to the true out-of-sample outcomes.
predictors_out_of_sample = rng.normal(size=50)
outcomes_out_of_sample = rng.binomial(
1, logistic(true_intercept + true_slope * predictors_out_of_sample)
)
with model_2:
# update values of predictors:
pm.set_data({"pred": predictors_out_of_sample})
# use the updated values and predict outcomes and probabilities:
idata_2 = pm.sample_posterior_predictive(
idata_2,
var_names=["p"],
return_inferencedata=True,
predictions=True,
extend_inferencedata=True,
random_seed=rng,
)
Sampling: []
idata_2
-
<xarray.Dataset> Size: 13MB Dimensions: (chain: 4, draw: 1000, betas_dim_0: 2, obs_id: 400) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 ... 993 994 995 996 997 998 999 * betas_dim_0 (betas_dim_0) int64 16B 0 1 * obs_id (obs_id) int64 3kB 0 1 2 3 4 5 6 ... 394 395 396 397 398 399 Data variables: betas (chain, draw, betas_dim_0) float64 64kB 0.3311 0.9692 ... 1.113 p (chain, draw, obs_id) float64 13MB 0.5169 0.7004 ... 0.8773 Attributes: created_at: 2024-06-25T12:59:58.670730 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608 sampling_time: 6.474128246307373 tuning_steps: 2000
-
<xarray.Dataset> Size: 2MB Dimensions: (chain: 4, draw: 1000, obs_id: 50) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999 * obs_id (obs_id) int64 400B 0 1 2 3 4 5 6 7 8 ... 42 43 44 45 46 47 48 49 Data variables: p (chain, draw, obs_id) float64 2MB 0.5904 0.2295 ... 0.3397 0.5857 Attributes: created_at: 2024-06-25T12:59:59.047195 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608
-
<xarray.Dataset> Size: 496kB Dimensions: (chain: 4, draw: 1000) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 ... 995 996 997 998 999 Data variables: (12/17) acceptance_rate (chain, draw) float64 32kB 0.8535 0.6245 ... 0.9594 energy (chain, draw) float64 32kB 239.0 238.5 ... 236.7 step_size_bar (chain, draw) float64 32kB 1.181 1.181 ... 1.194 perf_counter_start (chain, draw) float64 32kB 1.238e+04 ... 1.238e+04 smallest_eigval (chain, draw) float64 32kB nan nan nan ... nan nan reached_max_treedepth (chain, draw) bool 4kB False False ... False False ... ... diverging (chain, draw) bool 4kB False False ... False False energy_error (chain, draw) float64 32kB -0.5477 ... 0.004425 tree_depth (chain, draw) int64 32kB 2 2 2 2 2 2 ... 2 2 2 2 2 2 process_time_diff (chain, draw) float64 32kB 0.001024 ... 0.0008914 lp (chain, draw) float64 32kB -236.9 -237.8 ... -236.5 perf_counter_diff (chain, draw) float64 32kB 0.001023 ... 0.0008892 Attributes: created_at: 2024-06-25T12:59:58.698238 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608 sampling_time: 6.474128246307373 tuning_steps: 2000
-
<xarray.Dataset> Size: 6kB Dimensions: (obs_id: 400) Coordinates: * obs_id (obs_id) int64 3kB 0 1 2 3 4 5 6 7 ... 393 394 395 396 397 398 399 Data variables: outcome (obs_id) int64 3kB 1 1 1 0 1 0 0 1 1 0 0 ... 0 1 1 1 0 1 1 0 1 0 1 Attributes: created_at: 2024-06-25T12:59:58.707843 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608
-
<xarray.Dataset> Size: 6kB Dimensions: (obs_id: 400) Coordinates: * obs_id (obs_id) int64 3kB 0 1 2 3 4 5 6 7 ... 393 394 395 396 397 398 399 Data variables: pred (obs_id) float64 3kB -0.2718 0.5346 -1.073 ... -0.9459 -1.438 1.557 Attributes: created_at: 2024-06-25T12:59:58.709527 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608
-
<xarray.Dataset> Size: 800B Dimensions: (obs_id: 50) Coordinates: * obs_id (obs_id) int64 400B 0 1 2 3 4 5 6 7 8 ... 42 43 44 45 46 47 48 49 Data variables: pred (obs_id) float64 400B 0.03558 -1.591 -0.7009 ... -0.8064 0.1015 Attributes: created_at: 2024-06-25T12:59:59.049869 arviz_version: 0.17.1 inference_library: pymc inference_library_version: 5.15.0+1.g58927d608
Mean predicted values plus error bars to give a sense of uncertainty in prediction#
Note that since we are dealing with the full posterior, we are also getting uncertainty in our predictions for free.
_, ax = plt.subplots(figsize=(12, 6))
preds_out_of_sample = idata_2.predictions_constant_data.sortby("pred")["pred"]
model_preds = idata_2.predictions.sortby(preds_out_of_sample)
# uncertainty about the estimates:
ax.vlines(
preds_out_of_sample,
*az.hdi(model_preds)["p"].transpose("hdi", ...),
alpha=0.8,
)
# expected probability of success:
ax.plot(
preds_out_of_sample,
model_preds["p"].mean(("chain", "draw")),
"o",
ms=5,
color="C1",
alpha=0.8,
label="Expected prob.",
)
# actual outcomes:
ax.scatter(
x=predictors_out_of_sample,
y=outcomes_out_of_sample,
marker="x",
color="k",
alpha=0.8,
label="Observed outcomes",
)
# true probabilities:
x = np.linspace(predictors_out_of_sample.min() - 0.1, predictors_out_of_sample.max() + 0.1)
ax.plot(
x,
logistic(true_intercept + true_slope * x),
lw=2,
ls="--",
color="#565C6C",
alpha=0.8,
label="True prob.",
)
ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Tue Jun 25 2024
Python implementation: CPython
Python version : 3.11.8
IPython version : 8.22.2
pytensor: 2.20.0+3.g66439d283.dirty
pymc : 5.15.0+1.g58927d608
numpy : 1.26.4
arviz : 0.17.1
matplotlib: 3.8.3
xarray : 2024.2.0
Watermark: 2.4.3