GLM-ordinal-features#
Ordinal Exogenous Feature: Worked Example with Bayesian Workflow#
Here we use an ordinal exogenous predictor feature within a model:
y ~ x + e
y: Numeric
x: Ordinal Category
… this is in contrast to estimating an ordinal endogenous target feature, which we show in another notebook
Disclaimer:
This Notebook is a worked example only, it’s not intended to be an academic reference
The theory and math may be incorrect, incorrectly notated, or incorrectly used
The code may contain errors, inefficiencies, legacy methods, and the text may have typos
Use at your own risk!
Contents#
Discussion#
Problem Statement#
Human action and economics is all about expressing our ordinal preferences between limited options in the real-world.
We often encounter real-world situations and datasets where a predictor feature is an ordinal category recording a preference or summarizing a metric value, and is particularly common in insurance and health. For example:
As a totally subjective opinion which can be different between observations (e.g.
["bad", "medium", "good", "better", "way better", "best", "actually the best"]
) - these are difficult to work with and a symptom of poor data designOn a subjective but standardized scale (e.g.
["strongly disagree", "disagree", "agree", "strongly agree"]
) this is the approach of the familar Likert scaleAs a summary binning of a real objective value on a metric scale (e.g. binning ages into age-groups
["<30", "30 to 60", "60+"]
), or a subjective value that’s been mapped to a metric scale (e.g. medical health self-scoring["0-10%", ..., "90-100%"]
) - these are typically a misuse of the metric because the data has been compressed (losing information), and the reason for the binning and the choices of bin-edges are usually not known
In all these cases the critical issue is that the categorical values and their ordinal rank doesn’t necessarily relate
linearly to the target variable. For example in a 4-value Likert scale (shown above) the relative effect of
"strongly agree"
(rank 4
) is probably not "agree"
(rank 3) plus 1: 3+1 != 4
.
Another way to put it is the metric distance between ordinal categories is not known and can be unequal. For example in
that 4-value Likert scale (shown above) the difference between "disagree" vs "agree"
is probably not the same as
between "agree" vs "strongly agree"
.
These properties can unfortunately encourage modellers to incorporate ordinal features as either a categorical (with infinite degrees of freedom - so we lose ordering / rank information), or as a numeric coefficient (which ignores the unequal spacing, non-linear response). Both are poor choices and have subtly negative effects on the model performance.
A final nuance is that we might not see the occurence of all valid categorial ordinal levels in the training dataset.
For example we might know a range is measured ["c0", "c1", "c2", "c3"]
but only see ["c0", "c1", "c3"]
. This is a
missing data problem which could further encourage the misuse of a numeric coefficient to average or “interpolate” a
value. What we should do is incorporate our knowledge of the data domain into the model structure to autoimpute a
coefficient value. This is actually the case in this dataset here (see Section 0)!
Data & Models Demonstrated#
Our problem statement is that when faced with such ordinal features we want to:
Infer a series of prior allocators that transform the ordinal categories into a linear (polynomial) scale
Predict the endogenous feature as usual, having captured the information from the ordinals
This notebook takes the opportunity to:
Demonstrate a general method using a constrained Dirichlet prior, based on [Bürkner P, 2018] and demonstrated in a pymc-specific example by Austin Rochford [Rochford, n.d.]
Improve upon both those methods by structurally correcting for a missing level value in an ordinal feature
Demonstrate a reasonably complete Bayesian workflow [Gelman et al., 2020] including data curation and grabbing data from an RDataset
This notebook is a partner to another notebook (TBD) where we estimate an ordinal endogenous target feature.
Setup#
Attention
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook. Open the dropdown below for extra guidance.
Extra dependencies install instructions
In order to run this notebook (either locally or on binder) you won’t only need a working PyMC installation with all optional dependencies, but also to install some extra dependencies. For advise on installing PyMC itself, please refer to Installation
You can install these dependencies with your preferred package manager, we provide as an example the pip and conda commands below.
$ pip install
Note that if you want (or need) to install the packages from inside the notebook instead of the command line, you can install the packages by running a variation of the pip command:
import sys
!{sys.executable} -m pip install
You should not run !pip install
as it might install the package in a different
environment and not be available from the Jupyter notebook even if installed.
Another alternative is using conda instead:
$ conda install
when installing scientific python packages with conda, we recommend using conda forge
# uncomment to install in a Google Colab environment
# !pip install pyreadr watermark
from copy import deepcopy
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pyreadr
import pytensor.tensor as pt
from matplotlib import gridspec
from pymc.testing import assert_no_rvs
import warnings # isort:skip # suppress seaborn, it's far too chatty
warnings.simplefilter(action="ignore", category=FutureWarning) # isort:skip
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
sns.set_theme(
style="darkgrid",
palette="muted",
context="notebook",
rc={"figure.dpi": 72, "savefig.dpi": 144, "figure.figsize": (12, 4)},
)
# set target_accept quite high to minimise divergences in mdlb
SAMPLE_KWS = dict(
progressbar=True,
draws=500,
tune=2000,
chains=4,
idata_kwargs=dict(log_likelihood=True),
)
USE_LOCAL_DATASET = False
0. Curate Dataset#
We use the same health dataset as in [Bürkner P, 2018], named ICFCoreSetCWP.RData
, available in an R package
ordPens
Per the Bürkner paper (Section 4: Case Study) this dataset is from a study ** of Chronic Widespread Pain(CWP) wherein
420 patients were self-surveyed on 67 health features (each a subjective ordinal category) and also assigned a
differently generated (and also subjective) measure of physical health. In the dataset these 67 features are named e.g.
b1602
, b102
, … d450
, d455
, … s770
etc, and the target feature is named phcs
.
Per the Bürkner paper we will subselect 2 features d450
, d455
(which measure an impairment of patient
walking ability on a scale [0 to 4]
["no problem"
to "complete problem"
]) and use them to predict phcs
.
Quite interestingly, for feature d450
, the highest ordinal level value 4
is not seen in the dataset, so we have a
missing data problem which could further encourage the misuse of a numeric coefficient to average or “interpolate” a
value. What we should do is incorporate our knowledge of the data domain into the model structure to auto-impute a
coefficient value. This means that our model can make predictions on new data where a d450=4
value might be seen.
** Just for completness (but not needed for this notebook) that study is reported in Gertheiss, J., Hogger, S., Oberhauser, C., & Tutz, G. (2011). Selection of ordinally 784 scaled independent variables with applications to international classification of functioning 785 core sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60 (3), 786 377–395.
NOTE some boilerplate steps are included but ~~struck through~~ with and explanatory comment e.g. “Not needed in this simple example”. This is to preserve the logical workflow which is more generally useful
0.1 Extract#
Annoyingly but not suprisingly for an R project, despite being a small, simple table, the dataset is only available in
an obscure R binary format, and tarred, so we’ll download, unpack and store locally as a normal CSV file.
This uses the rather helpful pyreadr
package.
if USE_LOCAL_DATASET:
dfr = pd.read_csv("icf_core_set_cwp.csv", index_col="rownames")
else:
import os
import tarfile
import urllib.request
url = "https://cran.r-project.org/src/contrib/ordPens_1.1.0.tar.gz"
filehandle, _ = urllib.request.urlretrieve(url)
rbytes = tarfile.open(filehandle).extractfile(member="ordPens/data/ICFCoreSetCWP.RData").read()
fn = "ICFCoreSetCWP.RData"
with open(fn, "wb") as f:
f.write(rbytes)
dfr = pyreadr.read_r(fn)["ICFCoreSetCWP"]
os.remove(fn)
dfr.to_csv("icf_core_set_cwp.csv")
print(dfr.shape)
display(pd.concat((dfr.describe(include="all").T, dfr.isnull().sum(), dfr.dtypes), axis=1))
display(dfr.head())
(420, 68)
count | mean | std | min | 25% | 50% | 75% | max | 0 | 1 | |
---|---|---|---|---|---|---|---|---|---|---|
b1602 | 420.0 | 0.542857 | 0.901111 | 0.00 | 0.00 | 0.000 | 1.0000 | 3.00 | 0 | int32 |
b122 | 420.0 | 0.533333 | 0.871908 | 0.00 | 0.00 | 0.000 | 1.0000 | 4.00 | 0 | int32 |
b126 | 420.0 | 0.866667 | 0.982585 | 0.00 | 0.00 | 1.000 | 2.0000 | 3.00 | 0 | int32 |
b130 | 420.0 | 1.180952 | 1.075045 | 0.00 | 0.00 | 1.000 | 2.0000 | 4.00 | 0 | int32 |
b134 | 420.0 | 1.352381 | 1.147632 | 0.00 | 0.00 | 1.000 | 2.0000 | 4.00 | 0 | int32 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
e575 | 420.0 | 0.350000 | 1.466857 | -4.00 | 0.00 | 0.000 | 1.0000 | 4.00 | 0 | int32 |
e580 | 420.0 | 1.080952 | 1.648929 | -4.00 | 0.00 | 1.000 | 2.0000 | 4.00 | 0 | int32 |
e590 | 420.0 | 0.190476 | 1.391040 | -4.00 | 0.00 | 0.000 | 0.0000 | 4.00 | 0 | int32 |
s770 | 420.0 | 0.890476 | 1.023544 | 0.00 | 0.00 | 1.000 | 2.0000 | 4.00 | 0 | int32 |
phcs | 420.0 | 32.407548 | 8.167084 | 10.08 | 26.58 | 31.865 | 37.5475 | 53.17 | 0 | float64 |
68 rows × 10 columns
b1602 | b122 | b126 | b130 | b134 | b140 | b147 | b152 | b164 | b180 | ... | e450 | e455 | e460 | e465 | e570 | e575 | e580 | e590 | s770 | phcs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rownames | |||||||||||||||||||||
1 | 0 | 1 | 2 | 1 | 1 | 0 | 0 | 2 | 0 | 0 | ... | 4 | 0 | 0 | 0 | 3 | 3 | 4 | 3 | 0 | 44.33 |
2 | 3 | 2 | 2 | 3 | 3 | 2 | 3 | 3 | 3 | 1 | ... | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 21.09 |
3 | 0 | 1 | 2 | 1 | 1 | 0 | 1 | 2 | 0 | 0 | ... | 4 | 0 | 0 | 0 | 3 | 3 | 4 | 0 | 0 | 41.74 |
4 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 2 | -1 | 0 | 0 | 2 | 2 | 1 | 1 | 33.96 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 46.29 |
5 rows × 68 columns
Observe:
Looks okay - if this was a proper project we’d want to know what those cryptic column headings actually mean
For this purpose we’ll only use a couple of the features [
d450
,d455
] and will press ahead
0.2 Clean#
fts_austin = ["d450", "d455", "phcs"]
df = dfr[fts_austin].copy()
display(pd.concat((df.describe(include="all").T, df.isnull().sum(), df.dtypes), axis=1))
df.head()
count | mean | std | min | 25% | 50% | 75% | max | 0 | 1 | |
---|---|---|---|---|---|---|---|---|---|---|
d450 | 420.0 | 0.940476 | 0.998223 | 0.00 | 0.00 | 1.000 | 2.0000 | 3.00 | 0 | int32 |
d455 | 420.0 | 1.585714 | 1.224174 | 0.00 | 1.00 | 2.000 | 2.0000 | 4.00 | 0 | int32 |
phcs | 420.0 | 32.407548 | 8.167084 | 10.08 | 26.58 | 31.865 | 37.5475 | 53.17 | 0 | float64 |
d450 | d455 | phcs | |
---|---|---|---|
rownames | |||
1 | 0 | 2 | 44.33 |
2 | 3 | 3 | 21.09 |
3 | 0 | 2 | 41.74 |
4 | 3 | 2 | 33.96 |
5 | 0 | 0 | 46.29 |
~~0.2.1 Clean Observations~~#
Not needed in this simple example
0.2.2 Clean Features#
~~0.2.2.1 Rename Features~~#
Nothing really needed, will rename the index when we force dtype and set index
~~0.2.2.2 Correct Features~~#
Not needed in this simple example
0.2.2.3 Force Datatypes#
Force d450
to string representation and ordered categorical dtype (supplied as an int which is unhelpful)#
ft = "d450"
idx = df[ft].notnull()
df.loc[idx, ft] = df.loc[idx, ft].apply(lambda x: f"c{x}")
df[ft].unique()
array(['c0', 'c3', 'c1', 'c2'], dtype=object)
NOTE force the categorical levels to include c4 which is valid in the data domain but unobserved
lvls = ["c0", "c1", "c2", "c3", "c4"]
df[ft] = pd.Categorical(df[ft].values, categories=lvls, ordered=True)
df[ft].cat.categories
Index(['c0', 'c1', 'c2', 'c3', 'c4'], dtype='object')
Force d455
to string representation and ordered categorical dtype (supplied as an int which is unhelpful)#
ft = "d455"
idx = df[ft].notnull()
df.loc[idx, ft] = df.loc[idx, ft].apply(lambda x: f"c{x}")
df[ft].unique()
array(['c2', 'c3', 'c0', 'c1', 'c4'], dtype=object)
lvls = ["c0", "c1", "c2", "c3", "c4"]
df[ft] = pd.Categorical(df[ft].values, categories=lvls, ordered=True)
df[ft].cat.categories
Index(['c0', 'c1', 'c2', 'c3', 'c4'], dtype='object')
0.2.2.4 Create and set indexes#
df0 = df.reset_index()
df0["oid"] = df0["rownames"].apply(lambda n: f"o{str(n).zfill(3)}")
df = df0.drop("rownames", axis=1).set_index("oid").sort_index()
assert df.index.is_unique
0.3 Very limited quick EDA#
print(df.shape)
display(pd.concat((df.describe(include="all").T, df.isnull().sum(), df.dtypes), axis=1))
display(df.head())
(420, 3)
count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | 0 | 1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
d450 | 420 | 4 | c0 | 189 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | category |
d455 | 420 | 5 | c2 | 113 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | category |
phcs | 420.0 | NaN | NaN | NaN | 32.407548 | 8.167084 | 10.08 | 26.58 | 31.865 | 37.5475 | 53.17 | 0 | float64 |
d450 | d455 | phcs | |
---|---|---|---|
oid | |||
o001 | c0 | c2 | 44.33 |
o002 | c3 | c3 | 21.09 |
o003 | c0 | c2 | 41.74 |
o004 | c3 | c2 | 33.96 |
o005 | c0 | c0 | 46.29 |
0.3.1 Univariate target phcs
#
fts = ["phcs"]
v_kws = dict(data=df, cut=0)
cs = sns.color_palette(n_colors=len(fts))
f, axs = plt.subplots(len(fts), 1, figsize=(12, 1 + 1.4 * len(fts)), squeeze=False)
for i, ft in enumerate(fts):
ax = sns.violinplot(x=ft, **v_kws, ax=axs[0][i], color=cs[i])
_ = ax.set_title(ft)
_ = f.suptitle("Univariate numerics")
_ = f.tight_layout()
Observe:
phcs
is a subjective scored measure of physical healt, see [Bürkner P, 2018] for detailsSeems well-behaved, unimodal, smooth
0.3.2 Target phcs
vs ['d450', 'd455']
#
def plot_numeric_vs_cat(df, ftnum="phcs", ftcat="d450") -> plt.figure:
f = plt.figure(figsize=(12, 3))
gs = gridspec.GridSpec(1, 2, width_ratios=[4, 1], figure=f)
ax0 = f.add_subplot(gs[0])
ax1 = f.add_subplot(gs[1], sharey=ax0)
_ = ax0.set(title=f"Distribution of `{ftnum}` wihin each `{ftcat}` category")
_ = ax1.set(title=f"Count obs per `{ftcat}` category", ylabel=False)
kws_box = dict(
orient="h",
showmeans=True,
whis=(3, 97),
meanprops=dict(markerfacecolor="w", markeredgecolor="#333333", marker="d", markersize=12),
)
_ = sns.boxplot(x=ftnum, y=ftcat, data=df, **kws_box, ax=ax0) # hue=ftcat,
_ = sns.countplot(y=ftcat, data=df, ax=ax1) # hue=ftcat, seaborn >= 0.13
_ = ax0.invert_yaxis()
_ = ax1.yaxis.label.set_visible(False)
_ = plt.setp(ax1.get_yticklabels(), visible=False)
_ = f.suptitle(f"Summary stats for {len(df)} obs in dataset")
_ = f.tight_layout()
f = plot_numeric_vs_cat(df, ftnum="phcs", ftcat="d450")
f = plot_numeric_vs_cat(df, ftnum="phcs", ftcat="d455")
Observe:
phcs vs d450
:
c0
wider and higher: possibly a catch-all category because heavily observed tooc3
fewer countsc4
is not observed: it’s missing from the data despite being valid in the data domain
phcs vs d455
:
c1
andc2
look very similarc4
fewer counts
0.4 Transform dataset to dfx
for model input#
IMPORTANT NOTE
Reminder that Bayesian inferential methods do not need a
test
dataset (nor k-fold cross validation) to fit parameters. We also do not need aholdout
(out-of-sample) dataset to evaluate model performance, because we can use in-sample PPC, LOO-PIT and ELPD evaluationsSo we use the entire dataset
df
as our model inputDepending on the real-world model implementation we might:
Separate out a
holdout
set (even though we dont need it) to eyeball the predictive outputs, but here we have a summarized dataset, so this isn’t possible nor suitableCreate a
forecast
set to demonstrate how we would use the model and it’s predictive outputs in Production.
NOTE:
This is an abbreviated / simplified transformation process which still allows for the potential to add more features in future
Map ordinal categorical to an ordinal numeric (int) based on its preexisting categorical order
df["d450_idx"] = df["d450"].cat.codes.astype(int)
df["d450_num"] = df["d450_idx"].copy()
df["d455_idx"] = df["d455"].cat.codes.astype(int)
df["d455_num"] = df["d455_idx"].copy()
Transform (zscore and scale) numerics
fts_num = ["d450_num", "d455_num"]
fts_non_num = ["d450_idx", "d455_idx"]
fts_y = ["phcs"]
MNS = np.nanmean(df[fts_num], axis=0)
SDEVS = np.nanstd(df[fts_num], axis=0)
dfx_num = (df[fts_num] - MNS) / SDEVS
icpt = pd.Series(np.ones(len(df)), name="intercept", index=dfx_num.index)
# concat including y_idx which will be used as observed
dfx = pd.concat((df[fts_y], icpt, df[fts_non_num], dfx_num), axis=1)
dfx.sample(5, random_state=42)
phcs | intercept | d450_idx | d455_idx | d450_num | d455_num | |
---|---|---|---|---|---|---|
oid | ||||||
o152 | 33.40 | 1.0 | 0 | 0 | -0.943274 | -1.296879 |
o351 | 45.42 | 1.0 | 1 | 1 | 0.059701 | -0.479027 |
o185 | 31.40 | 1.0 | 2 | 3 | 1.062676 | 1.156676 |
o395 | 29.01 | 1.0 | 2 | 2 | 1.062676 | 0.338824 |
o449 | 23.32 | 1.0 | 2 | 0 | 1.062676 | -1.296879 |
0.5 Create forecast
set and convert to dffx
for model input#
NOTE: We depart from the datasets used in [Rochford, n.d.] and [Bürkner P, 2018] to make sure our
forecast
dataset contains all valid levels of d450
and d455
. Specifically, the observed dataset does not contain
the domain-valid d450 = 4
, so we will make sure that our forecast set does include it.
LVLS_D450_D455 = ["c0", "c1", "c2", "c3", "c4"]
dff = pd.merge(
pd.Series(LVLS_D450_D455, name="d450"), pd.Series(LVLS_D450_D455, name="d455"), how="cross"
)
dff["d450"] = pd.Categorical(dff["d450"].values, categories=LVLS_D450_D455, ordered=True)
dff["d455"] = pd.Categorical(dff["d455"].values, categories=LVLS_D450_D455, ordered=True)
dff["phcs"] = 0.0
dff["oid"] = [f"o{str(n).zfill(3)}" for n in range(len(dff))]
dff.set_index("oid", inplace=True)
dff.head()
d450 | d455 | phcs | |
---|---|---|---|
oid | |||
o000 | c0 | c0 | 0.0 |
o001 | c0 | c1 | 0.0 |
o002 | c0 | c2 | 0.0 |
o003 | c0 | c3 | 0.0 |
o004 | c0 | c4 | 0.0 |
Convert to dffx
for model input
dff["d450_idx"] = dff["d450"].cat.codes.astype(int)
dff["d455_idx"] = dff["d455"].cat.codes.astype(int)
dff["d450_num"] = dff["d450_idx"].copy()
dff["d455_num"] = dff["d455_idx"].copy()
dffx_num = (dff[fts_num] - MNS) / SDEVS
icptf = pd.Series(np.ones(len(dff)), name="intercept", index=dffx_num.index)
# # concat including y_idx which will be used as observed
dffx = pd.concat((dff[fts_y], icptf, dff[fts_non_num], dffx_num), axis=1)
dffx.sample(5, random_state=42)
phcs | intercept | d450_idx | d455_idx | d450_num | d455_num | |
---|---|---|---|---|---|---|
oid | ||||||
o008 | 0.0 | 1.0 | 1 | 3 | 0.059701 | 1.156676 |
o016 | 0.0 | 1.0 | 3 | 1 | 2.065651 | -0.479027 |
o000 | 0.0 | 1.0 | 0 | 0 | -0.943274 | -1.296879 |
o023 | 0.0 | 1.0 | 4 | 3 | 3.068627 | 1.156676 |
o011 | 0.0 | 1.0 | 2 | 1 | 1.062676 | -0.479027 |
1. Model A: The Wrong Way - Simple Linear Coefficients#
This is a simple linear model where we acknowledge that the categorical features are ordered, but ignore that the impact of the factor-values on the coefficient might not be equally-spaced, and instead just assume equal spacing:
where:
Observations \(i\) contain numeric features \(j\), and \(\hat{y_{i}}\) is our estimate, here of
phcs
The linear sub-model \(\beta^{T}\mathbb{x}_{ij}\) lets us regress onto those features
Notably:
\(\mathbb{x}_{i,d450}\) is treated as a numeric feature
\(\mathbb{x}_{i,d455}\) is treated as a numeric feature
1.1 Build Model Object#
ft_y = "phcs"
fts_x = ["intercept", "d450_num", "d455_num"]
COORDS = dict(oid=dfx.index.values, y_nm=ft_y, x_nm=fts_x)
with pm.Model(coords=COORDS) as mdla:
# 0. create (Mutable)Data containers for obs (Y, X)
y = pm.Data("y", dfx[ft_y].values, dims="oid") # (i, )
x = pm.Data("x", dfx[fts_x].values, dims=("oid", "x_nm")) # (i, x)
# 1. define priors for numeric exogs
b_s = pm.InverseGamma("beta_sigma", alpha=11, beta=10) # (1, )
b = pm.Normal("beta", mu=0, sigma=b_s, dims="x_nm") # (x, )
# 2. define likelihood
epsilon = pm.InverseGamma("epsilon", alpha=11, beta=10)
_ = pm.Normal("phcs_hat", mu=pt.dot(x, b.T), sigma=epsilon, observed=y, dims="oid")
RVS_PPC = ["phcs_hat"]
RVS_SIMPLE_COMMON = ["beta_sigma", "beta", "epsilon"]
Verify the built model structure matches our intent, and validate the parameterization#
display(pm.model_to_graphviz(mdla, formatting="plain"))
display(dict(unobserved=mdla.unobserved_RVs, observed=mdla.observed_RVs))
assert_no_rvs(mdla.logp())
mdla.debug(fn="logp", verbose=True)
mdla.debug(fn="random", verbose=True)
{'unobserved': [beta_sigma ~ InverseGamma(11, 10),
beta ~ Normal(0, beta_sigma),
epsilon ~ InverseGamma(11, 10)],
'observed': [phcs_hat ~ Normal(f(beta), epsilon)]}
point={'beta_sigma_log__': array(0.), 'beta': array([0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
point={'beta_sigma_log__': array(0.), 'beta': array([0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
1.2 Sample Prior Predictive, View Diagnostics#
with mdla:
ida = pm.sample_prior_predictive(
var_names=RVS_PPC + RVS_SIMPLE_COMMON,
samples=2000,
return_inferencedata=True,
random_seed=42,
)
Sampling: [beta, beta_sigma, epsilon, phcs_hat]
1.2.1 In-Sample Prior PPC (Retrodictive Check)#
def plot_ppc_retrodictive(idata, group="prior", mdlname="mdla", ynm="y") -> plt.figure:
"""Convenience plot PPC retrodictive KDE"""
f, axs = plt.subplots(1, 1, figsize=(12, 3))
_ = az.plot_ppc(idata, group=group, kind="kde", var_names=RVS_PPC, ax=axs, observed=True)
_ = f.suptitle(f"In-sample {group.title()} PPC Retrodictive KDE on `{ynm}` - `{mdlname}`")
return f
f = plot_ppc_retrodictive(ida, "prior", "mdla", "phcs")
Observe:
Values are wrong as expected, but range is reasonable
1.2.2 Quick look at selected priors#
def plot_posterior(
idata,
group="prior",
rvs=RVS_SIMPLE_COMMON,
coords=None,
mdlname="mdla",
n=1,
nrows=1,
) -> plt.figure:
"""Convenience plot posterior (or prior) KDE"""
m = int(np.ceil(n / nrows))
f, axs = plt.subplots(nrows, m, figsize=(2.4 * m, 0.8 + nrows * 1.4))
_ = az.plot_posterior(idata, group=group, ax=axs, var_names=rvs, coords=coords)
_ = f.suptitle(f"{group.title()} distributions for rvs {rvs} - `{mdlname}")
_ = f.tight_layout()
return f
f = plot_posterior(ida, "prior", rvs=RVS_SIMPLE_COMMON, mdlname="mdla", n=1 + 3 + 1, nrows=1)
Observe:
beta_sigma
,beta: (levels)
,epsilon
all have reasonable prior ranges as specified
1.3 Sample Posterior, View Diagnostics#
1.3.1 Sample Posterior and PPC#
with mdla:
ida.extend(pm.sample(**SAMPLE_KWS), join="right")
ida.extend(
pm.sample_posterior_predictive(trace=ida.posterior, var_names=RVS_PPC),
join="right",
)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_sigma, beta, epsilon]
Sampling 4 chains for 2_000 tune and 500 draw iterations (8_000 + 2_000 draws total) took 2 seconds.
Sampling: [phcs_hat]
1.3.2 Traces#
def plot_traces_and_display_summary(idata, rvs, coords=None, mdlname="mdla") -> plt.figure:
"""Convenience to plot traces and display summary table for rvs"""
_ = az.plot_trace(idata, var_names=rvs, coords=coords, figsize=(12, 1.4 * len(rvs)))
f = plt.gcf()
_ = f.suptitle(f"Posterior traces of {rvs} - `{mdlname}`")
_ = f.tight_layout()
_ = az.plot_energy(idata, fill_alpha=(0.8, 0.6), fill_color=("C0", "C8"), figsize=(12, 1.6))
display(az.summary(idata, var_names=rvs))
return f
f = plot_traces_and_display_summary(ida, rvs=RVS_SIMPLE_COMMON, mdlname="mdla")
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
beta_sigma | 9.709 | 2.129 | 6.372 | 13.737 | 0.041 | 0.032 | 3311.0 | 1415.0 | 1.0 |
beta[intercept] | 32.368 | 0.344 | 31.719 | 32.972 | 0.007 | 0.005 | 2221.0 | 1565.0 | 1.0 |
beta[d450_num] | -2.995 | 0.396 | -3.730 | -2.246 | 0.010 | 0.007 | 1549.0 | 1522.0 | 1.0 |
beta[d455_num] | -1.790 | 0.406 | -2.592 | -1.090 | 0.010 | 0.007 | 1760.0 | 1470.0 | 1.0 |
epsilon | 6.925 | 0.246 | 6.439 | 7.360 | 0.005 | 0.004 | 2189.0 | 1643.0 | 1.0 |
Observe:
Samples well-mixed and well-behaved
ess_bulk
is good,r_hat
is good
Marginal energy | energy transition looks reasonable
E-BFMI > 0.3
so apparently reasonable
1.3.3 In-Sample Posterior PPC (Retrodictive Check)#
f = plot_ppc_retrodictive(ida, "posterior", "mdla", "phcs")
Observe:
In-sample PPC
phcs_hat
tracks the observedphcs
moderately well: slightly overdispersed, perhaps a likelihood with fatter tails would be more appropriate (e.g. StudentT)
1.3.4 In-Sample PPC LOO-PIT#
def plot_loo_pit(idata, mdlname="mdla", y="phcs_hat", y_hat="phcs_hat"):
"""Convenience plot LOO-PIT KDE and ECDF"""
f, axs = plt.subplots(1, 2, figsize=(12, 2.4))
_ = az.plot_loo_pit(idata, y=y, y_hat=y_hat, ax=axs[0])
_ = az.plot_loo_pit(idata, y=y, y_hat=y_hat, ax=axs[1], ecdf=True)
_ = axs[0].set_title(f"Predicted `{y_hat}` LOO-PIT")
_ = axs[1].set_title(f"Predicted `{y_hat}` LOO-PIT cumulative")
_ = f.suptitle(f"In-sample LOO-PIT `{mdlname}`")
_ = f.tight_layout()
return f
f = plot_loo_pit(ida, "mdla")
Observe:
LOO-PIT
looks good, again slightly overdispersed but acceptable for use
~~1.3.5 Compare Log-Likelihood vs Other Models~~#
# Nothing to compare yet
1.4 Evaluate Posterior Parameters#
1.4.1 Univariate#
Lots of parameters, let’s take our time
f = plot_posterior(ida, "posterior", rvs=RVS_SIMPLE_COMMON, mdlname="mdla", n=5, nrows=1)
Observe:
beta_sigma
:E ~ 10
indicates need for high variance in locations ofbeta
sbeta: intercept
:E ~ 32
confirms the bulk of the variance inbeta
s locations is simply due to the intercept offset required to get the zscored values into range ofphcs
, no problembeta: d450_num
:E ~ -3
negative, HDI94 does not span 0, substantial effect, smooth central distribution:Higher values of
d450_num
create a reduction inphcs_hat
beta: d455_num
:E ~ -2
negative, HDI94 does not span 0, substantial effect, smooth central distributionHigher values of
d455_num
create a smaller reduction inphcs_hat
epsilon
:E ~ 7
indicates quite a lot of variance still in the data, not yet handled by a modelled feature
1.5 Create PPC Forecast on simplified forecast
set#
Just for completeness, just compare to Figure 3 in the Bürkner paper and Rochford’s blogpost. Those plots summarize to a mean though, which seems unneccesary - let’s improve it a little with full sample posteriors
Replace dataset with dffx
and rebuild#
COORDS_F = deepcopy(COORDS)
COORDS_F["oid"] = dffx.index.values
mdla.set_data("y", dffx[ft_y].values, coords=COORDS_F)
mdla.set_data("x", dffx[fts_x].values, coords=COORDS_F)
display(pm.model_to_graphviz(mdla, formatting="plain"))
assert_no_rvs(mdla.logp())
mdla.debug(fn="logp", verbose=True)
mdla.debug(fn="random", verbose=True)
point={'beta_sigma_log__': array(0.), 'beta': array([0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
point={'beta_sigma_log__': array(0.), 'beta': array([0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
with mdla:
ida_ppc = pm.sample_posterior_predictive(
trace=ida.posterior, var_names=RVS_PPC, predictions=True
)
Sampling: [phcs_hat]
1.5.2 View Predictions#
def plot_predicted_phcshat_d450_d455(idata, mdlname) -> plt.Figure:
"""Convenience to plot predicted phcs_hat vs d450 and d455"""
phcs_hat = (
az.extract(idata, group="predictions", var_names=["phcs_hat"])
.to_dataframe()
.drop(["chain", "draw"], axis=1)
)
dfppc = pd.merge(
phcs_hat.reset_index(),
dff[["d450", "d455"]].reset_index(),
how="left",
on="oid",
)
kws = dict(
y="phcs_hat",
data=dfppc,
linestyles=":",
estimator="mean",
errorbar=("pi", 94),
dodge=0.2,
)
f, axs = plt.subplots(1, 2, figsize=(12, 6), sharey=True)
_ = sns.pointplot(x="d450", hue="d455", **kws, ax=axs[0], palette="plasma_r")
_ = sns.pointplot(x="d455", hue="d450", **kws, ax=axs[1], palette="viridis_r")
_ = [axs[i].set_title(t) for i, t in enumerate(["d450 x d455", "d455 x d450"])]
_ = f.suptitle(
"Domain specific plot of posterior predicted `phcs_hat`"
+ f" vs `d450` and `d455` - `{mdlname}`"
)
_ = f.tight_layout()
plot_predicted_phcshat_d450_d455(idata=ida_ppc, mdlname="mdla")
Observe:
Compare this to the final plots in [Rochford, n.d.] and Figure 12 in [Bürkner P, 2018]
We see the linear responses and equal spacings of
d450
and ofd455
when treated as numeric valuesWe also see that
mdla
technically can make predictions ford450=c4
which is not seen in the data. However, this prediction is a purely linear extrapolation and although helpful in a sense, could be completely and misleadingly wrong in this model specificationNote here we plot the full posteriors on each datapoint (rather than summarise to a mean) which emphasises the large amount of variance still in the data & model
2. Model B: A Better Way - Dirichlet Hyperprior Allocator#
This is an improved linear model where we acknowledge that the categorical features are ordinal and allow the ordinal
values to have a non-equal spacing, For example, it might well be that A > B > C
, but the spacing is not metric:
instead A >>> B > C
. We achieve this using a Dirichlet hyperprior to allocate hetrogenously spaced sections of a
linear coefficient:
where:
Observations \(i\) contain numeric features \(j\) and ordinal categorical features \(k\) (here
d450, d455
) which each have factor value levels \(k_{d450}, k_{d455}\) and note per Section 0, these are both in range[0 - 4]
as recorded by notebook variableLVLS_D450_D455
\(\hat{y_{i}}\) is our estimate, here of
phcs
The linear sub-model \(lm = \beta^{T}\mathbb{x}_{i,j} + \nu_{d450}[x_{i,d450}] + \nu_{d455}[x_{i,d455}]\) lets us regress onto those features
Notably:
\(\mathbb{x}_{i,d450}\) is treated as an ordinal feature and used to index \(\nu_{d450}[x_{i,d450}]\)
\(\mathbb{x}_{i,d455}\) is treated as an ordinal feature and used to index \(\nu_{d455}[x_{i,d455}]\)
NOTE: The above spec is not particuarly optimised / vectorised / DRY to aid explanation
2.1 Build Model Object#
ft_y = "phcs"
fts_x = ["intercept"]
# NOTE fts_ord = ['d450_idx', 'd455_idx']
COORDS = dict(
oid=dfx.index.values,
y_nm=ft_y,
x_nm=fts_x,
d450_nm=LVLS_D450_D455, # not list(df[d450].cat.categories) because c4 missing
d455_nm=list(df["d455"].cat.categories),
)
with pm.Model(coords=COORDS) as mdlb:
# NOTE: Spec not particuarly optimised / vectorised / DRY to aid explanation
# 0. create (Mutable)Data containers for obs (Y, X)
y = pm.Data("y", dfx[ft_y].values, dims="oid") # (i, )
x = pm.Data("x", dfx[fts_x].values, dims=("oid", "x_nm")) # (i, x)
idx_d450 = pm.Data("idx_d450", dfx["d450_idx"].values, dims="oid") # (i, )
idx_d455 = pm.Data("idx_d455", dfx["d455_idx"].values, dims="oid") # (i, )
# 1. define priors for numeric exogs
b_s = pm.InverseGamma("beta_sigma", alpha=11, beta=10) # (1, )
b = pm.Normal("beta", mu=0, sigma=b_s, dims="x_nm") # (x, )
# 2. define nu
def _get_nu(nm, dim):
"""Partition continous prior into ordinal chunks"""
b0 = pm.Normal(f"beta_{nm}", mu=0, sigma=b_s) # (1, )
c0 = pm.Dirichlet(f"chi_{nm}", a=np.ones(len(COORDS[dim])), dims=dim) # (lvls, )
return pm.Deterministic(f"nu_{nm}", b0 * c0.cumsum(), dims=dim) # (lvls, )
nu_d450 = _get_nu("d450", "d450_nm")
nu_d455 = _get_nu("d455", "d455_nm")
# 3. define likelihood
epsilon = pm.InverseGamma("epsilon", alpha=11, beta=10)
_ = pm.Normal(
"phcs_hat",
mu=pt.dot(x, b.T) + nu_d450[idx_d450] + nu_d455[idx_d455],
sigma=epsilon,
observed=y,
dims="oid",
)
rvs_simple = RVS_SIMPLE_COMMON + ["beta_d450", "beta_d455"]
rvs_d450 = ["chi_d450", "nu_d450"]
rvs_d455 = ["chi_d455", "nu_d455"]
Verify the built model structure matches our intent, and validate the parameterization#
display(pm.model_to_graphviz(mdlb, formatting="plain"))
display(dict(unobserved=mdlb.unobserved_RVs, observed=mdlb.observed_RVs))
assert_no_rvs(mdlb.logp())
mdlb.debug(fn="logp", verbose=True)
mdlb.debug(fn="random", verbose=True)
{'unobserved': [beta_sigma ~ InverseGamma(11, 10),
beta ~ Normal(0, beta_sigma),
beta_d450 ~ Normal(0, beta_sigma),
chi_d450 ~ Dirichlet(<constant>),
beta_d455 ~ Normal(0, beta_sigma),
chi_d455 ~ Dirichlet(<constant>),
epsilon ~ InverseGamma(11, 10),
nu_d450 ~ Deterministic(f(chi_d450, beta_d450)),
nu_d455 ~ Deterministic(f(chi_d455, beta_d455))],
'observed': [phcs_hat ~ Normal(f(beta, chi_d455, beta_d455, chi_d450, beta_d450), epsilon)]}
point={'beta_sigma_log__': array(0.), 'beta': array([0.]), 'beta_d450': array(0.), 'chi_d450_simplex__': array([0., 0., 0., 0.]), 'beta_d455': array(0.), 'chi_d455_simplex__': array([0., 0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
point={'beta_sigma_log__': array(0.), 'beta': array([0.]), 'beta_d450': array(0.), 'chi_d450_simplex__': array([0., 0., 0., 0.]), 'beta_d455': array(0.), 'chi_d455_simplex__': array([0., 0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
2.2 Sample Prior Predictive, View Diagnostics#
with mdlb:
idb = pm.sample_prior_predictive(
var_names=RVS_PPC + rvs_simple + rvs_d450 + rvs_d455,
samples=2000,
return_inferencedata=True,
random_seed=42,
)
Sampling: [beta, beta_d450, beta_d455, beta_sigma, chi_d450, chi_d455, epsilon, phcs_hat]
2.2.1 In-Sample Prior PPC (Retrodictive Check)#
f = plot_ppc_retrodictive(idb, "prior", "mdlb", "phcs")
Observe:
Values are wrong as expected, but range is reasonable
2.2.2 Quick look at selected priors#
f = plot_posterior(idb, "prior", rvs=rvs_simple, mdlname="mdlb", n=5, nrows=1)
f = plot_posterior(idb, "prior", rvs=rvs_d450, mdlname="mdlb", n=5 * 2, nrows=2)
f = plot_posterior(idb, "prior", rvs=rvs_d455, mdlname="mdlb", n=5 * 2, nrows=2)
Observe:
Several new parameters!
beta_sigma
,beta: (levels)
,epsilon
: all have reasonable prior ranges as specified*_d450
:chi_*
: obey the simplex constraint of the Dirichlet and span the rangenu_*
: all reasonable as specified, note the ordering already present in the prior
*_d455
:chi_*
: obey the simplex constraint of the Dirichlet and span the rangenu_*
: all reasonable as specified, note the ordering already present in the prior
2.3 Sample Posterior, View Diagnostics#
2.3.1 Sample Posterior and PPC#
SAMPLE_KWS["target_accept"] = 0.9 # raise to mitigate some minor divergences
with mdlb:
idb.extend(pm.sample(**SAMPLE_KWS), join="right")
idb.extend(
pm.sample_posterior_predictive(trace=idb.posterior, var_names=RVS_PPC),
join="right",
)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta_sigma, beta, beta_d450, chi_d450, beta_d455, chi_d455, epsilon]
Sampling 4 chains for 2_000 tune and 500 draw iterations (8_000 + 2_000 draws total) took 17 seconds.
Sampling: [phcs_hat]
2.3.2 Traces#
f = plot_traces_and_display_summary(idb, rvs=rvs_simple + rvs_d450 + rvs_d455, mdlname="mdlb")
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
beta_sigma | 12.471 | 2.620 | 8.293 | 17.221 | 0.081 | 0.058 | 1180.0 | 1215.0 | 1.00 |
beta[intercept] | 40.668 | 2.134 | 37.460 | 44.760 | 0.085 | 0.061 | 758.0 | 577.0 | 1.00 |
epsilon | 6.858 | 0.239 | 6.442 | 7.350 | 0.006 | 0.004 | 1669.0 | 1214.0 | 1.01 |
beta_d450 | -11.531 | 3.144 | -17.884 | -6.711 | 0.111 | 0.082 | 884.0 | 729.0 | 1.01 |
beta_d455 | -7.317 | 2.010 | -11.477 | -4.036 | 0.071 | 0.054 | 957.0 | 676.0 | 1.00 |
chi_d450[c0] | 0.114 | 0.097 | 0.000 | 0.294 | 0.003 | 0.002 | 862.0 | 791.0 | 1.00 |
chi_d450[c1] | 0.392 | 0.116 | 0.175 | 0.601 | 0.003 | 0.002 | 1135.0 | 1254.0 | 1.00 |
chi_d450[c2] | 0.187 | 0.095 | 0.013 | 0.350 | 0.003 | 0.002 | 1194.0 | 723.0 | 1.00 |
chi_d450[c3] | 0.131 | 0.084 | 0.001 | 0.280 | 0.002 | 0.002 | 971.0 | 731.0 | 1.00 |
chi_d450[c4] | 0.177 | 0.141 | 0.000 | 0.443 | 0.004 | 0.003 | 1324.0 | 1140.0 | 1.00 |
nu_d450[c0] | -1.389 | 1.419 | -3.865 | -0.000 | 0.057 | 0.046 | 752.0 | 765.0 | 1.00 |
nu_d450[c1] | -5.658 | 1.708 | -9.063 | -3.088 | 0.061 | 0.049 | 1055.0 | 860.0 | 1.01 |
nu_d450[c2] | -7.688 | 1.691 | -10.774 | -5.055 | 0.059 | 0.046 | 1080.0 | 860.0 | 1.01 |
nu_d450[c3] | -9.194 | 1.880 | -12.915 | -6.119 | 0.065 | 0.050 | 1056.0 | 814.0 | 1.01 |
nu_d450[c4] | -11.531 | 3.144 | -17.884 | -6.711 | 0.111 | 0.082 | 884.0 | 729.0 | 1.01 |
chi_d455[c0] | 0.145 | 0.122 | 0.000 | 0.381 | 0.004 | 0.003 | 821.0 | 817.0 | 1.00 |
chi_d455[c1] | 0.324 | 0.116 | 0.112 | 0.545 | 0.003 | 0.002 | 1326.0 | 932.0 | 1.00 |
chi_d455[c2] | 0.059 | 0.058 | 0.000 | 0.161 | 0.002 | 0.001 | 963.0 | 680.0 | 1.00 |
chi_d455[c3] | 0.310 | 0.133 | 0.073 | 0.552 | 0.004 | 0.003 | 1104.0 | 1133.0 | 1.00 |
chi_d455[c4] | 0.162 | 0.111 | 0.000 | 0.353 | 0.002 | 0.002 | 1744.0 | 1283.0 | 1.00 |
nu_d455[c0] | -1.200 | 1.331 | -3.583 | -0.001 | 0.053 | 0.040 | 751.0 | 744.0 | 1.00 |
nu_d455[c1] | -3.519 | 1.671 | -6.616 | -0.768 | 0.061 | 0.047 | 913.0 | 770.0 | 1.00 |
nu_d455[c2] | -3.921 | 1.667 | -7.013 | -1.082 | 0.060 | 0.047 | 985.0 | 723.0 | 1.00 |
nu_d455[c3] | -6.121 | 1.855 | -9.777 | -3.095 | 0.065 | 0.050 | 986.0 | 740.0 | 1.00 |
nu_d455[c4] | -7.317 | 2.010 | -11.477 | -4.036 | 0.071 | 0.054 | 957.0 | 676.0 | 1.00 |
Observe:
Samples well-mixed and well-behaved, but note we raised
target_accept=0.9
to mitigate / avoid divergences seen at0.8
ess_bulk
a little low,r_hat
is okay
Marginal energy | energy transition looks reasonable
E-BFMI > 0.3
so apparently reasonable
2.3.3 In-Sample Posterior PPC (Retrodictive Check)#
f = plot_ppc_retrodictive(idb, "posterior", "mdlb", "phcs")
Observe:
In-sample PPC
phcs_hat
tracks the observedphcs
moderately well: slightly overdispersed, perhaps a likelihood with fatter tails would be more appropriate (e.g. StudentT)
2.3.4 In-Sample PPC LOO-PIT#
f = plot_loo_pit(idb, "mdlb")
Observe:
LOO-PIT
looks good, again slightly overdispersed but acceptable for use
2.3.5 Compare Log-Likelihood vs Other Models#
def plot_compare_log_likelihood(idata_dict={}, y_hat="phcs_hat") -> plt.figure:
"""Convenience to plot comparison for a dict of idatas"""
dfcomp = az.compare(idata_dict, var_name=y_hat, ic="loo", method="stacking", scale="log")
f, axs = plt.subplots(1, 1, figsize=(12, 2.4 + 0.3 * len(idata_dict)))
_ = az.plot_compare(dfcomp, ax=axs, title=False, textsize=10, legend=False)
_ = f.suptitle(
"Model Performance Comparison: ELPD via In-Sample LOO-PIT "
+ " vs ".join(list(idata_dict.keys()))
+ "\n(higher & narrower is better)"
)
_ = f.tight_layout()
display(dfcomp)
return f
f = plot_compare_log_likelihood(idata_dict={"mdla": ida, "mdlb": idb})
rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
---|---|---|---|---|---|---|---|---|---|
mdlb | 0 | -1412.760828 | 6.526971 | 0.000000 | 0.955269 | 14.869062 | 0.000000 | False | log |
mdla | 1 | -1415.230697 | 3.864788 | 2.469869 | 0.044731 | 14.858536 | 2.309562 | False | log |
Observe:
Our new ordinal-respecting
mdlb
appears to be the winner, taking nearly all the weight and a higherelpd_loo
2.4 Evaluate Posterior Parameters#
2.4.1 Univariate#
Lots of parameters, let’s take our time
f = plot_posterior(idb, "posterior", rvs=rvs_simple, mdlname="mdlb", n=5, nrows=1)
Observe:
beta_sigma
:E ~ 12
indicates need for high variance in locations ofbeta
sbeta: intercept
:E ~ 41
confirms the bulk of the variance inbeta
s locations is simply due to the intercept offset required to get the zscored values into range ofphcs
, no problemepsilon
:E ~ 7
indicates quite a lot of variance still in the data, not yet handled by a modelled featurebeta: d450
:E ~ -12
negative, HDI94 does not span 0, substantial effect, smooth central distribution:Higher indexes of
d450_idx
create a reduction inphcs_hat
beta: d455
:E ~ -7
negative, HDI94 does not span 0, substantial effect, smooth central distributionHigher indexes of
d455_idx
create a reduction inphcs_hat
In general the bigger coefficient values here (vs mdla
) suggest more disrimination between the values in the data and
better performance
f = plot_posterior(idb, "posterior", rvs=rvs_d450, mdlname="mdlb", n=5 * 2, nrows=2)
Observe:
Interesting pattern:
chi_d450
: Non-linear response throughout the rangenu_d450
: The non-linear effectbeta * chi.csum()
is clear, in particularc0
is far from the trend ofc1, c2, c3
Note in particular that the posterior distribution of chi_d450 = c4
is almost exactly the
same value as for its prior, because it hasn’t been evidenced in the dataset. The constraint of the Dirichlet has in
turn scaled the values for chi_d450 = c0 to c3
, the scale of beta_450
, and downstream effects on nu_d450 = c4
.
For comparison you can try the inferior alternative by setting COORDS['d450_nm']=list(df[d450].cat.categories)
in the
model spec in Section 2.1 and re-running and seeing what happens
f = plot_posterior(idb, "posterior", rvs=rvs_d455, mdlname="mdlb", n=5 * 2, nrows=2)
Observe:
Interesting pattern:
chi_d455
: Non-linear response throughout the rangenu_d455
: The non-linear effectbeta * chi.csum()
is clear, in particularc2
is almost the same asc1
Let’s see those levels forestplotted to make even more clear
Monotonic priors forestplot#
def plot_posterior_forest(idata, group="posterior", rv="beta", mdlname="mdla"):
"""Convenience forestplot posterior (or prior) KDE"""
f, axs = plt.subplots(1, 1, figsize=(12, 2.4))
_ = az.plot_forest(idata[group], var_names=[rv], ax=axs, combined=True)
_ = f.suptitle(f"Forestplot of {group.title()} level values for `{rv}` - `{mdlname}`")
_ = f.tight_layout()
return f
f = plot_posterior_forest(idb, "posterior", "nu_d450", "mdlb")
f = plot_posterior_forest(idb, "posterior", "nu_d455", "mdlb")
Observe:
Here we see the same patterns in more detail, in particular:
nu_d450
:c0
is an outlier with disproportionately less impact thanc1, c2, c3
c4
has been auto-imputed and takes the prior value which has very wide variance around a linear extrapolation
nu_d455
:c1, c2
overlap strongly and so have very similar impact to one another
2.5 Create PPC Forecast on simplified forecast
set#
Just for completeness, just compare to Figure 3 in the Bürkner paper and Rochford’s blogpost.
Those plots summarize to a mean though, which seems unneccesary - let’s improve it a little.
2.5.1 Replace dataset with dffx
, rebuild, and sample PPC#
COORDS_F = deepcopy(COORDS)
COORDS_F["oid"] = dffx.index.values
mdlb.set_data("y", dffx[ft_y].values, coords=COORDS_F)
mdlb.set_data("x", dffx[fts_x].values, coords=COORDS_F)
mdlb.set_data("idx_d450", dffx["d450_idx"].values, coords=COORDS_F)
mdlb.set_data("idx_d455", dffx["d455_idx"].values, coords=COORDS_F)
display(pm.model_to_graphviz(mdlb, formatting="plain"))
assert_no_rvs(mdlb.logp())
mdlb.debug(fn="logp", verbose=True)
mdlb.debug(fn="random", verbose=True)
point={'beta_sigma_log__': array(0.), 'beta': array([0.]), 'beta_d450': array(0.), 'chi_d450_simplex__': array([0., 0., 0., 0.]), 'beta_d455': array(0.), 'chi_d455_simplex__': array([0., 0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
point={'beta_sigma_log__': array(0.), 'beta': array([0.]), 'beta_d450': array(0.), 'chi_d450_simplex__': array([0., 0., 0., 0.]), 'beta_d455': array(0.), 'chi_d455_simplex__': array([0., 0., 0., 0.]), 'epsilon_log__': array(0.)}
No problems found
with mdlb:
idb_ppc = pm.sample_posterior_predictive(
trace=idb.posterior, var_names=RVS_PPC, predictions=True
)
Sampling: [phcs_hat]
2.5.2 View Predictions#
plot_predicted_phcshat_d450_d455(idata=idb_ppc, mdlname="mdlb")
Observe:
Compare this to the final plots in [Rochford, n.d.] and Figure 12 in [Bürkner P, 2018]
We see the non-linear responses and non-equal spacings of
d450
and ofd455
when treated as ordinal categoriesIn particular, note the behaviours we already saw in the posterior plots
LHS plot
d450
: all points forc0
are all higher than the plot in Section 1.5.2 (also note the overlap ofd455: c1, c2
levels in the shaded points)RHS plot
d455
: all points forc1, c2
overlap strongly (also noted455 c0
outlying)
We also see that
mdlb
can make predictions ford450=c4
which is not seen in the dataNote here we plot the full posteriors on each datapoint (rather than summarise to a mean) which emphasises the large amount of variance still in the data & model
Errata#
Reference#
Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. Bayesian workflow. arXiv preprint arXiv:2011.01808, 2020. URL: https://arxiv.org/abs/2011.01808.
& Charpentier E Bürkner P. Modeling monotonic effects of ordinal predictors in bayesian regression models. PsyArXiv, 2018. URL: https://doi.org/10.31234/osf.io/9qkhj, doi:doi:10.31234/osf.io/9qkhj.
Austin Rochford. URL: https://austinrochford.com/posts/2018-11-10-monotonic-predictors.html.
Watermark#
# tested running on Google Colab 2024-10-28
%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Mon Nov 04 2024
Python implementation: CPython
Python version : 3.11.9
IPython version : 8.26.0
pyreadr : 0.5.2
pytensor : 2.25.2
numpy : 1.26.4
pymc : 5.16.2
tarfile : 0.9.0
arviz : 0.19.0
matplotlib: 3.9.1
pandas : 2.2.2
seaborn : 0.12.2
Watermark: 2.4.3
License notice#
All the notebooks in this example gallery are provided under the MIT License which allows modification, and redistribution for any use provided the copyright and license notices are preserved.
Citing PyMC examples#
To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.
Important
Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.
Also remember to cite the relevant libraries used by your code.
Here is an citation template in bibtex:
@incollection{citekey,
author = "<notebook authors, see above>",
title = "<notebook title>",
editor = "PyMC Team",
booktitle = "PyMC examples",
doi = "10.5281/zenodo.5654871"
}
which once rendered could look like: