Confirmatory Factor Analysis and Structural Equation Models in Psychometrics#
“Evidently, the notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments…the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly” - Pearl in Probabilistic Reasoning in Intelligent Systems Pearl [1985]
Measurement data in psychometrics is often derived from a strategically constructed survey aimed at a particular target phenomenon: some intuited, but not yet measured, concept that arguably plays a determining role in human action, motivation or sentiment. The relative “fuzziness” of the subject matter in psychometrics has had a catalyzing effect on the methodological rigour sought in the science.
Survey designs are agonized over for the correct tone and rhythm of sentence structure. Measurement scales are double-checked for reliability and correctness. The literature is consulted and questions are refined. Analysis steps are justified and tested under a wealth of modelling routines. Model architectures are defined and refined to better express the hypothesized structures in the data-generating process. We will see how such due diligence leads to powerful and expressive models that grant us tractability on thorny questions of human affect.
Throughout we draw on Roy Levy and Robert J. Mislevy’s excellent Bayesian Psychometric Modeling.
import warnings
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as pt
import seaborn as sns
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=UserWarning)
%config InlineBackend.figure_format = 'retina' # high resolution figures
az.style.use("arviz-darkgrid")
rng = np.random.default_rng(42)
Latent Constructs and Measurement#
Our data is borrowed from work by Boris Mayer and Andrew Ellis found here. They demonstrate CFA and SEM modelling with lavaan.
We have survey responses from ~300 individuals who have answered questions regarding their upbringing, self-efficacy and reported life-satisfaction. The hypothetical dependency structure in this life-satisfaction dataset posits a moderated relationship between scores related to life-satisfaction, parental and family support, and self-efficacy. It is not a trivial task to design a survey that can elicit answers plausibly mapped to each of these “factors” or themes, never mind finding a model of their relationship that can inform us as to the relative impact of each on life-satisfaction outcomes.
First let’s pull out the data and examine some summary statistics.
df = pd.read_csv("../data/sem_data.csv")
df.head()
ID | region | gender | age | se_acad_p1 | se_acad_p2 | se_acad_p3 | se_social_p1 | se_social_p2 | se_social_p3 | sup_friends_p1 | sup_friends_p2 | sup_friends_p3 | sup_parents_p1 | sup_parents_p2 | sup_parents_p3 | ls_p1 | ls_p2 | ls_p3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | west | female | 13 | 4.857143 | 5.571429 | 4.500000 | 5.80 | 5.500000 | 5.40 | 6.5 | 6.5 | 7.0 | 7.0 | 7.0 | 6.0 | 5.333333 | 6.75 | 5.50 |
1 | 2 | west | male | 14 | 4.571429 | 4.285714 | 4.666667 | 5.00 | 5.500000 | 4.80 | 4.5 | 4.5 | 5.5 | 5.0 | 6.0 | 4.5 | 4.333333 | 5.00 | 4.50 |
2 | 10 | west | female | 14 | 4.142857 | 6.142857 | 5.333333 | 5.20 | 4.666667 | 6.00 | 4.0 | 4.5 | 3.5 | 7.0 | 7.0 | 6.5 | 6.333333 | 5.50 | 4.00 |
3 | 11 | west | female | 14 | 5.000000 | 5.428571 | 4.833333 | 6.40 | 5.833333 | 6.40 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 7.0 | 4.333333 | 6.50 | 6.25 |
4 | 12 | west | female | 14 | 5.166667 | 5.600000 | 4.800000 | 5.25 | 5.400000 | 5.25 | 7.0 | 7.0 | 7.0 | 6.5 | 6.5 | 7.0 | 5.666667 | 6.00 | 5.75 |
fig, ax = plt.subplots(figsize=(20, 10))
drivers = [c for c in df.columns if not c in ["region", "gender", "age", "ID"]]
corr_df = df[drivers].corr()
mask = np.triu(np.ones_like(corr_df, dtype=bool))
sns.heatmap(corr_df, annot=True, cmap="Blues", ax=ax, center=0, mask=mask)
ax.set_title("Sample Correlations between indicator Metrics")
fig, ax = plt.subplots(figsize=(20, 10))
sns.heatmap(df[drivers].cov(), annot=True, cmap="Blues", ax=ax, center=0, mask=mask)
ax.set_title("Sample Covariances between indicator Metrics");
The lens here on the sample covariance matrix is common in traditional SEM modeling. CFA and SEM models are often estimated by fitting parameters to the data by optimising the parameter structure of the covariance matrix. Model assessment routines often gauge the model’s ability to recover the sample covariance relations. The Bayesian approach to estimating these models is slightly different (less constrained), focusing on the observed data rather than the derived summary statistics.
Next we’ll draw a pair plot to visualise the nature of the correlations.
ax = sns.pairplot(df[drivers], kind="reg", corner=True, diag_kind="kde")
plt.suptitle("Pair Plot of Indicator Metrics with Regression Fits", fontsize=30);
It’s this wide-ranging set of relationships that we seek to distill in our CFA models. How can we take this complex joint distribution and structure it in a way that is plausible and interpretable?
Measurement Models#
A measurement model is a key component within the more general structural equation model. A measurement model specifies the relationships between observed variables (typically survey items or indicators) and their underlying latent constructs (often referred to as factors or latent variables). We start our presentation of SEM models more generally by focusing on a type of measurement model with its own history - the confirmatory factor analysis (CFA) model, which specifies a particular structure for the relationships between our indicator variables and the latent factors. It is this structure which is up for confirmation in our modelling.
We’ll start by fitting a “simple” CFA model in PyMC to demonstrate how the pieces fit together, and then expand our focus. Here we ignore the majority of our indicator variables and focus on the idea that there are two latent constructs: (1) Social Self-efficacy and (2) Life Satisfaction.
We’re aiming to articulate a mathematical structure where our indicator variables \(x_{ij}\) are determined by a latent factor \(\text{Ksi}_{j}\) through an estimated factor loading \(\lambda_{ij}\). Functionally we have a set of equations with error terms \(\psi_i\) for each individual.
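For each individual \(q\) this gives (a sketch of the standard CFA formulation, consistent with the two-factor model we build below):

$$
\begin{aligned}
x_{q1} &= \tau_{1} + \lambda_{11}\text{Ksi}_{q1} + \psi_{1} \\
x_{q2} &= \tau_{2} + \lambda_{21}\text{Ksi}_{q1} + \psi_{2} \\
x_{q3} &= \tau_{3} + \lambda_{31}\text{Ksi}_{q1} + \psi_{3} \\
x_{q4} &= \tau_{4} + \lambda_{42}\text{Ksi}_{q2} + \psi_{4} \\
x_{q5} &= \tau_{5} + \lambda_{52}\text{Ksi}_{q2} + \psi_{5} \\
x_{q6} &= \tau_{6} + \lambda_{62}\text{Ksi}_{q2} + \psi_{6}
\end{aligned}
$$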
or more compactly
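$$
\mathbf{x}_{q} = \tau + \Lambda\text{Ksi}_{q} + \psi
$$

where \(\Lambda\) collects the factor loadings, \(\tau\) the intercepts, and \(\psi\) the indicator-specific error terms (again, a sketch of the standard matrix formulation).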
The goal is to articulate the relationship between the different factors in terms of the covariances between these latent terms and estimate the relationships each latent factor has with the manifest indicator variables. At a high level, we’re saying the joint distribution of the observed data can be represented through conditionalisation in the following schema.
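Sketched in our notation (where \(\Psi\) collects the error variances):

$$
p(\mathbf{x}_{1}, \dots, \mathbf{x}_{Q}) = \int \prod_{q=1}^{Q} p(\mathbf{x}_{q} \mid \text{Ksi}_{q}, \Lambda, \tau, \Psi) \, p(\text{Ksi}_{q}) \, p(\Lambda, \tau, \Psi) \, d\text{Ksi} \, d\Lambda \, d\tau \, d\Psi
$$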
We’re making an argument that the multivariate observations \(\mathbf{x}\) from each individual \(q\) can be considered conditionally exchangeable and in this way represented through Bayesian conditionalisation via De Finetti’s theorem. This is the Bayesian approach to the estimation of CFA and SEM models. We’re seeking a conditionalisation structure that can retrodict the observed data based on latent constructs and hypothetical relationships among the constructs and the observed data points. We will show how to build these structures into our model below.
# Set up coordinates for appropriate indexing of latent factors
coords = {
"obs": list(range(len(df))),
"indicators": ["se_social_p1", "se_social_p2", "se_social_p3", "ls_p1", "ls_p2", "ls_p3"],
"indicators_1": ["se_social_p1", "se_social_p2", "se_social_p3"],
"indicators_2": ["ls_p1", "ls_p2", "ls_p3"],
"latent": ["SE_SOC", "LS"],
}
obs_idx = list(range(len(df)))
with pm.Model(coords=coords) as model:
# Set up Factor Loadings
lambdas_ = pm.Normal("lambdas_1", 1, 10, dims=("indicators_1"))
# Force a fixed scale on the factor loadings for factor 1
lambdas_1 = pm.Deterministic(
"lambdas1", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_1")
)
lambdas_ = pm.Normal("lambdas_2", 1, 10, dims=("indicators_2"))
# Force a fixed scale on the factor loadings for factor 2
lambdas_2 = pm.Deterministic(
"lambdas2", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_2")
)
# Specify covariance structure between latent factors
kappa = 0
sd_dist = pm.Exponential.dist(1.0, shape=2)
chol, _, _ = pm.LKJCholeskyCov("chol_cov", n=2, eta=2, sd_dist=sd_dist, compute_corr=True)
ksi = pm.MvNormal("ksi", kappa, chol=chol, dims=("obs", "latent"))
# Construct Pseudo Observation matrix based on Factor Loadings
tau = pm.Normal("tau", 3, 10, dims="indicators")
m1 = tau[0] + ksi[obs_idx, 0] * lambdas_1[0]
m2 = tau[1] + ksi[obs_idx, 0] * lambdas_1[1]
m3 = tau[2] + ksi[obs_idx, 0] * lambdas_1[2]
m4 = tau[3] + ksi[obs_idx, 1] * lambdas_2[0]
m5 = tau[4] + ksi[obs_idx, 1] * lambdas_2[1]
m6 = tau[5] + ksi[obs_idx, 1] * lambdas_2[2]
mu = pm.Deterministic("mu", pm.math.stack([m1, m2, m3, m4, m5, m6]).T)
## Error Terms
Psi = pm.InverseGamma("Psi", 5, 10, dims="indicators")
# Likelihood
_ = pm.Normal(
"likelihood",
mu,
Psi,
observed=df[
["se_social_p1", "se_social_p2", "se_social_p3", "ls_p1", "ls_p2", "ls_p3"]
].values,
)
idata = pm.sample(
nuts_sampler="numpyro", target_accept=0.95, idata_kwargs={"log_likelihood": True}
)
idata.extend(pm.sample_posterior_predictive(idata))
pm.model_to_graphviz(model)
Compiling...
Compilation time = 0:00:02.037172
Sampling...
Sampling time = 0:00:04.970539
Transforming variables...
Transformation time = 0:00:00.366997
Computing Log Likelihood...
Log Likelihood time = 0:00:00.248048
Sampling: [likelihood]
Here the model structure and dependency graph become a little clearer. Our likelihood term models an outcome matrix of 283x6 observations i.e. the survey responses for 6 questions. These survey responses are modelled as regression-like outcomes from a multivariate normal \(Ksi\) with a prior correlation structure between the latent constructs. We then specify how each of the outcome measures is a function of one of the latent factors, modified by the appropriate factor loading lambda.
Measurement Model Structure#
We can now see how the covariance structure among the latent constructs is an integral piece of the overarching model design, which is fed forward into our pseudo-regression components and weighted by the respective lambda terms.
az.summary(idata, var_names=["lambdas1", "lambdas2"])
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
lambdas1[se_social_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 4000.0 | 4000.0 | NaN |
lambdas1[se_social_p2] | 0.977 | 0.060 | 0.863 | 1.089 | 0.002 | 0.001 | 993.0 | 1688.0 | 1.0 |
lambdas1[se_social_p3] | 0.947 | 0.074 | 0.810 | 1.091 | 0.002 | 0.002 | 1110.0 | 1965.0 | 1.0 |
lambdas2[ls_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 4000.0 | 4000.0 | NaN |
lambdas2[ls_p2] | 0.815 | 0.087 | 0.672 | 0.989 | 0.004 | 0.003 | 524.0 | 792.0 | 1.0 |
lambdas2[ls_p3] | 0.861 | 0.095 | 0.689 | 1.047 | 0.004 | 0.003 | 713.0 | 1164.0 | 1.0 |
These factor loadings are generally important to interpret in terms of construct validity. Because we’ve fixed one of the indicator loadings at 1, the other indicators which load on that factor should have loading coefficients on broadly the same scale as the fixed indicator that defines the construct scale. We’re looking for consistency among the loadings to assess whether the indicators are reliable measures of the construct i.e. if an indicator loading deviates too far from 1 then there is an argument to be made that the indicator doesn’t belong in the same factor construct.
idata
The returned InferenceData object collects the posterior, posterior_predictive, log_likelihood, sample_stats and observed_data groups, each indexed by the named coordinates we defined above (chain, draw, obs, latent and the indicator dimensions).
Let’s plot the trace diagnostics to validate the sampler has converged well to the posterior distribution.
az.plot_trace(idata, var_names=["lambdas1", "lambdas2", "tau", "Psi", "ksi"]);
Sampling the Latent Constructs#
One thing to highlight in particular about the Bayesian manner of fitting CFA and SEM models is that we now have access to the posterior distribution of the latent quantities. These samples can offer insight into particular individuals in our survey that is harder to glean from the multivariate presentation of the manifest variables.
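For instance, here is a minimal sketch of how we might pull the posterior draws of a latent score for a single respondent, using the ksi variable and the "obs" and "latent" coordinates defined in the model above:

# A minimal sketch: posterior draws of respondent 0's social self-efficacy score
ksi_draws = az.extract(idata)["ksi"]  # dims: (obs, latent, sample)
individual_0 = ksi_draws.sel(obs=0, latent="SE_SOC")
individual_0.mean().item(), individual_0.std().item()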
fig, axs = plt.subplots(1, 2, figsize=(20, 9))
axs = axs.flatten()
ax1 = axs[0]
ax2 = axs[1]
az.plot_forest(
idata,
var_names=["ksi"],
combined=True,
ax=ax1,
colors="forestgreen",
coords={"latent": ["SE_SOC"]},
)
az.plot_forest(
idata, var_names=["ksi"], combined=True, ax=ax2, colors="slateblue", coords={"latent": ["LS"]}
)
ax1.set_yticklabels([])
ax1.set_xlabel("SE_SOCIAL")
ax2.set_yticklabels([])
ax2.set_xlabel("LS")
ax1.axvline(-2, color="red")
ax2.axvline(-2, color="red")
ax1.set_title("Individual Social Self Efficacy \n On Latent Factor SE_SOCIAL")
ax2.set_title("Individual Life Satisfaction Metric \n On Latent Factor LS")
plt.show();
In this way we can identify and zero-in on individuals that appear to be outliers on one or more of the latent constructs.
Posterior Predictive Checks#
As in more traditional Bayesian modelling, a core component of model evaluation is the assessment of the posterior predictive distribution i.e. the implied outcome distribution. Here too we can pull out draws against each of the indicator variables to assess for coherence and adequacy.
def make_ppc(
idata,
samples=100,
drivers=["se_social_p1", "se_social_p2", "se_social_p3", "ls_p1", "ls_p2", "ls_p3"],
dims=(2, 3),
):
fig, axs = plt.subplots(dims[0], dims[1], figsize=(20, 10))
axs = axs.flatten()
for i in range(len(drivers)):
for j in range(samples):
temp = az.extract(idata["posterior_predictive"].sel({"likelihood_dim_3": i}))[
"likelihood"
].values[:, j]
temp = pd.DataFrame(temp, columns=["likelihood"])
if j == 0:
axs[i].hist(df[drivers[i]], alpha=0.3, ec="black", bins=20, label="Observed Scores")
axs[i].hist(
temp["likelihood"], color="purple", alpha=0.1, bins=20, label="Predicted Scores"
)
else:
axs[i].hist(df[drivers[i]], alpha=0.3, ec="black", bins=20)
axs[i].hist(temp["likelihood"], color="purple", alpha=0.1, bins=20)
axs[i].set_title(f"Posterior Predictive Checks {drivers[i]}")
axs[i].legend()
plt.tight_layout()
plt.show()
make_ppc(idata)
del idata
Which shows a relatively sound recovery of the observed data.
Intermediate Cross-Loading Model#
The idea of a measurment model is maybe a little opaque when we only see models that fit well. Instead we want to briefly show how an in-apt measurement model gets reflected in the estimated parameters for the factor loadings. Here we specify a measurement model which attempts to couple the se_social
and sup_parents
indicators and bundle them into the same factor.
coords = {
"obs": list(range(len(df))),
"indicators": [
"se_social_p1",
"se_social_p2",
"se_social_p3",
"sup_parents_p1",
"sup_parents_p2",
"sup_parents_p3",
"ls_p1",
"ls_p2",
"ls_p3",
],
## Attempt Cross-Loading of two metric types on one factor
"indicators_1": [
"se_social_p1",
"se_social_p2",
"se_social_p3",
"sup_parents_p1",
"sup_parents_p2",
"sup_parents_p3",
],
"indicators_2": ["ls_p1", "ls_p2", "ls_p3"],
"latent": ["SE_SOC", "LS"],
}
obs_idx = list(range(len(df)))
with pm.Model(coords=coords) as model:
Psi = pm.InverseGamma("Psi", 5, 10, dims="indicators")
lambdas_ = pm.Normal("lambdas_1", 1, 10, dims=("indicators_1"))
# Force a fixed scale on the factor loadings for factor 1
lambdas_1 = pm.Deterministic(
"lambdas1", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_1")
)
lambdas_ = pm.Normal("lambdas_2", 1, 10, dims=("indicators_2"))
# Force a fixed scale on the factor loadings for factor 2
lambdas_2 = pm.Deterministic(
"lambdas2", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_2")
)
tau = pm.Normal("tau", 3, 10, dims="indicators")
# Specify covariance structure between latent factors
kappa = 0
sd_dist = pm.Exponential.dist(1.0, shape=2)
chol, _, _ = pm.LKJCholeskyCov("chol_cov", n=2, eta=2, sd_dist=sd_dist, compute_corr=True)
ksi = pm.MvNormal("ksi", kappa, chol=chol, dims=("obs", "latent"))
# Construct Observation matrix
m1 = tau[0] + ksi[obs_idx, 0] * lambdas_1[0]
m2 = tau[1] + ksi[obs_idx, 0] * lambdas_1[1]
m3 = tau[2] + ksi[obs_idx, 0] * lambdas_1[2]
m4 = tau[3] + ksi[obs_idx, 0] * lambdas_1[3]
m5 = tau[4] + ksi[obs_idx, 0] * lambdas_1[4]
m6 = tau[5] + ksi[obs_idx, 0] * lambdas_1[5]
m7 = tau[6] + ksi[obs_idx, 1] * lambdas_2[0]
m8 = tau[7] + ksi[obs_idx, 1] * lambdas_2[1]
m9 = tau[8] + ksi[obs_idx, 1] * lambdas_2[2]
mu = pm.Deterministic("mu", pm.math.stack([m1, m2, m3, m4, m5, m6, m7, m8, m9]).T)
_ = pm.Normal(
"likelihood",
mu,
Psi,
observed=df[
[
"se_social_p1",
"se_social_p2",
"se_social_p3",
"sup_parents_p1",
"sup_parents_p2",
"sup_parents_p3",
"ls_p1",
"ls_p2",
"ls_p3",
]
].values,
)
idata = pm.sample(
# draws = 4000,
draws=10000,
nuts_sampler="numpyro",
target_accept=0.99,
idata_kwargs={"log_likelihood": True},
random_seed=114,
)
idata.extend(pm.sample_posterior_predictive(idata))
Compiling...
Compilation time = 0:00:01.765569
Sampling...
Sampling time = 0:00:26.201814
Transforming variables...
Transformation time = 0:00:01.184088
Computing Log Likelihood...
Log Likelihood time = 0:00:00.831334
Sampling: [likelihood]
Again our model samples well but the parameter estimates suggest that there is some inconsistency between the scales of the two sets of metrics we’re trying to force onto the same factor.
az.summary(idata, var_names=["lambdas1", "lambdas2"])
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
lambdas1[se_social_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas1[se_social_p2] | 0.928 | 0.128 | 0.694 | 1.172 | 0.002 | 0.002 | 3090.0 | 5423.0 | 1.0 |
lambdas1[se_social_p3] | 0.854 | 0.139 | 0.598 | 1.121 | 0.002 | 0.002 | 4600.0 | 8366.0 | 1.0 |
lambdas1[sup_parents_p1] | 2.321 | 0.289 | 1.807 | 2.867 | 0.008 | 0.005 | 1421.0 | 2736.0 | 1.0 |
lambdas1[sup_parents_p2] | 2.171 | 0.278 | 1.684 | 2.699 | 0.008 | 0.005 | 1333.0 | 2592.0 | 1.0 |
lambdas1[sup_parents_p3] | 2.334 | 0.290 | 1.832 | 2.898 | 0.008 | 0.005 | 1442.0 | 2795.0 | 1.0 |
lambdas2[ls_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas2[ls_p2] | 0.777 | 0.105 | 0.589 | 0.975 | 0.002 | 0.002 | 2530.0 | 4296.0 | 1.0 |
lambdas2[ls_p3] | 1.080 | 0.135 | 0.840 | 1.335 | 0.003 | 0.002 | 2271.0 | 3902.0 | 1.0 |
This is reflected in the diagnostic energy plots here too.
fig, axs = plt.subplots(1, 2, figsize=(20, 9))
axs = axs.flatten()
az.plot_energy(idata, ax=axs[0])
az.plot_forest(idata, var_names=["lambdas1"], combined=True, ax=axs[1]);
This hints at measurement model misspecification and should force us back to the drawing board. An appropriate measurement model maps the indicator variables to a consistently defined latent construct that plausibly reflects aspects of the realised indicator metrics.
Full Measurement Model#
With this in mind we’ll now specify a full measurement model that maps each of our thematically similar indicator metrics to an individual latent construct. This mandates the postulation of 5 distinct constructs, where we admit only three metrics loading on each construct. The choice of which metrics load on which latent construct is determined, in our case, by the construct each measure was intended to capture. In the typical lavaan syntax we might write the model as follows:
SUP_Parents =~ sup_parents_p1 + sup_parents_p2 + sup_parents_p3
SUP_Friends =~ sup_friends_p1 + sup_friends_p2 + sup_friends_p3
SE_Academic =~ se_acad_p1 + se_acad_p2 + se_acad_p3
SE_Social =~ se_social_p1 + se_social_p2 + se_social_p3
LS =~ ls_p1 + ls_p2 + ls_p3
where the =~ syntax denotes the “is measured by” relation. It is the manner in which each of these indicator variables is determined by the latent construct that we seek to estimate when fitting a measurement model of this type.
drivers = [
"se_acad_p1",
"se_acad_p2",
"se_acad_p3",
"se_social_p1",
"se_social_p2",
"se_social_p3",
"sup_friends_p1",
"sup_friends_p2",
"sup_friends_p3",
"sup_parents_p1",
"sup_parents_p2",
"sup_parents_p3",
"ls_p1",
"ls_p2",
"ls_p3",
]
coords = {
"obs": list(range(len(df))),
"indicators": drivers,
"indicators_1": ["se_acad_p1", "se_acad_p2", "se_acad_p3"],
"indicators_2": ["se_social_p1", "se_social_p2", "se_social_p3"],
"indicators_3": ["sup_friends_p1", "sup_friends_p2", "sup_friends_p3"],
"indicators_4": ["sup_parents_p1", "sup_parents_p2", "sup_parents_p3"],
"indicators_5": ["ls_p1", "ls_p2", "ls_p3"],
"latent": ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"],
"latent1": ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"],
}
obs_idx = list(range(len(df)))
with pm.Model(coords=coords) as model:
Psi = pm.InverseGamma("Psi", 5, 10, dims="indicators")
lambdas_ = pm.Normal("lambdas_1", 1, 10, dims=("indicators_1"))
lambdas_1 = pm.Deterministic(
"lambdas1", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_1")
)
lambdas_ = pm.Normal("lambdas_2", 1, 10, dims=("indicators_2"))
lambdas_2 = pm.Deterministic(
"lambdas2", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_2")
)
lambdas_ = pm.Normal("lambdas_3", 1, 10, dims=("indicators_3"))
lambdas_3 = pm.Deterministic(
"lambdas3", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_3")
)
lambdas_ = pm.Normal("lambdas_4", 1, 10, dims=("indicators_4"))
lambdas_4 = pm.Deterministic(
"lambdas4", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_4")
)
lambdas_ = pm.Normal("lambdas_5", 1, 10, dims=("indicators_5"))
lambdas_5 = pm.Deterministic(
"lambdas5", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_5")
)
tau = pm.Normal("tau", 3, 10, dims="indicators")
kappa = 0
sd_dist = pm.Exponential.dist(1.0, shape=5)
chol, _, _ = pm.LKJCholeskyCov("chol_cov", n=5, eta=2, sd_dist=sd_dist, compute_corr=True)
cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("latent", "latent1"))
ksi = pm.MvNormal("ksi", kappa, chol=chol, dims=("obs", "latent"))
m0 = tau[0] + ksi[obs_idx, 0] * lambdas_1[0]
m1 = tau[1] + ksi[obs_idx, 0] * lambdas_1[1]
m2 = tau[2] + ksi[obs_idx, 0] * lambdas_1[2]
m3 = tau[3] + ksi[obs_idx, 1] * lambdas_2[0]
m4 = tau[4] + ksi[obs_idx, 1] * lambdas_2[1]
m5 = tau[5] + ksi[obs_idx, 1] * lambdas_2[2]
m6 = tau[6] + ksi[obs_idx, 2] * lambdas_3[0]
m7 = tau[7] + ksi[obs_idx, 2] * lambdas_3[1]
m8 = tau[8] + ksi[obs_idx, 2] * lambdas_3[2]
m9 = tau[9] + ksi[obs_idx, 3] * lambdas_4[0]
m10 = tau[10] + ksi[obs_idx, 3] * lambdas_4[1]
m11 = tau[11] + ksi[obs_idx, 3] * lambdas_4[2]
m12 = tau[12] + ksi[obs_idx, 4] * lambdas_5[0]
m13 = tau[13] + ksi[obs_idx, 4] * lambdas_5[1]
m14 = tau[14] + ksi[obs_idx, 4] * lambdas_5[2]
mu = pm.Deterministic(
"mu", pm.math.stack([m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14]).T
)
_ = pm.Normal("likelihood", mu, Psi, observed=df[drivers].values)
idata_mm = pm.sample(
draws=10000,
nuts_sampler="numpyro",
target_accept=0.98,
tune=1000,
idata_kwargs={"log_likelihood": True},
random_seed=100,
)
idata_mm.extend(pm.sample_posterior_predictive(idata_mm))
Compiling...
Compilation time = 0:00:02.385203
Sampling...
Sampling time = 0:04:36.313368
Transforming variables...
Transformation time = 0:00:03.402148
Computing Log Likelihood...
Log Likelihood time = 0:00:01.964164
Sampling: [likelihood]
Model Evaluation Checks#
We can see quickly here how this factor structure seems to sample better and retains a consistency of scale.
fig, axs = plt.subplots(1, 2, figsize=(20, 9))
axs = axs.flatten()
az.plot_energy(idata_mm, ax=axs[0])
az.plot_forest(idata_mm, var_names=["lambdas1", "lambdas2", "lambdas3"], combined=True, ax=axs[1]);
We can also pull out the more typical patterns of model evaluation by assessing the fit between the posterior predictive covariances and the sample covariances. This is a sanity check to assess local model fit statistics. The code below iterates over draws from the posterior predictive distribution, calculates the covariance or correlation matrix on each draw, computes the residuals between the model-implied and sample statistics, and then averages the residuals across the draws.
def get_posterior_resids(idata, samples=100, metric="cov"):
resids = []
for i in range(samples):
if metric == "cov":
model_cov = pd.DataFrame(
az.extract(idata["posterior_predictive"])["likelihood"][:, :, i]
).cov()
obs_cov = df[drivers].cov()
else:
model_cov = pd.DataFrame(
az.extract(idata["posterior_predictive"])["likelihood"][:, :, i]
).corr()
obs_cov = df[drivers].corr()
model_cov.index = obs_cov.index
model_cov.columns = obs_cov.columns
residuals = model_cov - obs_cov
resids.append(residuals.values.flatten())
residuals_posterior = pd.DataFrame(pd.DataFrame(resids).mean().values.reshape(15, 15))
residuals_posterior.index = obs_cov.index
residuals_posterior.columns = obs_cov.index
return residuals_posterior
residuals_posterior_cov = get_posterior_resids(idata_mm, 2500)
residuals_posterior_corr = get_posterior_resids(idata_mm, 2500, metric="corr")
These tables lend themselves to nice plots where we can highlight the deviation from the sample covariance and correlation statistics.
fig, ax = plt.subplots(figsize=(20, 10))
mask = np.triu(np.ones_like(residuals_posterior_corr, dtype=bool))
ax = sns.heatmap(residuals_posterior_corr, annot=True, cmap="bwr", mask=mask)
ax.set_title("Residuals between Model Implied and Sample Correlations", fontsize=25);
fig, ax = plt.subplots(figsize=(20, 10))
ax = sns.heatmap(residuals_posterior_cov, annot=True, cmap="bwr", mask=mask)
ax.set_title("Residuals between Model Implied and Sample Covariances", fontsize=25);
However the focus on recovering a fit to such summary statistics is less compelling and more indirect than recovering the observed data itself. We can also do more contemporary Bayesian posterior predictive checks as we pull out the predictive posterior distribution for each of the observed metrics.
make_ppc(idata_mm, 100, drivers=residuals_posterior_cov.columns, dims=(5, 3));
Model Analysis#
We’re not just interested in recovering the observed data patterns; we also want a way of pulling out the inferences relating to the latent constructs. For instance we can pull out the factor loadings and calculate measures of the variance accounted for by each of the indicator variables in this factor system, and by the factors themselves.
def make_factor_loadings_df(idata):
factor_loadings = pd.DataFrame(
az.summary(
idata, var_names=["lambdas1", "lambdas2", "lambdas3", "lambdas4", "lambdas5"]
)["mean"]
).reset_index()
factor_loadings["factor"] = factor_loadings["index"].str.split("[", expand=True)[0]
factor_loadings.columns = ["factor_loading", "factor_loading_weight", "factor"]
factor_loadings["factor_loading_weight_sq"] = factor_loadings["factor_loading_weight"] ** 2
factor_loadings["sum_sq_loadings"] = factor_loadings.groupby("factor")[
"factor_loading_weight_sq"
].transform("sum")
factor_loadings["error_variances"] = az.summary(idata_mm, var_names=["Psi"])["mean"].values
factor_loadings["total_indicator_variance"] = (
factor_loadings["factor_loading_weight_sq"] + factor_loadings["error_variances"]
)
factor_loadings["total_variance"] = factor_loadings["total_indicator_variance"].sum()
factor_loadings["indicator_explained_variance"] = (
factor_loadings["factor_loading_weight_sq"] / factor_loadings["total_variance"]
)
factor_loadings["factor_explained_variance"] = (
factor_loadings["sum_sq_loadings"] / factor_loadings["total_variance"]
)
num_cols = [c for c in factor_loadings.columns if not c in ["factor_loading", "factor"]]
return factor_loadings
pd.set_option("display.max_colwidth", 15)
factor_loadings = make_factor_loadings_df(idata_mm)
num_cols = [c for c in factor_loadings.columns if not c in ["factor_loading", "factor"]]
factor_loadings.style.format("{:.2f}", subset=num_cols).background_gradient(
axis=0, subset=["indicator_explained_variance", "factor_explained_variance"]
)
factor_loading | factor_loading_weight | factor | factor_loading_weight_sq | sum_sq_loadings | error_variances | total_indicator_variance | total_variance | indicator_explained_variance | factor_explained_variance | |
---|---|---|---|---|---|---|---|---|---|---|
0 | lambdas1[se_acad_p1] | 1.00 | lambdas1 | 1.00 | 2.61 | 0.41 | 1.41 | 21.47 | 0.05 | 0.12 |
1 | lambdas1[se_acad_p2] | 0.82 | lambdas1 | 0.67 | 2.61 | 0.41 | 1.09 | 21.47 | 0.03 | 0.12 |
2 | lambdas1[se_acad_p3] | 0.97 | lambdas1 | 0.94 | 2.61 | 0.47 | 1.41 | 21.47 | 0.04 | 0.12 |
3 | lambdas2[se_social_p1] | 1.00 | lambdas2 | 1.00 | 2.81 | 0.43 | 1.43 | 21.47 | 0.05 | 0.13 |
4 | lambdas2[se_social_p2] | 0.96 | lambdas2 | 0.92 | 2.81 | 0.36 | 1.29 | 21.47 | 0.04 | 0.13 |
5 | lambdas2[se_social_p3] | 0.94 | lambdas2 | 0.88 | 2.81 | 0.55 | 1.43 | 21.47 | 0.04 | 0.13 |
6 | lambdas3[sup_friends_p1] | 1.00 | lambdas3 | 1.00 | 2.46 | 0.52 | 1.52 | 21.47 | 0.05 | 0.11 |
7 | lambdas3[sup_friends_p2] | 0.80 | lambdas3 | 0.64 | 2.46 | 0.51 | 1.15 | 21.47 | 0.03 | 0.11 |
8 | lambdas3[sup_friends_p3] | 0.91 | lambdas3 | 0.82 | 2.46 | 0.62 | 1.44 | 21.47 | 0.04 | 0.11 |
9 | lambdas4[sup_parents_p1] | 1.00 | lambdas4 | 1.00 | 3.11 | 0.55 | 1.55 | 21.47 | 0.05 | 0.14 |
10 | lambdas4[sup_parents_p2] | 1.04 | lambdas4 | 1.08 | 3.11 | 0.54 | 1.62 | 21.47 | 0.05 | 0.14 |
11 | lambdas4[sup_parents_p3] | 1.01 | lambdas4 | 1.02 | 3.11 | 0.68 | 1.70 | 21.47 | 0.05 | 0.14 |
12 | lambdas5[ls_p1] | 1.00 | lambdas5 | 1.00 | 2.61 | 0.67 | 1.67 | 21.47 | 0.05 | 0.12 |
13 | lambdas5[ls_p2] | 0.79 | lambdas5 | 0.63 | 2.61 | 0.53 | 1.16 | 21.47 | 0.03 | 0.12 |
14 | lambdas5[ls_p3] | 0.99 | lambdas5 | 0.98 | 2.61 | 0.62 | 1.61 | 21.47 | 0.05 | 0.12 |
We can pull out and plot the ordered weightings as a kind of feature importance plot.
fig, ax = plt.subplots(figsize=(15, 8))
temp = factor_loadings[["factor_loading", "indicator_explained_variance"]].sort_values(
by="indicator_explained_variance"
)
ax.barh(
temp["factor_loading"], temp["indicator_explained_variance"], align="center", color="slateblue"
)
ax.set_title("Explained Variance");
The goal of this kind of view isn’t necessarily to find useful features as in the machine learning context, but to help understand the nature of the variation in our system. We can also pull out the covariances and correlations among the latent factors.
cov_df = pd.DataFrame(az.extract(idata_mm["posterior"])["cov"].mean(axis=2))
cov_df.index = ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"]
cov_df.columns = ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"]
correlation_df = pd.DataFrame(az.extract(idata_mm["posterior"])["chol_cov_corr"].mean(axis=2))
correlation_df.index = ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"]
correlation_df.columns = ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P", "LS"]
fig, axs = plt.subplots(1, 2, figsize=(20, 10))
axs = axs.flatten()
mask = np.triu(np.ones_like(cov_df, dtype=bool))
sns.heatmap(cov_df, annot=True, cmap="Blues", ax=axs[0], mask=mask)
axs[0].set_title("Covariance of Latent Constructs")
axs[1].set_title("Correlation of Latent Constructs")
sns.heatmap(correlation_df, annot=True, cmap="Blues", ax=axs[1], mask=mask);
Which highlights the strong relationships between the life-satisfaction LS construct, parental support SUP_P and social self-efficacy SE_SOCIAL. We can observe these patterns in the draws of our latent constructs too.
fig, axs = plt.subplots(1, 3, figsize=(15, 10))
axs = axs.flatten()
ax = axs[0]
ax1 = axs[1]
ax2 = axs[2]
az.plot_forest(idata_mm, var_names=["ksi"], combined=True, ax=ax, coords={"latent": ["SUP_P"]})
az.plot_forest(
idata_mm,
var_names=["ksi"],
combined=True,
ax=ax1,
colors="forestgreen",
coords={"latent": ["SE_SOCIAL"]},
)
az.plot_forest(
idata_mm,
var_names=["ksi"],
combined=True,
ax=ax2,
colors="slateblue",
coords={"latent": ["LS"]},
)
ax.set_yticklabels([])
ax.set_xlabel("SUP_P")
ax1.set_yticklabels([])
ax1.set_xlabel("SE_SOCIAL")
ax2.set_yticklabels([])
ax2.set_xlabel("LS")
ax.axvline(-2, color="red")
ax1.axvline(-2, color="red")
ax2.axvline(-2, color="red")
ax.set_title("Individual Parental Support Metric \n On Latent Factor SUP_P")
ax1.set_title("Individual Social Self Efficacy \n On Latent Factor SE_SOCIAL")
ax2.set_title("Individual Life Satisfaction Metric \n On Latent Factor LS")
plt.show();
It’s worth highlighting here that the cohort on the top left of the SUP_P graph, which has low parental support scores, seems to have less severe SE_SOCIAL scores. This combination seems to result in fairly standard LS scores, suggesting some kind of moderated relationship.
Bayesian Structural Equation Models#
We’ve now seen how measurement models help us understand the relationships between disparate indicator variables in a kind of crude way. We have postulated a system of latent factors and derived the correlations between these factors to help us understand the strength of relationships between the broader constructs of interest. This is a special case of a structural equation model. In the SEM tradition we’re interested in figuring out aspects of the structural relations between variables; that means we want to posit dependence and independence relationships to interrogate our beliefs about how influence flows through the system.
For our data set we can postulate the following chain of dependencies:
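A rough sketch of the hypothesised structure (matching the regressions we specify in the SEM section below):

SUP_Parents, SUP_Friends -> SE_Academic
SUP_Parents, SUP_Friends -> SE_Social
SE_Academic, SE_Social, SUP_Parents, SUP_Friends -> LS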
This picture introduces specific claims of dependence and the question then becomes: how do we model these patterns? In the next section we’ll build on the structures of the basic measurement model to articulate this chain of dependence as functional equations of the “root” constructs. This allows us to evaluate the same questions of model adequacy as before, but additionally we can now phrase questions about direct and indirect relationships between the latent constructs. In particular, since our focus is on what drives life-satisfaction, we can ask our model about the mediated effects of parental and peer support.
Model Complexity and Bayesian Sensitivity Analysis#
These models are already complicated and now we’re adding a bunch of new parameters and structure to the model. Each of the parameters is equipped with a prior that shapes the implications of the model specification. This is a hugely expressive framework where we can encode a large variety of dependencies and correlations. With this freedom to structure our inferential model we need to be careful to assess the robustness of our inferences. As such we will here perform a quick sensitivity analysis to show how the central implications of this model vary under differing prior settings.
drivers = [
"se_acad_p1",
"se_acad_p2",
"se_acad_p3",
"se_social_p1",
"se_social_p2",
"se_social_p3",
"sup_friends_p1",
"sup_friends_p2",
"sup_friends_p3",
"sup_parents_p1",
"sup_parents_p2",
"sup_parents_p3",
"ls_p1",
"ls_p2",
"ls_p3",
]
def make_indirect_sem(priors):
coords = {
"obs": list(range(len(df))),
"indicators": drivers,
"indicators_1": ["se_acad_p1", "se_acad_p2", "se_acad_p3"],
"indicators_2": ["se_social_p1", "se_social_p2", "se_social_p3"],
"indicators_3": ["sup_friends_p1", "sup_friends_p2", "sup_friends_p3"],
"indicators_4": ["sup_parents_p1", "sup_parents_p2", "sup_parents_p3"],
"indicators_5": ["ls_p1", "ls_p2", "ls_p3"],
"latent": ["SUP_F", "SUP_P"],
"latent1": ["SUP_F", "SUP_P"],
"latent_regression": ["SUP_F->SE_ACAD", "SUP_P->SE_ACAD", "SUP_F->SE_SOC", "SUP_P->SE_SOC"],
"regression": ["SE_ACAD", "SE_SOCIAL", "SUP_F", "SUP_P"],
}
obs_idx = list(range(len(df)))
with pm.Model(coords=coords) as model:
Psi = pm.InverseGamma("Psi", 5, 10, dims="indicators")
lambdas_ = pm.Normal(
"lambdas_1", priors["lambda"][0], priors["lambda"][1], dims=("indicators_1")
)
lambdas_1 = pm.Deterministic(
"lambdas1", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_1")
)
lambdas_ = pm.Normal(
"lambdas_2", priors["lambda"][0], priors["lambda"][1], dims=("indicators_2")
)
lambdas_2 = pm.Deterministic(
"lambdas2", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_2")
)
lambdas_ = pm.Normal(
"lambdas_3", priors["lambda"][0], priors["lambda"][1], dims=("indicators_3")
)
lambdas_3 = pm.Deterministic(
"lambdas3", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_3")
)
lambdas_ = pm.Normal(
"lambdas_4", priors["lambda"][0], priors["lambda"][1], dims=("indicators_4")
)
lambdas_4 = pm.Deterministic(
"lambdas4", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_4")
)
lambdas_ = pm.Normal(
"lambdas_5", priors["lambda"][0], priors["lambda"][1], dims=("indicators_5")
)
lambdas_5 = pm.Deterministic(
"lambdas5", pt.set_subtensor(lambdas_[0], 1), dims=("indicators_5")
)
tau = pm.Normal("tau", 3, 0.5, dims="indicators")
kappa = 0
sd_dist = pm.Exponential.dist(1.0, shape=2)
chol, _, _ = pm.LKJCholeskyCov(
"chol_cov", n=2, eta=priors["eta"], sd_dist=sd_dist, compute_corr=True
)
cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("latent", "latent1"))
ksi = pm.MvNormal("ksi", kappa, chol=chol, dims=("obs", "latent"))
# Regression Components
beta_r = pm.Normal("beta_r", 0, priors["beta_r"], dims="latent_regression")
beta_r2 = pm.Normal("beta_r2", 0, priors["beta_r2"], dims="regression")
sd_dist1 = pm.Exponential.dist(1.0, shape=2)
resid_chol, _, _ = pm.LKJCholeskyCov(
"resid_chol", n=2, eta=3, sd_dist=sd_dist1, compute_corr=True
)
_ = pm.Deterministic("resid_cov", chol.dot(chol.T))
sigmas_resid = pm.MvNormal("sigmas_resid", kappa, chol=resid_chol)
# SE_ACAD ~ SUP_FRIENDS + SUP_PARENTS
regression_se_acad = pm.Normal(
"regr_se_acad",
beta_r[0] * ksi[obs_idx, 0] + beta_r[1] * ksi[obs_idx, 1],
sigmas_resid[0],
)
# SE_SOCIAL ~ SUP_FRIENDS + SUP_PARENTS
regression_se_social = pm.Normal(
"regr_se_social",
beta_r[2] * ksi[obs_idx, 0] + beta_r[3] * ksi[obs_idx, 1],
sigmas_resid[1],
)
# LS ~ SE_ACAD + SE_SOCIAL + SUP_FRIEND + SUP_PARENTS
regression = pm.Normal(
"regr",
beta_r2[0] * regression_se_acad
+ beta_r2[1] * regression_se_social
+ beta_r2[2] * ksi[obs_idx, 0]
+ beta_r2[3] * ksi[obs_idx, 1],
1,
)
m0 = tau[0] + regression_se_acad * lambdas_1[0]
m1 = tau[1] + regression_se_acad * lambdas_1[1]
m2 = tau[2] + regression_se_acad * lambdas_1[2]
m3 = tau[3] + regression_se_social * lambdas_2[0]
m4 = tau[4] + regression_se_social * lambdas_2[1]
m5 = tau[5] + regression_se_social * lambdas_2[2]
m6 = tau[6] + ksi[obs_idx, 0] * lambdas_3[0]
m7 = tau[7] + ksi[obs_idx, 0] * lambdas_3[1]
m8 = tau[8] + ksi[obs_idx, 0] * lambdas_3[2]
m9 = tau[9] + ksi[obs_idx, 1] * lambdas_4[0]
m10 = tau[10] + ksi[obs_idx, 1] * lambdas_4[1]
m11 = tau[11] + ksi[obs_idx, 1] * lambdas_4[2]
m12 = tau[12] + regression * lambdas_5[0]
m13 = tau[13] + regression * lambdas_5[1]
m14 = tau[14] + regression * lambdas_5[2]
mu = pm.Deterministic(
"mu", pm.math.stack([m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14]).T
)
_ = pm.Normal("likelihood", mu, Psi, observed=df[drivers].values)
idata = pm.sample(
10_000,
chains=4,
nuts_sampler="numpyro",
target_accept=0.99,
tune=2000,
idata_kwargs={"log_likelihood": True},
random_seed=110,
)
idata.extend(pm.sample_posterior_predictive(idata))
return model, idata
model_sem0, idata_sem0 = make_indirect_sem(
priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.1, "beta_r2": 0.1}
)
model_sem1, idata_sem1 = make_indirect_sem(
priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.2, "beta_r2": 0.2}
)
model_sem2, idata_sem2 = make_indirect_sem(
priors={"eta": 2, "lambda": [1, 1], "beta_r": 0.5, "beta_r2": 0.5}
)
Compiling...
Compilation time = 0:00:11.696197
Sampling...
Sampling time = 0:01:08.103534
Transforming variables...
Transformation time = 0:00:03.042150
Computing Log Likelihood...
Log Likelihood time = 0:00:01.889064
Sampling: [likelihood]
Compiling...
Compilation time = 0:00:11.975201
Sampling...
Sampling time = 0:01:08.567877
Transforming variables...
Transformation time = 0:00:03.029417
Computing Log Likelihood...
Log Likelihood time = 0:00:02.128598
Sampling: [likelihood]
Compiling...
Compilation time = 0:00:12.106062
Sampling...
Sampling time = 0:01:07.131131
Transforming variables...
Transformation time = 0:00:03.069262
Computing Log Likelihood...
Log Likelihood time = 0:00:01.810234
Sampling: [likelihood]
The main structural feature to observe is that we’ve now added a bunch of regressions to our model such that some of the constructs that we took as given in the measurement model are now derived as a linear combination of others. Because we removed the correlation effect between SE_SOCIAL and SE_ACAD, we re-introduce the possibility of their correlation by adding correlated error terms to their regression equations. In the lavaan syntax we’re aiming for something like:
Measurement model
SUP_Parents =~ sup_parents_p1 + sup_parents_p2 + sup_parents_p3
SUP_Friends =~ sup_friends_p1 + sup_friends_p2 + sup_friends_p3
SE_Academic =~ se_acad_p1 + se_acad_p2 + se_acad_p3
SE_Social =~ se_social_p1 + se_social_p2 + se_social_p3
LS =~ ls_p1 + ls_p2 + ls_p3
Regressions
SE_Academic ~ SUP_Parents + SUP_Friends
SE_Social ~ SUP_Parents + SUP_Friends
LS ~ SE_Academic + SE_Social + SUP_Parents + SUP_Friends
Residual covariances
SE_Academic ~~ SE_Social
pm.model_to_graphviz(model_sem0)
It’s worth pausing to examine the nature of the dependencies sketched in this diagram. We can see here how we’ve replaced the simpler measurement model structure and added three regression functions that replace the draws from the multivariate normal \(Ksi\). In other words we’ve expressed a dependency as a series of regressions all within the one model. Next we’ll see how the parameter estimates change across our prior specifications for the model. Notice the relative stability of the factor loadings compared to the regression coefficients.
fig, ax = plt.subplots(figsize=(15, 15))
az.plot_forest(
[idata_sem0, idata_sem1, idata_sem2],
model_names=["SEM0", "SEM1", "SEM2"],
var_names=["lambdas1", "lambdas2", "lambdas3", "lambdas4", "lambdas5", "beta_r", "beta_r2"],
combined=True,
ax=ax,
);
Model Evaluation Checks#
A quick evaluation of model performance suggests we do somewhat less well in recovering the sample covariance structures than we did with the simpler measurement model.
residuals_posterior_cov = get_posterior_resids(idata_sem0, 2500)
residuals_posterior_corr = get_posterior_resids(idata_sem0, 2500, metric="corr")
fig, ax = plt.subplots(figsize=(20, 10))
mask = np.triu(np.ones_like(residuals_posterior_corr, dtype=bool))
ax = sns.heatmap(residuals_posterior_corr, annot=True, cmap="bwr", center=0, mask=mask)
ax.set_title("Residuals between Model Implied and Sample Correlations", fontsize=25);
But the posterior predictive checks still look reasonable. Our focal quantity of life-satisfaction seems to be recovered well.
make_ppc(idata_sem0, 100, drivers=drivers, dims=(5, 3))
Model diagnostics show generally healthy looking trace plots with some divergences, but the effective sample sizes and r-hat measures are fine, so we should be generally pretty happy that the model has converged to the posterior distribution well.
az.plot_trace(
idata_sem0,
var_names=["lambdas1", "lambdas2", "lambdas3", "lambdas4", "lambdas5", "beta_r", "beta_r2"],
);
az.summary(
idata_sem0,
var_names=[
"lambdas1",
"lambdas2",
"lambdas3",
"lambdas4",
"lambdas5",
"beta_r",
"beta_r2",
"Psi",
"tau",
],
)
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
lambdas1[se_acad_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas1[se_acad_p2] | 0.825 | 0.055 | 0.725 | 0.930 | 0.001 | 0.000 | 9636.0 | 15616.0 | 1.0 |
lambdas1[se_acad_p3] | 0.982 | 0.063 | 0.863 | 1.101 | 0.001 | 0.000 | 8860.0 | 14667.0 | 1.0 |
lambdas2[se_social_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas2[se_social_p2] | 0.999 | 0.062 | 0.885 | 1.118 | 0.001 | 0.001 | 5688.0 | 9867.0 | 1.0 |
lambdas2[se_social_p3] | 0.952 | 0.075 | 0.816 | 1.098 | 0.001 | 0.001 | 9841.0 | 16162.0 | 1.0 |
lambdas3[sup_friends_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas3[sup_friends_p2] | 0.804 | 0.045 | 0.720 | 0.888 | 0.000 | 0.000 | 10940.0 | 18878.0 | 1.0 |
lambdas3[sup_friends_p3] | 0.908 | 0.052 | 0.813 | 1.010 | 0.000 | 0.000 | 12075.0 | 20292.0 | 1.0 |
lambdas4[sup_parents_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas4[sup_parents_p2] | 1.013 | 0.054 | 0.915 | 1.117 | 0.001 | 0.000 | 8953.0 | 15516.0 | 1.0 |
lambdas4[sup_parents_p3] | 0.979 | 0.059 | 0.869 | 1.093 | 0.001 | 0.000 | 12016.0 | 19450.0 | 1.0 |
lambdas5[ls_p1] | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 40000.0 | 40000.0 | NaN |
lambdas5[ls_p2] | 0.547 | 0.046 | 0.463 | 0.634 | 0.000 | 0.000 | 23600.0 | 28513.0 | 1.0 |
lambdas5[ls_p3] | 0.656 | 0.056 | 0.552 | 0.759 | 0.000 | 0.000 | 21916.0 | 27988.0 | 1.0 |
beta_r[SUP_F->SE_ACAD] | 0.049 | 0.040 | -0.028 | 0.122 | 0.000 | 0.000 | 33619.0 | 30265.0 | 1.0 |
beta_r[SUP_P->SE_ACAD] | 0.220 | 0.043 | 0.138 | 0.301 | 0.000 | 0.000 | 26087.0 | 27608.0 | 1.0 |
beta_r[SUP_F->SE_SOC] | 0.146 | 0.035 | 0.080 | 0.213 | 0.000 | 0.000 | 26552.0 | 29710.0 | 1.0 |
beta_r[SUP_P->SE_SOC] | 0.271 | 0.039 | 0.198 | 0.344 | 0.000 | 0.000 | 18597.0 | 25964.0 | 1.0 |
beta_r2[SE_ACAD] | 0.166 | 0.074 | 0.028 | 0.305 | 0.000 | 0.000 | 41792.0 | 31794.0 | 1.0 |
beta_r2[SE_SOCIAL] | 0.272 | 0.080 | 0.118 | 0.420 | 0.000 | 0.000 | 42014.0 | 31550.0 | 1.0 |
beta_r2[SUP_F] | 0.063 | 0.057 | -0.043 | 0.171 | 0.000 | 0.000 | 34544.0 | 27449.0 | 1.0 |
beta_r2[SUP_P] | 0.251 | 0.062 | 0.136 | 0.369 | 0.000 | 0.000 | 29584.0 | 30999.0 | 1.0 |
Psi[se_acad_p1] | 0.417 | 0.029 | 0.362 | 0.471 | 0.000 | 0.000 | 11409.0 | 16465.0 | 1.0 |
Psi[se_acad_p2] | 0.413 | 0.024 | 0.366 | 0.457 | 0.000 | 0.000 | 19369.0 | 24906.0 | 1.0 |
Psi[se_acad_p3] | 0.462 | 0.028 | 0.408 | 0.516 | 0.000 | 0.000 | 17531.0 | 22823.0 | 1.0 |
Psi[se_social_p1] | 0.444 | 0.027 | 0.394 | 0.494 | 0.000 | 0.000 | 14886.0 | 22035.0 | 1.0 |
Psi[se_social_p2] | 0.338 | 0.026 | 0.291 | 0.389 | 0.000 | 0.000 | 10327.0 | 17290.0 | 1.0 |
Psi[se_social_p3] | 0.557 | 0.029 | 0.503 | 0.610 | 0.000 | 0.000 | 29639.0 | 29036.0 | 1.0 |
Psi[sup_friends_p1] | 0.517 | 0.039 | 0.444 | 0.591 | 0.000 | 0.000 | 10615.0 | 15242.0 | 1.0 |
Psi[sup_friends_p2] | 0.508 | 0.031 | 0.450 | 0.566 | 0.000 | 0.000 | 18625.0 | 24298.0 | 1.0 |
Psi[sup_friends_p3] | 0.624 | 0.036 | 0.556 | 0.691 | 0.000 | 0.000 | 21581.0 | 25635.0 | 1.0 |
Psi[sup_parents_p1] | 0.541 | 0.035 | 0.477 | 0.609 | 0.000 | 0.000 | 14766.0 | 22528.0 | 1.0 |
Psi[sup_parents_p2] | 0.537 | 0.037 | 0.468 | 0.605 | 0.000 | 0.000 | 13008.0 | 18715.0 | 1.0 |
Psi[sup_parents_p3] | 0.684 | 0.038 | 0.612 | 0.754 | 0.000 | 0.000 | 21999.0 | 26864.0 | 1.0 |
Psi[ls_p1] | 0.537 | 0.051 | 0.442 | 0.633 | 0.001 | 0.000 | 6824.0 | 10978.0 | 1.0 |
Psi[ls_p2] | 0.552 | 0.030 | 0.496 | 0.608 | 0.000 | 0.000 | 21921.0 | 25170.0 | 1.0 |
Psi[ls_p3] | 0.670 | 0.036 | 0.603 | 0.740 | 0.000 | 0.000 | 19160.0 | 24500.0 | 1.0 |
tau[se_acad_p1] | 5.058 | 0.048 | 4.966 | 5.148 | 0.001 | 0.001 | 4545.0 | 10287.0 | 1.0 |
tau[se_acad_p2] | 5.266 | 0.042 | 5.186 | 5.345 | 0.001 | 0.000 | 5105.0 | 12105.0 | 1.0 |
tau[se_acad_p3] | 5.115 | 0.049 | 5.022 | 5.208 | 0.001 | 0.000 | 4915.0 | 12071.0 | 1.0 |
tau[se_social_p1] | 5.175 | 0.046 | 5.087 | 5.262 | 0.001 | 0.001 | 3954.0 | 9674.0 | 1.0 |
tau[se_social_p2] | 5.364 | 0.043 | 5.283 | 5.444 | 0.001 | 0.001 | 3632.0 | 8857.0 | 1.0 |
tau[se_social_p3] | 5.327 | 0.049 | 5.236 | 5.421 | 0.001 | 0.000 | 5021.0 | 11942.0 | 1.0 |
tau[sup_friends_p1] | 5.607 | 0.070 | 5.473 | 5.735 | 0.001 | 0.001 | 3545.0 | 7220.0 | 1.0 |
tau[sup_friends_p2] | 5.864 | 0.059 | 5.754 | 5.979 | 0.001 | 0.001 | 3903.0 | 8593.0 | 1.0 |
tau[sup_friends_p3] | 5.822 | 0.068 | 5.696 | 5.954 | 0.001 | 0.001 | 4102.0 | 8858.0 | 1.0 |
tau[sup_parents_p1] | 5.769 | 0.068 | 5.641 | 5.895 | 0.001 | 0.001 | 3000.0 | 7066.0 | 1.0 |
tau[sup_parents_p2] | 5.719 | 0.068 | 5.587 | 5.843 | 0.001 | 0.001 | 3132.0 | 7931.0 | 1.0 |
tau[sup_parents_p3] | 5.512 | 0.071 | 5.378 | 5.644 | 0.001 | 0.001 | 3647.0 | 9103.0 | 1.0 |
tau[ls_p1] | 5.010 | 0.073 | 4.873 | 5.149 | 0.001 | 0.001 | 4056.0 | 9242.0 | 1.0 |
tau[ls_p2] | 5.671 | 0.050 | 5.578 | 5.765 | 0.001 | 0.000 | 5777.0 | 13336.0 | 1.0 |
tau[ls_p3] | 5.096 | 0.060 | 4.984 | 5.210 | 0.001 | 0.001 | 5766.0 | 12740.0 | 1.0 |
Similar diagnostic results hold for the other models. We now continue to assess questions of direct and indirect effects that were obscure in the simpler measurement model. By which I mean we trace out the total paths that influence life-satisfaction and assess the relative strength of impact due to parental and peer support.
Indirect and Direct Effects#
We now turn to the additional regression structures that we’ve encoded into the model graph. First we pull out the regression coefficients.
fig, axs = plt.subplots(1, 2, figsize=(20, 8))
az.plot_energy(idata_sem0, ax=axs[0])
az.plot_forest(idata_sem0, var_names=["beta_r", "beta_r2"], combined=True, ax=axs[1])
axs[1].axvline(0, color="red", label="zero-effect")
axs[1].legend();
The coefficients indicate a smaller relative weight accorded to the effects of peer support than we see with parental support. This is borne out as we trace out the cumulative causal effects (direct and indirect) through our DAG or chain of regression coefficients.
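Concretely, the total effect of a support construct on life-satisfaction is its direct path plus the sum of the products of the coefficients along each indirect path. For parental support this is what the helper below computes from the posterior means:

$$
\text{Total}_{SUP\_P \rightarrow LS} = \beta_{SUP\_P \rightarrow LS} + \beta_{SUP\_P \rightarrow SE\_SOC}\cdot\beta_{SE\_SOC \rightarrow LS} + \beta_{SUP\_P \rightarrow SE\_ACAD}\cdot\beta_{SE\_ACAD \rightarrow LS}
$$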
def calculate_effects(idata, var="SUP_P"):
summary_df = az.summary(idata, var_names=["beta_r", "beta_r2"])
# Indirect Paths
## VAR -> SE_SOC ->LS
indirect_parent_soc = (
summary_df.loc[f"beta_r[{var}->SE_SOC]"]["mean"]
* summary_df.loc["beta_r2[SE_SOCIAL]"]["mean"]
)
## VAR -> SE_ACAD -> LS
indirect_parent_acad = (
summary_df.loc[f"beta_r[{var}->SE_ACAD]"]["mean"]
* summary_df.loc["beta_r2[SE_ACAD]"]["mean"]
)
## Total Indirect Effects
total_indirect = indirect_parent_soc + indirect_parent_acad
## Total Effects
total_effect = total_indirect + summary_df.loc[f"beta_r2[{var}]"]["mean"]
return pd.DataFrame(
[[indirect_parent_soc, indirect_parent_acad, total_indirect, total_effect]],
columns=[
f"{var} -> SE_SOC ->LS",
f"{var} -> SE_ACAD ->LS",
f"Total Indirect Effects {var}",
f"Total Effects {var}",
],
)
Importantly we see here the effect of priors on the implied relationships. As we pull our priors closer to 0 the total effect of parental support is pulled downwards away from 0.5, while that of peer support remains relatively stable at ~0.10. However it remains clear that the impact of parental support dwarfs the effects due to peer support.
summary_p = pd.concat(
[calculate_effects(idata_sem0), calculate_effects(idata_sem1), calculate_effects(idata_sem2)]
)
summary_p.index = ["SEM0", "SEM1", "SEM2"]
summary_p
SUP_P -> SE_SOC ->LS | SUP_P -> SE_ACAD ->LS | Total Indirect Effects SUP_P | Total Effects SUP_P | |
---|---|---|---|---|
SEM0 | 0.073712 | 0.03652 | 0.110232 | 0.361232 |
SEM1 | 0.133672 | 0.04914 | 0.182812 | 0.471812 |
SEM2 | 0.177920 | 0.04840 | 0.226320 | 0.515320 |
The estimated impact of parental support is quite sensitive to the scale of our priors on the regression coefficients. Here is a substantive example of the role of theory choice in model design. How strongly should we believe that parental and peer support have zero effect on life-satisfaction? I’m inclined to believe we’re too conservative if we try to shrink the effect toward zero, and should prefer a less conservative model. However, the point here is not to settle the issue, but to demonstrate the importance of sensitivity checks.
summary_f = pd.concat(
[
calculate_effects(idata_sem0, "SUP_F"),
calculate_effects(idata_sem1, "SUP_F"),
calculate_effects(idata_sem2, "SUP_F"),
]
)
summary_f.index = ["SEM0", "SEM1", "SEM2"]
summary_f
SUP_F -> SE_SOC ->LS | SUP_F -> SE_ACAD ->LS | Total Indirect Effects SUP_F | Total Effects SUP_F | |
---|---|---|---|---|
SEM0 | 0.039712 | 0.008134 | 0.047846 | 0.110846 |
SEM1 | 0.068572 | 0.009828 | 0.078400 | 0.127400 |
SEM2 | 0.089516 | 0.009152 | 0.098668 | 0.130668 |
Calculating Global Model Fit#
We can also calculate global measures of model fit. Here we compare, somewhat unfairly, the measurement model and various incarnations of our SEM model. The SEM models are attempting to articulate more complex structures and can suffer in simple measures of global fit when set against comparably simpler models. The complexity is not arbitrary, and you need to make a decision regarding the trade-off between expressive power and global model fit against the observed data points.
compare_df = az.compare(
{"SEM0": idata_sem0, "SEM1": idata_sem1, "SEM2": idata_sem2, "MM": idata_mm}
)
compare_df
rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale | |
---|---|---|---|---|---|---|---|---|---|
MM | 0 | -3728.300062 | 994.604514 | 0.000000 | 8.023087e-01 | 65.332293 | 0.000000 | True | log |
SEM0 | 1 | -3778.256742 | 1109.385888 | 49.956680 | 1.976913e-01 | 64.764420 | 13.618924 | True | log |
SEM1 | 2 | -3781.419677 | 1104.681730 | 53.119615 | 3.276563e-15 | 64.853007 | 13.459803 | True | log |
SEM2 | 3 | -3782.787099 | 1102.006963 | 54.487037 | 0.000000e+00 | 64.871911 | 13.330740 | True | log |
Conclusion#
We’ve just seen how we can go from thinking about the measurement of abstract psychometric constructs, through the evaluation of complex patterns of correlation and covariance among these latent constructs, to evaluating hypothetical causal structures amongst the latent factors. This is a bit of a whirlwind tour of psychometric models and the expressive power of SEM and CFA models, which we’re ending by linking them to the realm of causal inference! This is not an accident, but rather evidence that causal concerns sit at the heart of most modeling endeavours. When we’re interested in any kind of complex joint-distribution of variables, we’re likely interested in the causal structure of the system - how are the realised values of some observed metrics dependent on or related to others? Importantly, we need to understand how these observations are realised without confusing simple correlation for cause through naive or confounded inference.
Levy and Mislevy highlight this connection by focusing on the role of De Finetti’s theorem in the recovery of exchangeability through Bayesian inference. By De Finetti’s theorem the distribution of an exchangeable sequence of variables can be expressed as a mixture of conditionally independent variables.
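In its simplest form the representation theorem says (a sketch):

$$
p(x_{1}, x_{2}, \dots , x_{n}) = \int \prod_{i=1}^{n} p(x_{i} \mid \theta) \, p(\theta) \, d\theta
$$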
This representation licenses substantive claims about the system. If we specify the conditional distribution correctly, we recover the conditions that warrant inference with a well designed model, because the subjects’ outcomes are considered exchangeable conditional on our model. The mixing distribution is just over the vector of parameters upon which we condition our model. This plays out nicely in SEM and CFA models because we explicitly structure the interaction of the system to remove biasing dependence structure and license clean inferences. Holding fixed levels of the latent constructs, we expect to be able to draw generalisable claims about the expected realisations of the indicator metrics.
[C]onditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy by organising our knowledge in a specific way. An important tool in such an organisation is the identification of intermediate variables that induce conditional independence among observables; if such variables are not in our vocabulary, we create them. In medical diagnosis, for instance, when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g. “syndrome”, “complication”, “pathological state”) and treats it as a new auxiliary variable that induces conditional independence.” - Pearl quoted in Levy and Mislevy [2020] p61
It’s this deliberate and careful focus on the structure of conditionalisation that unites the seemingly disparate disciplines of psychometrics and causal inference. Both disciplines cultivate careful thinking about the structure of the data generating process and, more, they proffer conditionalisation strategies to better target some estimand of interest. Both are well phrased in the expressive lexicon of a probabilistic programming language like PyMC. We encourage you to explore the rich possibilities for yourself!
References#
Watermark#
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Wed Sep 25 2024
Python implementation: CPython
Python version : 3.11.7
IPython version : 8.20.0
pytensor: 2.18.6
pandas : 2.2.0
pymc : 5.10.3
arviz : 0.17.0
matplotlib: 3.8.2
pytensor : 2.18.6
numpy : 1.24.4
seaborn : 0.13.2
Watermark: 2.4.3