Good & Bad Controls#

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 06 - Good & Bad Controls

# Ignore warnings
import warnings

import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import statsmodels.formula.api as smf
import utils as utils
import xarray as xr

from matplotlib import pyplot as plt
from matplotlib import style
from scipy import stats as stats

warnings.filterwarnings("ignore")

# Set matplotlib style
STYLE = "statistical-rethinking-2023.mplstyle"
style.use(STYLE)

Avoid being clever at all costs#

Being clever is

  • unreliable

  • opaque

Using explicit causal models allows one to:

  • derive implications using logic

  • verify work & assumptions

  • facilitate peer review & verification

Confounds#

Review#

  • The Fork \(X \leftarrow Z \rightarrow Y\)

    • \(X\) and \(Y\) are associated unless we stratify by \(Z\)

  • The Pipe \(X \rightarrow Z \rightarrow Y\)

    • \(X\) and \(Y\) are associated unless we stratify by \(Z\)

  • The Collider \(X \rightarrow Z \leftarrow Y\)

    • \(X\) and \(Y\) are not associated unless we stratify by \(Z\) (see the quick simulation below)

  • The Descendant \(Z \rightarrow A\)

    • Descendant \(A\) takes on the behavior of its parent \(Z\)
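As a quick refresher on the least intuitive of these, here is a minimal simulation (my own sketch, not from the lecture) of the Collider: \(X\) and \(Y\) are generated independently, yet stratifying by \(Z\) induces an association between them.

np.random.seed(1)
n = 1_000
x = stats.norm.rvs(size=n)
y = stats.norm.rvs(size=n)     # generated independently of x
z = stats.norm.rvs(loc=x + y)  # collider: x -> z <- y

collider_data = pd.DataFrame({"X": x, "Y": y, "Z": z})

# Marginally there is no association; stratifying by the collider Z creates one
print(round(smf.ols("Y ~ X", data=collider_data).fit().params.X, 2))
print(round(smf.ols("Y ~ X + Z", data=collider_data).fit().params.X, 2))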

utils.draw_causal_graph(
    edge_list=[("U", "X"), ("U", "Y"), ("X", "Y")],
    node_props={
        "X": {"label": "treatment, X"},
        "Y": {"label": "outcome, Y"},
        "U": {"label": "confound, U", "style": "dashed"},
        "unmeasured": {"style": "dashed"},
    },
    graph_direction="LR",
)
../_images/f076109de1e7d8ecc4abf08e618b077f79cbe23e0ff05f29247679a8ba778933.svg
utils.draw_causal_graph(
    edge_list=[("U", "X"), ("U", "Y"), ("X", "Y"), ("R", "X")],
    node_props={
        "X": {"label": "treatment, X"},
        "Y": {"label": "outcome, Y"},
        "U": {"label": "confound, U", "style": "dashed"},
        "R": {"label": "randomization, R"},
        "unmeasured": {"style": "dashed"},
    },
    edge_props={("U", "X"): {"color": "lightgray"}},
    graph_direction="LR",
)
../_images/457bd8644f1ebd5a73667d802ef2389e79dd3677db1a1675c7dbc6573bb71ddf.svg

The gold standard is randomization. However, randomization often isn’t an option:

  • impossible

  • pragmatism

  • ethical concerns

  • unmeasured confounds

Causal Thinking#

  • We would like a procedure \(do(X)\) that intervenes on \(X\) in such a way that it can “mimic” the effect of randomization.

  • Such a procedure would transform the Confounded graph:

Without randomization#

utils.draw_causal_graph(
    [
        ("U", "X"),
        ("U", "Y"),
        ("X", "Y"),
    ],
    graph_direction="LR",
)
../_images/cd456ec6c9461e3507147ef461166b1c220f29acc2eaf89c36e09318182380d0.svg

in such a way that all the non-causal arrows entering X have been removed

With “randomization” induced by \(do(X)\)#

utils.draw_causal_graph(
    [
        ("U", "Y"),
        ("X", "Y"),
    ],
    edge_props={("X", "Y"): {"label": "do(X)   "}},
    graph_direction="LR",
)
../_images/38473a03ba2091d24b3e083985fcff67e2242df46c419b674920b0b79fa49bcf.svg

It turns out that we can analyze the graph structure to determine whether such a procedure exists

Example: Simple Confound#

utils.draw_causal_graph(
    edge_list=[("U", "Y"), ("U", "X"), ("X", "Y")],
    node_props={
        "U": {"color": "red"},
    },
    edge_props={("U", "X"): {"color": "red"}, ("U", "Y"): {"color": "red"}},
    graph_direction="LR",
)
../_images/02abbf1328b44c29aecc81cc2b0959e3a2c4e78cafe3d173d15749984ccffa79.svg

In the Fork example, we showed that stratifying by the confound “closes” the fork: conditioning on U removes the non-causal association it transmits between X and Y, allowing us to isolate the treatment’s effect on Y.

This procedure is part of what is known as Do-calculus. The operator do(X) means intervening on X (i.e. setting it to a specific value that is independent of the confound):

\[ p(Y | do(X)) = \sum_U p(Y | X, U)p(U) = \mathbb{E}_U[p(Y | X, U)] \]

i.e. the intervention distribution of Y under do(X) is equivalent to the distribution of Y, stratified by the treatment X and the confound U, averaged over the distribution of the confound.

Note that when we use a linear regression estimator (e.g. the model form \(Y \sim \mathcal{N}(\alpha + \beta_X X + \beta_Z Z, \sigma^2)\)), we are implicitly marginalizing and averaging over our treatment and confound. The causal effect of \(X\) on \(Y\):

  • is generally not the estimated coefficient in the model that relates \(X\) to \(Y\)

  • is the distribution of \(Y\) when we change \(X\), averaged over the distribution defined by the control/confound variables (i.e. \(U\)); a numeric sketch follows below
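To make the formula concrete, here is a tiny numeric sketch of the marginalization; the coefficients are invented purely for illustration.

# Sketch: approximate E[Y | do(X=x)] by averaging the stratified conditional mean over p(U)
np.random.seed(12)
b_XY, b_UY = 0.5, 1.0  # hypothetical direct effects of X and U on Y
U = stats.bernoulli.rvs(p=0.5, size=100_000)


def expected_Y_do_X(x):
    # average over the marginal p(U), not p(U | X); this is what do(X) means
    return np.mean(b_XY * x + b_UY * U)


# The contrast between two interventions recovers the causal effect b_XY = 0.5
print(expected_Y_do_X(1.0) - expected_Y_do_X(0.0))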

Do-calculus#

  • Applied to DAGs, provides a set of rules for identifying \(p(Y | do(X))\)

  • Informs what is possible before picking functions or distributions

  • Justifies graphical analysis

  • If do-calculus claims that inference is possible, no further special assumptions are required for inference

    • often additional assumptions can make the inference even stronger

Backdoor Criterion#

Shortcut for applying Do-calculus graphically with your eyeballs. General rule for finding the minimal sufficient adjustment set of variables to condition on.

  1. Identify all paths connecting treatment X to outcome Y, including those entering and exiting X (association can flow in either direction along edges; causation only follows the arrows)

  2. Any of those paths entering X are backdoor (non-causal) paths

  3. Find the adjustment set: the variables that, once conditioned on, “close/block” all the backdoor paths identified in step 2 (see the path-enumeration sketch below)
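The steps above can also be checked mechanically. Below is a small sketch (my own helper, not from the lecture) that enumerates every path between X and Y in a DAG given as an edge list and flags the backdoor paths, using the confounding example from the next section.

def undirected_paths(edge_list, start="X", end="Y"):
    """Enumerate all simple paths between start and end, ignoring edge direction."""
    neighbors = {}
    for a, b in edge_list:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in neighbors[node]:
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


example_edges = [("U", "Z"), ("Z", "X"), ("U", "Y"), ("X", "Y")]
for path in undirected_paths(example_edges):
    # a backdoor path is any path whose first edge points *into* the treatment X
    is_backdoor = (path[1], path[0]) in example_edges
    print(" - ".join(path), "(backdoor)" if is_backdoor else "(front-door, causal)")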

Backdoor Criterion Example#

utils.draw_causal_graph(
    [("U", "Z"), ("Z", "X"), ("U", "Y"), ("X", "Y")],
    node_props={"Z": {"color": "red"}, "U": {"style": "dashed"}},
    edge_props={
        ("U", "Z"): {"color": "red"},
        ("Z", "X"): {"color": "red"},
        ("U", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/498111632055e1648c9cb65a17422b6ea33630a1dfc858e000f911fc1a85ff26.svg

Backdoor path highlighted in red.

  • If we could measure \(U\) we could just stratify by \(U\); however, it is unobserved.

  • Instead, we can block the backdoor path by conditioning on \(Z\), even though we cannot measure \(U\).

  • This works because \(Z\) “knows” everything we need to know about the association between \(X\) and \(Y\) that is due to the unmeasured confound \(U\).

Resulting graph after stratifying by \(Z\)#

  • The \(U \rightarrow Z \rightarrow X\) Pipe has now been broken, dissociating \(X\) from the confound \(U\)

  • Note: this doesn’t remove the confound’s effect on \(Y\)

utils.draw_causal_graph(
    [("U", "Z"), ("Z", "X"), ("U", "Y"), ("X", "Y")],
    node_props={"Z": {"color": "red"}, "U": {"style": "dashed"}},
    edge_props={
        ("U", "Z"): {"color": "red"},
        ("Z", "X"): {"color": "none"},
        ("U", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/64f0adfae4032a002ac4e36f769dbb8c9d5de5605380b84898ef9232d852b654.svg

Validate through simulation#

Here we simulate a situation where Y is caused by X and by an unmeasured confound U, which also drives Z and, through Z, X. (We could also prove this mathematically, but simulation is quite convincing as well, for me anyways.)

\[\begin{split} \begin{align*} U &\sim \text{Bernoulli}(0.5) \\ Z &\sim \text{Normal}(\beta_{UZ}U, 1) \\ X &\sim \text{Normal}(\beta_{ZX}Z, 1) \\ Y &\sim \text{Normal}(\alpha + \beta_{XY}X + \beta_{UY}U, 1) \\ \end{align*} \end{split}\]
np.random.seed(123)
n_samples = 200

alpha = 0
beta_XY = 0
beta_UY = -1
beta_UZ = -1
beta_ZX = 1

U = stats.bernoulli.rvs(p=0.5, size=n_samples)
Z = stats.norm.rvs(loc=beta_UZ * U)
X = stats.norm.rvs(loc=beta_ZX * Z)
Y = stats.norm.rvs(loc=alpha + beta_XY * X + beta_UY * U)

Unstratified (confounded) Model#

\[\begin{split} \begin{align*} Y &\sim \text{Normal}(\mu_Y, \sigma_Y) \\ \mu_Y &= \alpha + \beta_{XY}X \\ \alpha &\sim \text{Normal}(0, 1) \\ \beta_{XY} &\sim \text{Normal}(0, 1) \\ \sigma_Y &\sim \text{Exponential}(1) \end{align*} \end{split}\]

Fit the unstratified model, ignoring Z (and U)#

with pm.Model() as unstratified_model:
    # Priors
    alpha_ = pm.Normal("alpha", 0, 1)
    beta_XY_ = pm.Normal("beta_XY", 0, 1)
    sigma_ = pm.Exponential("sigma", 1)

    # Likelihood
    mu_ = alpha_ + beta_XY_ * X
    Y_ = pm.Normal("Y", mu=mu_, sigma=sigma_, observed=Y)
    unstratified_model_inference = pm.sample()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta_XY, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
az.summary(unstratified_model_inference)
           mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
alpha    -0.381  0.084  -0.535   -0.219      0.001    0.001    3503.0    3034.0    1.0
beta_XY   0.148  0.055   0.043    0.244      0.001    0.001    3763.0    3040.0    1.0
sigma     1.114  0.058   1.015    1.227      0.001    0.001    6091.0    3259.0    1.0

Stratifying by Z (unconfounded)#

\[\begin{split} \begin{align*} Y &\sim \text{Normal}(\mu_Y, \sigma_Y) \\ \mu_Y &= \alpha + \beta_{XY}X + \beta_{Z}Z \\ \alpha &\sim \text{Normal}(0, 1) \\ \beta_{*} &\sim \text{Normal}(0, 1) \\ \sigma_Y &\sim \text{Exponential}(1) \end{align*} \end{split}\]
# Fit the stratified Model
with pm.Model() as stratified_model:
    # Priors
    alpha_ = pm.Normal("alpha", 0, 1)
    beta_XY_ = pm.Normal("beta_XY", 0, 1)
    beta_Z_ = pm.Normal("beta_Z", 0, 1)
    sigma_ = pm.Exponential("sigma", 1)

    # Likelihood (includes Z)
    mu_ = alpha_ + beta_XY_ * X + beta_Z_ * Z
    Y_ = pm.Normal("Y", mu=mu_, sigma=sigma_, observed=Y)
    stratified_model_inference = pm.sample()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta_XY, beta_Z, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
az.summary(stratified_model_inference)
           mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
alpha    -0.286  0.091  -0.450   -0.109      0.002    0.001    3570.0    2743.0    1.0
beta_XY  -0.026  0.078  -0.181    0.114      0.001    0.001    2989.0    2631.0    1.0
beta_Z    0.318  0.106   0.118    0.515      0.002    0.001    2650.0    2469.0    1.0
sigma     1.091  0.055   0.983    1.190      0.001    0.001    3813.0    2660.0    1.0

NOTE: the model coefficient beta_Z means nothing in terms of the causal effect of \(Z\) on \(Y\). In order to determine the causal effect of \(Z\) on \(Y\) you’d need a different estimator. In general, variables in the adjustment set are not interpretable. This is related to the “Table 2 Fallacy”
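As a quick check of this point: in our simulation the true causal effect of \(Z\) on \(Y\) is \(\beta_{ZX}\beta_{XY} = 0\), yet beta_Z above is roughly 0.3. Because \(U\) is available in the simulation (even though we treated it as unobserved), a different estimator can recover the true effect.

confound_data = pd.DataFrame({"Y": Y, "Z": Z, "U": U})

# Naive regression of Y on Z is confounded by U
print(round(smf.ols("Y ~ Z", data=confound_data).fit().params.Z, 2))
# Stratifying by U (possible only because we simulated it) gives ~0, the true effect of Z on Y
print(round(smf.ols("Y ~ Z + U", data=confound_data).fit().params.Z, 2))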

Compare stratified and unstratified models#

_, ax = plt.subplots(figsize=(8, 3))
az.plot_dist(unstratified_model_inference.posterior.beta_XY, color="k", label="Y|X", ax=ax)
az.plot_dist(stratified_model_inference.posterior.beta_XY, color="C0", label="Y|X,Z", ax=ax)
plt.axvline(beta_XY, color="C0", linestyle="--", alpha=0.5, label=f"Actual={beta_XY}")
plt.xlim([-0.5, 0.5])
plt.xlabel("posterior beta_XY")
plt.ylabel("density")
plt.legend();
../_images/d1a95b071b8657ccc42a02f43f9b770c06d19ae050808023c481ecf40f752bf6.png

More Complicated Example#

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    graph_direction="LR",
)
../_images/af4927adeee072c378aab5b8dea2a16f5eb87f5aacfb5937da373fe90a7e4362.svg

All Paths Connecting X to Y#

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("X", "Y"): {"color": "blue", "label": "Direct Causal Path", "fontcolor": "blue"},
    },
    graph_direction="LR",
)
../_images/b094c477304777b55982c7ee6b51120337600355eab53f997495ccc3a83a685b.svg

\(X \rightarrow Y\)#

  • Direct, causal path, leave open

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("C", "X"): {"color": "red", "label": "XCY", "fontcolor": "red"},
        ("C", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/f268703cdd869f3d0a74da1ebf0b4f25ebf122807530ad107a78e2a21145cc45.svg

\(X \leftarrow C \rightarrow Y\)#

  • Backdoor non-causal path

  • Block by stratifying by \(C\)

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("Z", "X"): {"color": "red", "label": "XZY", "fontcolor": "red"},
        ("Z", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/b34cf65e46ca6d91ed70d0da88c72ec7e1dadb752c8bde0cc6157a6959228d90.svg

\(X \leftarrow Z \rightarrow Y\)#

  • Backdoor non-causal path

  • Block by stratifying by \(Z\)

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("A", "X"): {"color": "red", "label": "XAZBY", "fontcolor": "red"},
        ("A", "Z"): {"color": "red"},
        ("B", "Z"): {"color": "red"},
        ("B", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/87028ee557d51cb7c21eab266d5b90f34b7e5b74644a189ede335a82ea29f7cb.svg

\(X \leftarrow A \rightarrow Z \leftarrow B \rightarrow Y\)#

  • Backdoor non-causal path

  • Block by stratifying by \(A\) or \(B\); stratifying by \(Z\) opens the path b.c. it’s a collider

    • we’re already stratifying by \(Z\) for the \(X,Z,Y\) backdoor path

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("A", "X"): {"color": "red", "label": "XAZY", "fontcolor": "red"},
        ("A", "Z"): {"color": "red"},
        ("Z", "Y"): {"color": "red"},
        ("Z", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/46887c03eeac571a3949cf9951a2b0777f1572e887d905db2aa9cef32e36130a.svg

\(X \leftarrow A \rightarrow Z \rightarrow Y\)#

  • Backdoor non-causal path

  • On this path \(Z\) is a pipe, not a collider, so stratifying by \(Z\) blocks it (stratifying by \(A\) would also block it)

    • we’re already stratifying by \(Z\) for the \(X \leftarrow Z \rightarrow Y\) backdoor path, so this path is closed

utils.draw_causal_graph(
    edge_list=[
        ("A", "Z"),
        ("A", "X"),
        ("Z", "X"),
        ("B", "Z"),
        ("B", "Y"),
        ("Z", "Y"),
        ("X", "Y"),
        ("C", "X"),
        ("C", "Y"),
    ],
    edge_props={
        ("Z", "X"): {"color": "red", "label": "XZBY", "fontcolor": "red"},
        ("B", "Z"): {"color": "red"},
        ("B", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/a91e8cfb9c3bf79e70418755a66d13a042fb846ddeb9523623996ee7a8032d26.svg

\(X \leftarrow Z \leftarrow B \rightarrow Y\)#

  • Backdoor non-causal path

  • On this path \(Z\) is a pipe, not a collider, so stratifying by \(Z\) blocks it (stratifying by \(B\) would also block it)

    • we’re already stratifying by \(Z\) for the \(X \leftarrow Z \rightarrow Y\) backdoor path, so this path is closed

Resulting Minimum Adjustment set: Z, C, (A or B)#

Choosing B over A turns out to be more statistically efficient, though not causally different from choosing A
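As a sanity check, here is a quick simulation of this DAG (my own sketch, with arbitrarily chosen coefficients and a true effect of \(X\) on \(Y\) of 1). Conditioning on the adjustment set \(\{Z, C, B\}\) recovers the effect, while the naive regression does not.

np.random.seed(4)
n = 10_000
true_beta_XY = 1.0  # assumed causal effect of X on Y for this sketch

A = stats.norm.rvs(size=n)
B = stats.norm.rvs(size=n)
C = stats.norm.rvs(size=n)
Z = stats.norm.rvs(loc=A + B)
X = stats.norm.rvs(loc=A + Z + C)
Y = stats.norm.rvs(loc=true_beta_XY * X + Z + B + C)
complicated_dag_data = pd.DataFrame({"X": X, "Y": Y, "Z": Z, "A": A, "B": B, "C": C})

# Naive estimate is confounded; adjusting for {Z, C, B} blocks all backdoor paths
print(round(smf.ols("Y ~ X", data=complicated_dag_data).fit().params.X, 2))
print(round(smf.ols("Y ~ X + Z + C + B", data=complicated_dag_data).fit().params.X, 2))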

Example with unobserved confounds#

utils.draw_causal_graph(
    edge_list=[("G", "P"), ("P", "C"), ("G", "C"), ("u", "P"), ("u", "C")],
    node_props={
        "G": {"color": "blue"},
        "C": {"color": "blue"},
        "P": {"color": "red", "label": "collider, P"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={("G", "C"): {"color": "blue"}},
    graph_direction="LR",
)
../_images/a8ff8e554ddaec0d395dadffebdf01228279466f92309f17a162fd31b6cbcff4.svg
  • \(P\) is mediator between \(G\) and \(C\)

  • \(P\) is also a collider between \(G, u, C\)

  • If we want to estimate the direct effect of \(G \rightarrow C\), we’ll need to stratify by \(P\) to close the Pipe

  • However, this will open up the Collider path to the unobserved confound.

  • It’s not possible to accurately estimate the Direct Causal Effect of G on C

  • It is possible to estimate the Total Causal Effect (both points are verified in the simulation sketch below)
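A quick simulation sketch of these last two points (coefficients invented, with zero direct effect of \(G\) on \(C\)): the total effect is recoverable, but stratifying by \(P\) to target the direct effect opens the collider path through the unobserved \(u\) and biases the estimate.

np.random.seed(7)
n = 10_000
b_GC = 0.0  # assumed direct effect of G on C (none)

u = stats.norm.rvs(size=n)     # unobserved confound of P and C
G = stats.norm.rvs(size=n)
P = stats.norm.rvs(loc=G + u)  # P: mediator of G, collider of G and u
C = stats.norm.rvs(loc=b_GC * G + P + u)

gpc_data = pd.DataFrame({"G": G, "P": P, "C": C})

# Total effect of G on C (via P) is identified: G has no backdoor paths
print(round(smf.ols("C ~ G", data=gpc_data).fit().params.G, 2))
# Stratifying by P biases the "direct effect" estimate away from b_GC = 0
print(round(smf.ols("C ~ G + P", data=gpc_data).fit().params.G, 2))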

Good and bad controls#

Common incorrect heuristics for choosing control variables:#

  • YOLO approach – anything in the spreadsheet

  • Ignore highly collinear variables

    • false, no support for this

    • collinearity can arise through many different causal processes that can be modeled accurately

  • It’s safe to add pre-treatment variables

    • false, pre-treatment variables, just like post-treatment variables, can introduce confounding

Good & Bad Controls Examples#

Bad control#

utils.draw_causal_graph(
    edge_list=[("u", "Z"), ("v", "Z"), ("u", "X"), ("X", "Y"), ("v", "Y")],
    node_props={
        "u": {"style": "dashed"},
        "v": {"style": "dashed"},
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={("X", "Y"): {"color": "red"}},
    graph_direction="TD",
)
../_images/92e72073f2c098b0da1b824a6db62383614f4b2a7a4c476d5ab6af082e3287c6.svg

\(Z\) is a collider for unobserved variables \(u\) and \(v\), which independently affect \(X\) and \(Y\)

List the paths#

  • \(X \rightarrow Y\)

    • causal, leave open

  • \(X \leftarrow u \rightarrow Z \leftarrow v \rightarrow Y\)

    • backdoor, closed due to collider

    • \(Z\) is a bad control: stratifying by \(Z\) would open the backdoor path

    • \(Z\) could be a pre-treatment variable; it is not always good to stratify by pre-treatment variables, so draw out your causal assumptions (verified in the simulation sketch below)
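We can verify this with a quick simulation in the same style as the ones later in the notebook (coefficients chosen arbitrarily, with no true effect of \(X\) on \(Y\)):

np.random.seed(5)
n = 10_000
b_XY = 0.0  # assumed true effect of X on Y (none)

u = stats.norm.rvs(size=n)     # unobserved
v = stats.norm.rvs(size=n)     # unobserved
Z = stats.norm.rvs(loc=u + v)  # pre-treatment collider of u and v
X = stats.norm.rvs(loc=u)
Y = stats.norm.rvs(loc=b_XY * X + v)

bad_control_data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Without Z the estimate is ~0 (correct); stratifying by Z opens u -> Z <- v and biases it
print(round(smf.ols("Y ~ X", data=bad_control_data).fit().params.X, 2))
print(round(smf.ols("Y ~ X + Z", data=bad_control_data).fit().params.X, 2))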

Bad mediator#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Z"),
        ("Z", "Y"),
        ("u", "Z"),
        ("u", "Y"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={("X", "Z"): {"color": "red"}, ("Z", "Y"): {"color": "red"}},
    graph_direction="LR",
)
../_images/1191f34e2efcf662e5d2f3970d081cf8ca3067f6a937c1e8d0eaae585322ae23.svg

List the paths#

  • \(X \rightarrow Z \rightarrow Y\)

    • causal, leave open

  • \(X \rightarrow Z \leftarrow u \rightarrow Y\)

    • non-causal path, closed because \(Z\) is a collider; it only opens if we stratify by \(Z\)

  • There is no backdoor path, so no need to stratify by Z

  • Can measure the total effect of \(X\) on \(Y\), but not the direct effect, because of the mediator \(Z\)

No backdoor path here, so no need to control for any confounds. In fact, stratifying by Z (the bad mediator) will bias the estimate, because it opens the non-causal path through u that would otherwise be blocked.

Z is often a post-treatment variable, e.g. “Happiness” being affected by the treatment “Win Lottery”.

Verify the bad mediator with simulation:#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Z"),
        ("Z", "Y"),
        ("u", "Z"),
        ("u", "Y"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={
        ("X", "Z"): {"color": "red", "label": "1"},
        ("Z", "Y"): {"color": "red", "label": "1"},
        ("u", "Z"): {"label": "1"},
        ("u", "Y"): {"label": "1"},
    },
    graph_direction="LR",
)
../_images/e179f8f5da9403c1b4a12461eb16d6c8b7870c82ac941247eb4ac6713bfddbbc.svg
def simulate_bad_mediator(beta_XZ, beta_ZY, n_samples=100):
    # independent variables
    u = stats.norm.rvs(size=n_samples)
    X = stats.norm.rvs(size=n_samples)

    # causal effect of X on Z
    mu_Z = X * beta_XZ + u
    Z = stats.norm.rvs(size=n_samples, loc=mu_Z)

    # Causal effect of Z on Y (including confound)
    mu_Y = Z * beta_ZY + u
    Y = stats.norm.rvs(size=n_samples, loc=mu_Y)

    # Put data into format for statsmodels
    data = pd.DataFrame(np.vstack([Y, X, Z, u]).T, columns=["Y", "X", "Z", "u"])

    unstratified_model = smf.ols("Y ~ X", data=data).fit()
    stratified_model = smf.ols("Y ~ X + Z", data=data).fit()

    return unstratified_model.params.X, stratified_model.params.X


def run_bad_mediator_simulation(
    beta_XZ=1, beta_ZY=1, n_simulations=500, n_samples_per_simualtion=100
):
    beta_X = beta_XZ * beta_ZY

    simulations = np.array(
        [
            simulate_bad_mediator(
                beta_XZ=beta_XZ, beta_ZY=beta_ZY, n_samples=n_samples_per_simualtion
            )
            for _ in range(n_simulations)
        ]
    )
    _, ax = plt.subplots(figsize=(8, 4))
    az.plot_dist(simulations[:, 0], label="Y ~ X\ncorrect", color="black", ax=ax)
    az.plot_dist(simulations[:, 1], label="Y ~ X + Z\nwrong", color="C0", ax=ax)
    plt.axvline(beta_X, color="k", linestyle="--", label=f"actual={beta_X}")
    plt.legend(loc="upper left")
    plt.xlabel("posterior mean")
    plt.ylabel("density");

Run the simulation, \(\beta_{XZ} = \beta_{ZY} = 1\)#

run_bad_mediator_simulation(beta_XZ=1, beta_ZY=1)
../_images/f4620695baa2336c0e1b99e5b0df0db82914fff9f798582005382b3cb48d5e1e.png
utils.draw_causal_graph(
    edge_list=[
        ("X", "Z"),
        ("Z", "Y"),
        ("u", "Z"),
        ("u", "Y"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={
        ("X", "Z"): {"color": "red", "label": "1"},
        ("Z", "Y"): {"color": "red", "label": "0"},
        ("u", "Z"): {"label": "1"},
        ("u", "Y"): {"label": "1"},
    },
    graph_direction="LR",
)
../_images/3312adf469dc75c65907a4eaaefcee87216281978679a460cea13438368aca4b.svg

Turn off Causal effect by changing \(\beta_{ZY}\) to 0#

run_bad_mediator_simulation(beta_XZ=1, beta_ZY=0)
../_images/7a9942f183ddd63d0320ba79ebb360271be0c929057552d28038287de7c70d4c.png

Though there is no causal effect, stratifying by Z leads us to conclude a negative effect of X on Y

Colliders & Descendants#

Generally, Avoid the Collider!

Adding descendants of the outcome variable is almost always a terrible idea, because you’re selecting groups based on the outcome. This is known as Case Control Bias (selection on outcome)

utils.draw_causal_graph(
    edge_list=[("X", "Y"), ("X", "Z"), ("Y", "Z")],
    node_props={"X": {"color": "red"}, "Y": {"color": "red"}},
    edge_props={("X", "Y"): {"color": "red"}},
)
../_images/a2505a4cbe27021cef93c82acae294a4b596254e7f47ed7e61dd189cad439c66.svg

Colliders are not always so obvious

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("X", "Z"),
        ("u", "Y"),
        ("u", "Z"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={("X", "Y"): {"color": "red"}},
)
../_images/dc1c7caa3bcaa6d9a1ac25acc9d57b7ba50a9e919a1dfe08ae6f537b892d47ce.svg

Here the collider (Z) is formed by the unobserved variable u

Bad Descendant: Selection on Outcome (Case Control Bias)#

Stratifying on a variable affected by the outcome is a very bad practice.

  • reduces variation in \(Y\) that could have been explained by \(X\)

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("Y", "Z"),
    ],
    node_props={
        "X": {"color": "red", "label": "education, X"},
        "Y": {"color": "red", "label": "occupation, Y"},
        "Z": {"label": "income, Z"},
    },
    edge_props={("X", "Y"): {"color": "red"}},
    graph_direction="LR",
)
../_images/8413890dfd0c183df3461e0cc32900b667309566aa50e468085944e35c97b96e.svg

Verify via simulation:#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("Y", "Z"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
    },
    edge_props={("X", "Y"): {"color": "red", "label": "1"}, ("Y", "Z"): {"label": "1"}},
    graph_direction="LR",
)
../_images/b794ee9f365a6c05a3038714bd5ac1889fa33d5319909571a0fc96abea79f067.svg
def simulate_case_control_bias(beta_XY=1, beta_YZ=1, n_samples=100):
    # independent variables
    X = stats.norm.rvs(size=n_samples)

    # Causal effect of X on Y
    mu_Y = X * beta_XY
    Y = stats.norm.rvs(size=n_samples, loc=mu_Y)

    # Causal effect of Y on Z (Z is a descendant of the outcome)
    mu_Z = Y * beta_YZ
    Z = stats.norm.rvs(size=n_samples, loc=mu_Z)

    # Put data into format for statsmodels
    data = pd.DataFrame(np.vstack([Y, X, Z]).T, columns=["Y", "X", "Z"])

    unstratified_model = smf.ols("Y ~ X", data=data).fit()
    stratified_model = smf.ols("Y ~ X + Z", data=data).fit()

    return unstratified_model.params.X, stratified_model.params.X


def run_case_control_simulation(
    beta_XY=1, beta_YZ=1, n_simulations=500, n_samples_per_simualtion=100
):
    beta_X = beta_XY

    simulations = np.array(
        [
            simulate_case_control_bias(
                beta_XY=beta_XY, beta_YZ=beta_YZ, n_samples=n_samples_per_simualtion
            )
            for _ in range(n_simulations)
        ]
    )
    _, ax = plt.subplots(figsize=(8, 4))
    az.plot_dist(simulations[:, 0], label="Y ~ X\ncorrect", color="black", ax=ax)
    az.plot_dist(simulations[:, 1], label="Y ~ X + Z\nwrong", color="C0", ax=ax)
    plt.axvline(beta_X, color="k", linestyle="--", label=f"actual={beta_X}")
    plt.legend(loc="upper left")
    plt.xlabel("posterior mean")
    plt.ylabel("density");

Descendant explains away some of the causal effect of \(X\) on \(Y\)#

The estimated causal effect is reduced because the descendant reduces the variation in \(Y\) that can be explained by \(X\)

run_case_control_simulation(beta_XY=1, beta_YZ=1)
../_images/07ae40e305a694a3ee0bd6658e5e4b339ca3fe34e5caaaf9a12cf138f3db9d4b.png

Removing Descendent effect by setting \(\beta_{YZ}=0\)#

The descendant no longer has any effect here, so we should recover the same (correct) inference for both stratified and unstratified models

run_case_control_simulation(beta_XY=1, beta_YZ=0)
../_images/6c6453597e8d9bf7152ec4b0c530e8877ad8a08eeb050d438aaf7544352da991.png

Bad Ancestor (aka precision parasite)#

Now \(Z\) is a parent of \(X\)

  • no backdoor path, \(X\) is directly connected to \(Y\)

  • when stratifying by \(Z\) you’re explaining away variation in \(X\), which reduces the amount of variation in \(X\) available for estimating its causal effect on \(Y\)

  • Does not bias your estimate, but it reduces precision, so estimates will have more uncertainty

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("Z", "X"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
    },
    edge_props={("X", "Y"): {"color": "red", "label": "1"}, ("Z", "X"): {"label": "1"}},
    graph_direction="LR",
)
../_images/a0bf3b7889a585a0e6b5c4d5a5b7d3afdc2b81d0629e0460ebb4ebbf7a34df7f.svg

Verify via simulation#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("Z", "X"),
    ],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
    },
    edge_props={
        ("X", "Y"): {"color": "red", "label": "1"},
        ("Y", "Z"): {"color": "red", "label": "1"},
    },
    graph_direction="LR",
)
../_images/48f1f630d8946bb7a27f6ad5ebf2ab7d9a088ddb6459aebda0f8e835d66ce3b4.svg
np.random.seed(123)


def simulate_bad_ancestor(beta_ZX=1, beta_XY=1, unobserved_variance=None, n_samples=100):
    Z = stats.norm.rvs(size=n_samples)

    mu_X = Z * beta_ZX
    if unobserved_variance is not None:
        u = stats.norm.rvs(size=n_samples) * unobserved_variance
        mu_X += u

    X = stats.norm.rvs(size=n_samples, loc=mu_X)

    mu_Y = X * beta_XY
    if unobserved_variance is not None:
        mu_Y += u

    Y = stats.norm.rvs(size=n_samples, loc=mu_Y)

    data = pd.DataFrame(np.vstack([X, Y, Z]).T, columns=["X", "Y", "Z"])

    non_stratified_model = smf.ols("Y ~ X", data=data).fit()
    stratified_model = smf.ols("Y ~ X + Z", data=data).fit()

    return non_stratified_model.params.X, stratified_model.params.X


def run_bad_ancestor_simulation(
    beta_ZX=1, beta_XY=1, n_simulations=500, unobserved_variance=None, n_samples_per_simualtion=100
):
    beta_X = beta_XY

    simulations = np.array(
        [
            simulate_bad_ancestor(
                beta_ZX=beta_ZX,
                beta_XY=beta_XY,
                unobserved_variance=unobserved_variance,
                n_samples=n_samples_per_simualtion,
            )
            for _ in range(n_simulations)
        ]
    )
    _, ax = plt.subplots(figsize=(8, 4))
    az.plot_dist(simulations[:, 0], label="Y ~ X\ncorrect", color="black", ax=ax)
    az.plot_dist(simulations[:, 1], label="Y ~ X + Z\nwrong", color="C0", ax=ax)
    plt.axvline(beta_X, color="k", linestyle="--", label=f"actual={beta_X}")
    plt.legend(loc="upper left")
    plt.xlabel("posterior mean")
    plt.ylabel("density");
run_bad_ancestor_simulation()
../_images/fd3d2e3f15437204e9986cc7c01816be6dd8fbab836474189057d83f61d7a9c5.png

Stratifying by Z doesn’t add bias (the estimate is centered on the correct value), but it does increase the variance of the estimator. This loss of precision grows with the magnitude of the causal relationship between Z and X

Increasing the relationship between Z and X further reduces precision#

run_bad_ancestor_simulation(beta_ZX=3)
../_images/74e2fe26894cbfc8023c190930fe665b5ddd41b17eee93d002eaecc4b8944636.png

Bias Amplification#

Stratifying on an ancestor of the treatment when there are other confounders, particularly unobserved forks, is like the precision-parasite scenario above, except that it also adds bias.

utils.draw_causal_graph(
    edge_list=[("X", "Y"), ("Z", "X"), ("u", "X"), ("u", "Y")],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={("X", "Y"): {"color": "red"}, ("Y", "Z"): {"color": "red"}},
    graph_direction="LR",
)
../_images/d97e0c21d6c9deaf39338777eed6083cc62d225b5a6638965280c085192ce922.svg

Verify via simulation#

utils.draw_causal_graph(
    edge_list=[("X", "Y"), ("Z", "X"), ("u", "X"), ("u", "Y")],
    node_props={
        "X": {"color": "red"},
        "Y": {"color": "red"},
        "u": {"style": "dashed"},
        "unobserved": {"style": "dashed"},
    },
    edge_props={
        ("X", "Y"): {"color": "red", "label": "0"},
        ("Z", "X"): {"color": "red", "label": "1"},
    },
    graph_direction="LR",
)
../_images/5c6fbf999ec35457071576d0af0e7055e17f97b3211cb3d10f3205a8719a8165.svg

Run simulation with no actual causal effect, \(\beta_{XY} = 0\)#

run_bad_ancestor_simulation(beta_ZX=1, beta_XY=0, unobserved_variance=1)
../_images/457b405be764b4d09502e55f5397233d9be0255d5fac606886478b77ada380e1.png

Above we see that both estimators are biased: even in the best case, we can’t observe, and thus control for, the confound \(u\). But when stratifying by the ancestor, things are much worse.

Hand-wavy explanation:

  • in order for \(X\) and \(Y\) to be associated, their causes need to be associated

  • by stratifying by \(Z\), we remove the amount of variation in \(X\) that is caused by \(Z\)

  • this reduction in variation in \(X\) makes the confound \(u\) more important comparatively

Discrete example of bias amplification#

def simulate_discrete_bias_amplifications(beta_ZX=1, beta_XY=1, n_samples=1000):
    Z = stats.bernoulli.rvs(p=0.5, size=n_samples)
    u = stats.norm.rvs(size=n_samples)

    mu_X = beta_ZX * Z + u
    X = stats.norm.rvs(loc=mu_X, size=n_samples)

    mu_Y = X * beta_XY + u
    Y = stats.norm.rvs(loc=mu_Y, size=n_samples)

    data = pd.DataFrame(np.vstack([u, Z, X, Y]).T, columns=["u", "Z", "X", "Y"])

    models = {}
    models["unstratified"] = smf.ols("Y ~ X", data=data).fit()
    models["z=0"] = smf.ols("Y ~ X", data=data[data.Z == 0]).fit()
    models["z=1"] = smf.ols("Y ~ X", data=data[data.Z == 1]).fit()

    return models, data
beta_ZX = 7
beta_XY = 0
models, data = simulate_discrete_bias_amplifications(beta_ZX=beta_ZX, beta_XY=beta_XY)

fig, axs = plt.subplots(figsize=(5, 5))


def plot_linear_model(model, color, label):
    params = model.params
    xs = np.linspace(data.X.min(), data.X.max(), 10)
    ys = params.Intercept + params.X * xs
    utils.plot_line(xs, ys, color=color, label=label)


for z in [0, 1]:
    color = f"C{z}"
    utils.plot_scatter(data.X[data.Z == z], data.Y[data.Z == z], color=color)
    model = models[f"z={z}"]
    plot_linear_model(model, color=color, label=f"Z={z}")

model = models["unstratified"]
plot_linear_model(model, color="k", label=f"total sample")

plt.xlim([-5, 12])
plt.ylim([-5, 5])

plt.legend();
../_images/5a948df57d891f4567fc3d48ae4053e33229365b37badba860cbf6691989f9e2.png
  • When ignoring Z (the ancestor), the estimate is still somewhat biased (i.e. the black slope is not flat, as it should be for \(\beta_{XY}=0\))

  • but it’s not nearly as bad as the individual slopes (blue/red) when stratifying by Z.

Review: Good & Bad Controls#

  • Confound: a feature of the estimator design or sample that “confounds” or “confuses” our causal estimate

  • Control: a variable added to the analysis so that a causal estimate is possible

  • Adding controls can often be worse than omitting them

  • Make assumptions explicit, and use backdoor criterion to verify those assumptions

You have to do scientific modeling to do scientific analysis

BONUS: The Table 2 Fallacy#

  • Not all coefficients represent causal effects, particularly those in the adjustment set

  • Those that are causal effects tend to be partial effects, not total causal effects.

  • “Table 2” (the conventional table of all regression coefficients) actively encourages misinterpretation

  • As mentioned multiple times: Need different estimators for addressing different causal effects.

Example: Smoking, Age, HIV, and Stroke#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={
        "X": {"label": "X, HIV", "color": "red"},
        "Y": {"label": "Y, Stroke", "color": "red"},
        "S": {"label": "S, Smoking"},
        "A": {"label": "A, Age"},
    },
    edge_props={("X", "Y"): {"color": "red"}},
    graph_direction="LR",
)
../_images/7ed58d248152d61597956ea87dd05959431f07cd007c042dea01994b313640fc.svg

Identify paths via backdoor criterion#

  • \(X \rightarrow Y\) (front door)

  • \(X \leftarrow S \rightarrow Y\) (backdoor, fork)

  • \(X \leftarrow A \rightarrow Y\) (backdoor, fork)

  • \(X \leftarrow A \rightarrow S \rightarrow Y\) (backdoor, fork and pipe)

Adjustment set is \(\{S, A\}\)

Conditional Statistical Model#

\[\begin{split} \begin{align*} Y_i &\sim \mathcal{N}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_X X_i + \beta_S S_i + \beta_A A_i \end{align*} \end{split}\]
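To make the per-variable “perspectives” below concrete, here is a small simulation sketch of this DAG (all coefficients invented) fit with the conditional model above. The estimated \(\beta_X\) recovers the direct effect of \(X\) on \(Y\); the interpretation of \(\beta_S\) and \(\beta_A\) is worked out in the following sections.

np.random.seed(11)
n = 10_000
b_XY, b_SY, b_AY = 0.5, 0.3, 0.2  # invented "true" direct effects on Y (stroke)

A = stats.norm.rvs(size=n)     # age
S = stats.norm.rvs(loc=A)      # smoking, caused by age
X = stats.norm.rvs(loc=A + S)  # HIV, caused by age and smoking
Y = stats.norm.rvs(loc=b_XY * X + b_SY * S + b_AY * A)  # stroke

stroke_data = pd.DataFrame({"X": X, "Y": Y, "S": S, "A": A})

# With the full adjustment set, each coefficient is a *direct* effect
print(smf.ols("Y ~ X + S + A", data=stroke_data).fit().params.round(2))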

Looking at the model “from the perspective of various variables”#

From perspective of X#

Conditioning on \(A\) and \(S\) essentially removes the arrows going into \(X\), so \(\beta_X\) gives us the direct effect of \(X\) on \(Y\)

  • we’ve removed all backdoor paths by stratifying by \(S\) and \(A\) through the coefficients \(\beta_S, \beta_A\)

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={"X": {"color": "red"}, "Y": {"color": "red"}},
    edge_props={
        ("X", "Y"): {"color": "red"},
        ("A", "X"): {"color": "none"},
        ("S", "X"): {"color": "none"},
    },
    graph_direction="LR",
)
../_images/8cd290a50a5c17fce82138385183a4840fc2baee1b515ff5c36f78ffbdecc5e7.svg

From perspective of S#

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={"Y": {"color": "red"}, "S": {"color": "red"}},
    edge_props={
        ("X", "Y"): {"color": "red"},
        ("S", "X"): {"color": "red"},
        ("S", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/99b1ea190988cd775319a5e74aa0e7448a6e7755883ea33dfe18ab471646e3b0.svg
Adjusted graph with the full model#
utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={"Y": {"color": "red"}, "S": {"color": "red"}},
    edge_props={
        ("S", "X"): {"color": "none"},
        ("A", "S"): {"color": "none"},
        ("S", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/f4d6e61c5356d3d437ad29e5e5149ee0c7c6ce22cd8e79138e2953293a42a798.svg
  • In the unconditional model, the effect of \(S\) on \(Y\) is confounded by \(A\), because it’s a common cause of \(S\) and \(Y\) (and \(X\))

  • Conditioning on \(A\) & \(X\) (via the same statistical model above), \(\beta_{S}\) gives the direct effect of \(S\) on \(Y\)

    • Since we’ve blocked the path through \(X\) in the linear regression, we no longer get the total effect.

From the perspective of A#

In the unconditional model, the total causal effect of \(A\) on \(Y\) flows through all paths:

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={"Y": {"color": "red"}, "A": {"color": "red"}},
    edge_props={
        ("X", "Y"): {"color": "red"},
        ("A", "X"): {"color": "red"},
        ("A", "Y"): {"color": "red"},
        ("A", "S"): {"color": "red"},
        ("S", "X"): {"color": "red"},
        ("S", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/7447fb79d3e782924e0a0039fb0533fa382149121a8dad5cc392985174bbf896.svg
  • Conditioning on \(S\) & \(X\) (via the same statistical model above), \(\beta_{A}\) gives the direct effect of \(A\) on \(Y\)

    • Since we’ve blocked the paths through \(S\) and \(X\) in the linear regression, we no longer get the total effect.

utils.draw_causal_graph(
    edge_list=[
        ("X", "Y"),
        ("S", "Y"),
        ("S", "X"),
        ("A", "Y"),
        ("A", "X"),
        ("A", "S"),
    ],
    node_props={"Y": {"color": "red"}, "A": {"color": "red"}},
    edge_props={
        ("A", "X"): {"color": "none"},
        ("A", "S"): {"color": "none"},
        ("A", "Y"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/5dc9fb3952745c7e0c4a270ad9817a749ea00589ef9bdd36ca513095cc4c5bbf.svg
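Continuing the simulation sketch from above: because \(A\) has no parents in this DAG, regressing \(Y\) on \(A\) alone recovers its total causal effect, while \(\beta_A\) in the full model is only the direct effect.

# Total effect of A flows through all paths: b_AY + b_SY + 2 * b_XY = 1.5 with the invented coefficients
print(round(smf.ols("Y ~ A", data=stroke_data).fit().params.A, 2))
# Direct effect of A, from the full conditional model: ~ b_AY = 0.2
print(round(smf.ols("Y ~ X + S + A", data=stroke_data).fit().params.A, 2))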

This gets trickier if we consider unobserved confounds on variables!

Summary: Table 2 Fallacy#

  • Not all coefficients have the same interpretation

    • different estimands require different models

  • Do not present coefficients as if they are all equal in interpretation (i.e. as in Table 2)

  • …or don’t present coefficients at all; instead, push out predictions.

  • Provide explicit interpretation of each

Authors#

  • Ported to PyMC by Dustin Stansbury (2024)

  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray
Last updated: Tue Dec 17 2024

Python implementation: CPython
Python version       : 3.12.5
IPython version      : 8.27.0

pytensor: 2.26.4
aeppl   : not installed
xarray  : 2024.7.0

pymc       : 5.19.1
scipy      : 1.14.1
statsmodels: 0.14.2
xarray     : 2024.7.0
matplotlib : 3.9.2
pandas     : 2.2.2
numpy      : 1.26.4
arviz      : 0.19.0

Watermark: 2.5.0

License notice#

All the notebooks in this example gallery are provided under the MIT License which allows modification, and redistribution for any use provided the copyright and license notices are preserved.

Citing PyMC examples#

To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.

Important

Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.

Also remember to cite the relevant libraries used by your code.

Here is a citation template in bibtex:

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}

which once rendered could look like: