Modeling Events#

This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.

Video - Lecture 09 - Modeling Events#

# Ignore warnings
import warnings

import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import statsmodels.formula.api as smf
import utils
import xarray as xr

from matplotlib import pyplot as plt
from matplotlib import style
from scipy import stats

warnings.filterwarnings("ignore")

# Set matplotlib style
STYLE = "statistical-rethinking-2023.mplstyle"
style.use(STYLE)

UC Berkeley Admissions#

Dataset#

  • 4526 Graduate student applications

  • Stratified by

    • Department

    • Gender

Goal is to identify gender discrimination by admissions officers

ADMISSIONS = utils.load_data("UCBadmit")
ADMISSIONS
dept applicant.gender admit reject applications
1 A male 512 313 825
2 A female 89 19 108
3 B male 353 207 560
4 B female 17 8 25
5 C male 120 205 325
6 C female 202 391 593
7 D male 138 279 417
8 D female 131 244 375
9 E male 53 138 191
10 E female 94 299 393
11 F male 22 351 373
12 F female 24 317 341

Modeling Events#

  • Events: discrete, unordered outcomes

  • Observations are counts/aggregates

  • Unknowns are probabilities \(p\) of the event occurring, or the odds of those probabilities \(\frac{p}{1-p}\)

  • The things we stratify by interact – they are generally never independent in real life

  • We often work with the log-odds of \(p\): \(\text{logit}(p) = \log \left( \frac{p}{1-p} \right)\) (see the short sketch below)
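
A minimal numeric sketch of the log-odds transform and its inverse (the `logit` helper here is illustrative; `utils.invlogit` is used throughout this notebook):

def logit(p):
    """Log-odds of probability p."""
    return np.log(p / (1 - p))


for p in [0.1, 0.5, 0.9]:
    x = logit(p)
    print(f"p={p:.1f} -> log-odds={x:+.2f} -> back to p={utils.invlogit(x):.1f}")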

Admissions: Drawing the owl 🦉#

  1. Estimand(s)

  2. Scientific Model(s)

  3. Statistical Model(s)

  4. Analysis

1. Estimand(s)#

Which path defines “discrimination”?#

Direct Discrimination (Causal Effect)#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A")],
    node_props={
        "G": {"label": "Gender, G", "color": "red"},
        "D": {"label": "Department, D"},
        "A": {"label": "Admission rate, A", "color": "red"},
    },
    edge_props={("G", "A"): {"color": "red"}},
    graph_direction="LR",
)
../_images/36481e2592bbf6a5057aa0e742f873f7ec7751037be4a25d61ac4718e313d0ae.svg
  • aka “Institutional discrimination”

  • Referees are biased for or against a particular group

  • Usually the type we’re interested in identifying if it exists

  • Requires strong causal assumptions

Here, department, D, is a mediator – a common structure in the social sciences, where a categorical status (e.g. gender) affects some mediating context (e.g. occupation), and both affect a target outcome (e.g. wage). Examples include:

  • wage discrimination

  • hiring

  • scientific awards

Indirect Discrimination#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A")],
    node_props={
        "G": {"label": "Gender, G", "color": "red"},
        "D": {"label": "Department, D"},
        "A": {"label": "Admission rate, A", "color": "red"},
    },
    edge_props={("G", "D"): {"color": "red"}, ("D", "A"): {"color": "red"}},
    graph_direction="LR",
)
../_images/1e4536009a97476089b1ae85736f832899d4fbe78d41aa172c087b3543e6a5f1.svg
  • aka “Structural discrimination”

  • e.g. Gender affects a person’s interests, and therefore the department they will apply to

  • Affects overall admission rates

  • Requires strong causal assumptions

Total Discrimination#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A")],
    node_props={
        "G": {"label": "Gender, G", "color": "red"},
        "D": {"label": "Department, D"},
        "A": {"label": "Admission rate, A", "color": "red"},
    },
    edge_props={
        ("G", "D"): {"color": "red"},
        ("G", "A"): {"color": "red"},
        ("D", "A"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/9c7a23caa46f1b7648693b51fb9b7f4716a3c5310ac53daac5110781eb8d7bd8.svg
  • aka “Experienced discrimination”

  • through both direct and indirect routes

  • Requires mild assumptions

Estimands & Estimators#

  • Each estimand requires a different estimator

  • The thing we can estimate often isn’t the thing we want to estimate

  • e.g. Total discrimination may be easier to estimate, but is less actionable

Unobserved Confounds#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A"), ("U", "D"), ("U", "A")],
    node_props={"U": {"style": "dashed"}, "unobserved": {"style": "dashed"}},
    graph_direction="LR",
)
../_images/7fe9702ac0f014ba3f018166f8bc709d597ee153e0631977f57ab656b057fca6.svg

It’s always possible that there are unobserved confounds affecting both the mediator and the outcome. We will ignore these for now.

2. Scientific Model(s)#

Simulate the process#

Below is a generative model of the review/admission process

np.random.seed(123)
n_samples = 1000

GENDER = stats.bernoulli.rvs(p=0.5, size=n_samples)

# Groundtruth parameters
# Gender 1 tends to apply to department 1
P_DEPARTMENT = np.where(GENDER == 0, 0.3, 0.8)

# Acceptance rates matrices -- Department x Gender
UNBIASED_ACCEPTANCE_RATES = np.array([[0.1, 0.1], [0.3, 0.3]])  # No *direct* gender bias

# Biased acceptance:
# - dept 0 accepts gender 0 at 50% of unbiased rate
# - dept 1 accepts gender 0 at 67% of unbiased rate
BIASED_ACCEPTANCE_RATES = np.array([[0.05, 0.1], [0.2, 0.3]])  # *direct* gender bias present

DEPARTMENT = stats.bernoulli.rvs(p=P_DEPARTMENT, size=n_samples).astype(int)


def simulate_admissions_data(bias_type):
    """Simulate admissions data using the global params above"""
    acceptance_rate = (
        BIASED_ACCEPTANCE_RATES if bias_type == "biased" else UNBIASED_ACCEPTANCE_RATES
    )
    acceptance = stats.bernoulli.rvs(p=acceptance_rate[DEPARTMENT, GENDER])

    return (
        pd.DataFrame(
            np.vstack([GENDER, DEPARTMENT, acceptance]).T,
            columns=["gender", "department", "accepted"],
        ),
        acceptance_rate,
    )


for bias_type in ["unbiased", "biased"]:

    fake_admissions, acceptance_rate = simulate_admissions_data(bias_type)

    gender_acceptance_counts = fake_admissions.groupby(["gender", "accepted"]).count()["department"]
    gender_acceptance_counts.name = None

    gender_department_counts = fake_admissions.groupby(["gender", "department"]).count()["accepted"]
    gender_department_counts.name = None

    observed_acceptance_rates = fake_admissions.groupby("gender").mean()["accepted"]
    observed_acceptance_rates.name = None

    print()
    print("-" * 30)
    print(bias_type.upper())
    print("-" * 30)
    print(f"Department Acceptance rate:\n{acceptance_rate}")
    print(f"\nGender-Department Frequency:\n{gender_department_counts}")
    print(f"\nGender-Acceptance Frequency:\n{gender_acceptance_counts}")
    print(f"\nOverall Acceptance Rates:\n{observed_acceptance_rates}")
------------------------------
UNBIASED
------------------------------
Department Acceptance rate:
[[0.1 0.1]
 [0.3 0.3]]

Gender-Department Frequency:
gender  department
0       0             335
        1             173
1       0             106
        1             386
dtype: int64

Gender-Acceptance Frequency:
gender  accepted
0       0           437
        1            71
1       0           359
        1           133
dtype: int64

Overall Acceptance Rates:
gender
0    0.139764
1    0.270325
dtype: float64

------------------------------
BIASED
------------------------------
Department Acceptance rate:
[[0.05 0.1 ]
 [0.2  0.3 ]]

Gender-Department Frequency:
gender  department
0       0             335
        1             173
1       0             106
        1             386
dtype: int64

Gender-Acceptance Frequency:
gender  accepted
0       0           465
        1            43
1       0           370
        1           122
dtype: int64

Overall Acceptance Rates:
gender
0    0.084646
1    0.247967
dtype: float64

Simulated data properties#

Both scenarios#

  • Gender 0 tends to apply to department 0

  • Gender 1 tends to apply to department 1

Unbiased scenario:#

  • due to the lower acceptance rate in dept 0 and the tendency of gender 0 to apply to dept 0, gender 0 has a lower overall acceptance rate than gender 1

  • due to the higher acceptance rate in dept 1 and the tendency of gender 1 to apply to dept 1, gender 1 has a higher overall acceptance rate than gender 0

  • even with no (direct) gender discrimination, there is still indirect discrimination, driven by the tendency of the genders to apply to different departments and the unequal likelihood that each department accepts students (the quick check below confirms the implied marginal rates)
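
As a quick arithmetic check of the unbiased scenario, we can compute the marginal acceptance rates implied by the ground-truth parameters (a small sketch; the department probabilities are read directly off `P_DEPARTMENT` above):

# P(accept | gender) = sum over departments of P(dept | gender) * P(accept | dept, gender)
p_dept_given_gender = {0: [0.7, 0.3], 1: [0.2, 0.8]}  # from P_DEPARTMENT above

for g in [0, 1]:
    p_accept = sum(
        p_d * UNBIASED_ACCEPTANCE_RATES[d, g]
        for d, p_d in enumerate(p_dept_given_gender[g])
    )
    print(f"gender {g}: implied overall acceptance rate = {p_accept:.2f}")
# gender 0: 0.16, gender 1: 0.26 -- close to the simulated 0.14 / 0.27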

Biased scenario#

  • overall acceptance rates are lower (due to the baseline reduction in gender 0’s acceptance rate across both departments)

  • when there is actual direct gender bias, we see a similar overall pattern of discrimination as in the unbiased case, because the indirect effect still operates.

3. Statistical Models#

Modeling Events#

  • We observe counts of events

  • We estimate probabilities – or, rather, the log-odds of events occurring

Linear Models#

The expected value is a linear (additive) combination of the parameters:

\[\begin{split} \begin{align*} Y_i &\sim \mathcal{N}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_X X_i + ... \end{align*} \end{split}\]

Because the Normal distribution is unbounded, so too is the expected value of the linear model.

Event Models#

Discrete events either occur, taking the value 1, or they do not, taking the value 0. This bounds the expected value of an event: namely, it must lie in the interval \((0, 1)\).

Priors for logistic regression#

The logit link function is a harsh transform:

  • Logit compresses parameter distributions

  • \(x > +4 \rightarrow \) event basically always occurs

  • \(x < -4 \rightarrow\) event basically never occurs (see the numeric check below)
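
A quick numeric check of that saturation, using scipy’s `expit` (the inverse logit):

from scipy.special import expit  # inverse logit

for x in [-4, 0, 4]:
    print(f"logit={x:+d} -> p(event)={expit(x):.3f}")
# logit=-4 -> 0.018, logit=+0 -> 0.500, logit=+4 -> 0.982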

n_samples = 10000
fig, axs = plt.subplots(3, 2, figsize=(8, 8))
for ii, std in enumerate([10, 1.5, 1]):

    # Alpha prior distribution
    alphas = stats.norm(0, std).rvs(n_samples)
    plt.sca(axs[ii][0])
    az.plot_dist(alphas, color="C1")
    plt.xlim([-30, 30])
    plt.ylabel("density")
    if ii == 2:
        plt.xlabel("alpha")

    # Resulting event probability distribution
    ps = utils.invlogit(alphas)
    plt.sca(axs[ii][1])
    az.plot_dist(ps, color="C0")
    plt.xlim([0, 1])
    plt.ylabel("density")
    plt.title(f"$\\alpha \sim \mathcal{{N}}(0, {std})$")
    if ii == 2:
        plt.xlabel("p(event)")
../_images/89d0718aba6a7e06f1a8c1c8c4887c77c6e88c11846aa5859e6c7ae34c94f611.png

Prior Predictive Simulations#

# Demonstrating the effect of alpha / beta on p(x)
from functools import partial


def gaussian_2d_pdf(xs, ys, mean=(0, 0), covariance=np.eye(2)):
    """Evaluate a 2D Gaussian PDF at paired (x, y) points."""
    return np.array(stats.multivariate_normal(mean, covariance).pdf(np.vstack([xs, ys]).T))


def plot_logistic_prior_predictive(n_samples=20, prior_std=0.5):

    _, axs = plt.subplots(1, 2, figsize=(10, 4))
    plt.sca(axs[0])
    min_x, max_x = -prior_std, prior_std
    xs = np.linspace(min_x, max_x, 100)
    ys = xs

    pdf = partial(gaussian_2d_pdf, covariance=np.array([[prior_std**2, 0], [0, prior_std**2]]))
    utils.plot_2d_function(xs, ys, pdf, cmap="gray_r")

    plt.axis("square")
    plt.xlim([min_x, max_x])
    plt.ylim([min_x, max_x])

    # Sample some parameters from prior
    βs = stats.norm.rvs(0, prior_std, size=n_samples)
    αs = stats.norm.rvs(0, prior_std, size=n_samples)

    for b, a in zip(βs, αs):
        plt.scatter(a, b)

    plt.xlabel("$\\alpha$")
    plt.ylabel("$\\beta$")
    plt.title(f"Samples from prior with std={prior_std}")

    # Resulting sigmoid functions
    plt.sca(axs[1])
    min_x, max_x = -3, 3
    xs = np.linspace(min_x, max_x, 100)
    for a, b in zip(αs, βs):
        logit_p = a + xs * b
        p = utils.invlogit(logit_p)
        plt.plot(xs, p)

    plt.xlabel("x")
    plt.ylabel("p = invlogit(x)")
    plt.xlim([-3, 3])
    plt.ylim([-0.05, 1.05])
    plt.axhline(0.5, linestyle="--", color="k")
    plt.title(f"Resulting Logistic Models")
plot_logistic_prior_predictive(prior_std=0.5)
../_images/acd14733c244c8b5e43958bc9c24b112559f2db5baa06bb008ec0d89b99ed116.png
plot_logistic_prior_predictive(prior_std=1.0)
../_images/98fbf3ff89e2c9255cb73c7c6cfc376dcb45e43038b88a1db2931b64be25f1d6.png
plot_logistic_prior_predictive(prior_std=10)
../_images/15f7fccb522dcca48bf0a4082e8a0fbfbeb3e42dbfe7de9dd406636c103a62de.png

Statistical models for admissions#

Again, the estimator will depend on the estimand

Total Causal Effect#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A")],
    node_props={
        "G": {"color": "red"},
        "A": {"color": "red"},
    },
    edge_props={
        ("G", "A"): {"color": "red"},
        ("D", "A"): {"color": "red"},
        ("G", "D"): {"color": "red"},
    },
    graph_direction="LR",
)
../_images/75c2b72eec68ed3513e34cd78749c8d664e23266b22fe92d3fb83676b0c40b33.svg

Stratify by Gender only. Don’t stratify by Department, because it’s a Pipe (mediator) that we do not want to block.

\[\begin{split} \begin{align*} A_i &\sim \text{Bernoulli}(p=p_i) \\ \text{logit}(p_i) &= \alpha[G_i] \\ \alpha &= [\alpha_0, \alpha_1] \\ \alpha_j &\sim \text{Normal}(0, 1) \end{align*} \end{split}\]

Direct Causal Effect#

utils.draw_causal_graph(
    edge_list=[("G", "D"), ("G", "A"), ("D", "A")],
    node_props={
        "G": {"color": "red"},
        "A": {"color": "red"},
    },
    edge_props={("G", "A"): {"color": "red"}},
    graph_direction="LR",
)
../_images/c24d42205a8c20380c913fa67ffafc5cd530c08b829324849d696ce9454db55d.svg

Stratify by Gender and Department to block the Pipe

\[\begin{split} \begin{align*} A_i &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha[G_i, D_i] \\ \alpha &= \left[{\begin{array}{cc} \alpha_{0,0}, \alpha_{0,1} \\ \alpha_{1,0}, \alpha_{1,1} \end{array}}\right] \\ \alpha_{j,k} &\sim \text{Normal}(0, 1) \end{align*} \end{split}\]

Fitting Total Causal Effect Model#

def fit_total_effect_model_admissions(data):
    """Fit total effect model, stratifying by gender.

    Note
    ----
    We use Binomial regression, so the simulated observation-level
    data from above must be pre-aggregated (see the function
    `aggregate_admissions_data` below). We could have used a
    Bernoulli likelihood (logistic regression) for the simulated
    data, but given that we only have aggregates for the real
    UC Berkeley data, we choose this more general formulation.
    """

    # Set up observations / coords
    n_admitted = data["admit"].values
    n_applications = data["applications"].values

    gender_coded, gender = pd.factorize(data["applicant.gender"].values)

    with pm.Model(coords={"gender": gender}) as total_effect_model:

        # Mutable data for running any gender-based counterfactuals
        gender_coded = pm.Data("gender", gender_coded, dims="obs_ids")

        alpha = pm.Normal("alpha", 0, 1, dims="gender")

        # Record the inverse linked probabilities for reporting
        pm.Deterministic("p_accept", pm.math.invlogit(alpha), dims="gender")

        p = pm.math.invlogit(alpha[gender_coded])

        likelihood = pm.Binomial(
            "accepted", n=n_applications, p=p, observed=n_admitted, dims="obs_ids"
        )
        total_effect_inference = pm.sample()

    return total_effect_model, total_effect_inference


def aggregate_admissions_data(raw_admissions_data):
    """Aggregate simulated data from `simulate_admissions_data`, which
    is in long format to short format, by so that it can be modeled
    using Binomial likelihood. We also recode column names to have the
    same fields as the UC Berkeley dataset so that we can use the
    same model-fitting functions.
    """

    # Aggregate applications, admissions, and rejections. Rename
    # columns to have the same fields as the UC Berkeley dataset
    applications = raw_admissions_data.groupby(["gender", "department"]).count().reset_index()
    applications.rename(columns={"accepted": "applications"}, inplace=True)

    admits = raw_admissions_data.groupby(["gender", "department"]).sum().reset_index()
    admits.rename(columns={"accepted": "admit"}, inplace=True)

    data = applications.merge(admits)
    data.loc[:, "reject"] = data.applications - data.admit

    # Code gender & department. For easier comparison to lecture,
    # we use 1-indexed gender/department indicators like McElreath
    data.loc[:, "applicant.gender"] = data.gender.apply(lambda x: 2 if x else 1)
    data.loc[:, "dept"] = data.department.apply(lambda x: 2 if x else 1)

    return data[["applicant.gender", "dept", "applications", "admit", "reject"]]


def plot_admissions_model_posterior(inference, figsize=(6, 2)):
    _, ax = plt.subplots(figsize=figsize)

    # Plot alphas
    az.plot_forest(inference, var_names=["alpha"], combined=True, hdi_prob=0.89, ax=ax)

    # Plot inverse-linked acceptance probabilities
    _, ax = plt.subplots(figsize=figsize)
    az.plot_forest(inference, var_names=["p_accept"], combined=True, hdi_prob=0.89, ax=ax)

    return az.summary(inference)

Fit Total Effect model on BIASED simulated admissions#

SIMULATED_BIASED_ADMISSIONS, _ = simulate_admissions_data("biased")
SIMULATED_BIASED_ADMISSIONS = aggregate_admissions_data(SIMULATED_BIASED_ADMISSIONS)
SIMULATED_BIASED_ADMISSIONS
applicant.gender dept applications admit reject
0 1 1 335 13 322
1 1 2 173 34 139
2 2 1 106 11 95
3 2 2 386 108 278
total_effect_model_simulated_biased, total_effect_inference_simulated_biased = (
    fit_total_effect_model_admissions(SIMULATED_BIASED_ADMISSIONS)
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(total_effect_inference_simulated_biased, (6, 1.5))
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[1] -2.240 0.151 -2.531 -1.958 0.002 0.002 3876.0 2626.0 1.0
alpha[2] -1.133 0.106 -1.335 -0.939 0.002 0.001 4108.0 3100.0 1.0
p_accept[1] 0.097 0.013 0.071 0.121 0.000 0.000 3876.0 2626.0 1.0
p_accept[2] 0.244 0.019 0.208 0.281 0.000 0.000 4108.0 3100.0 1.0
../_images/3c9cb7ead48aa4b6c88d0daa4b7b9c2679778c121ee9a9976dfe7c24fcccc2d3.png ../_images/a9e2d4919a9a882f6fd19fb886850176e6693db3b446d604c19209db646fdd06.png

Fit Total Effect model on UNBIASED simulated admissions#

SIMULATED_UNBIASED_ADMISSIONS, _ = simulate_admissions_data("unbiased")
SIMULATED_UNBIASED_ADMISSIONS = aggregate_admissions_data(SIMULATED_UNBIASED_ADMISSIONS)
SIMULATED_UNBIASED_ADMISSIONS
applicant.gender dept applications admit reject
0 1 1 335 33 302
1 1 2 173 60 113
2 2 1 106 8 98
3 2 2 386 111 275
total_effect_model_simulated_unbiased, total_effect_inference_simulated_unbiased = (
    fit_total_effect_model_admissions(SIMULATED_UNBIASED_ADMISSIONS)
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(total_effect_inference_simulated_unbiased, (6, 1.5))
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[1] -1.481 0.112 -1.694 -1.264 0.002 0.001 3700.0 2457.0 1.0
alpha[2] -1.132 0.107 -1.327 -0.937 0.002 0.001 4389.0 2864.0 1.0
p_accept[1] 0.186 0.017 0.150 0.215 0.000 0.000 3700.0 2457.0 1.0
p_accept[2] 0.244 0.020 0.210 0.281 0.000 0.000 4389.0 2864.0 1.0
../_images/279bf2f460bf6f9e76966003da9469b5cbbae6ae7193e0e9898eba1964f6ca89.png ../_images/1007e213a0dd112dfb22b6a9f0dabdf83e0613e63ef5aadefc05ce1b5b843b83.png

Fitting Direct Causal Effect Model#

def fit_direct_effect_model_admissions(data):
    """Fit total effect model, stratifying by gender.

    Note
    ----
    We use the Binomial likelihood, so simulated observation-level data from
    above must be pre-aggregated. (see the function `aggregate_admissions_data`
    below). We could have used a Bernoulli likelihood for the simulated data,
    but given that we only have aggregates for the real UC Berkeley data we
    choose this more general formulation.
    """

    # Set up observations / coords
    n_admitted = data["admit"].values
    n_applications = data["applications"].values

    department_coded, department = pd.factorize(data["dept"].values)
    gender_coded, gender = pd.factorize(data["applicant.gender"].values)

    with pm.Model(coords={"gender": gender, "department": department}) as direct_effect_model:

        # Mutable data for running any gender-based or department-based counterfactuals
        gender_coded = pm.Data("gender", gender_coded, dims="obs_ids")
        department_coded = pm.Data("department", department_coded, dims="obs_ids")
        n_applications = pm.Data("n_applications", n_applications, dims="obs_ids")

        alpha = pm.Normal("alpha", 0, 1, dims=["department", "gender"])

        # Record inverse linked probabilities for reporting
        pm.Deterministic("p_accept", pm.math.invlogit(alpha), dims=["department", "gender"])

        p = pm.math.invlogit(alpha[department_coded, gender_coded])

        likelihood = pm.Binomial(
            "accepted", n=n_applications, p=p, observed=n_admitted, dims="obs_ids"
        )
        direct_effect_inference = pm.sample()

    return direct_effect_model, direct_effect_inference

Fit Direct Effect model to BIASED simulated admissions data#

direct_effect_model_simulated_biased, direct_effect_inference_simulated_biased = (
    fit_direct_effect_model_admissions(SIMULATED_BIASED_ADMISSIONS)
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(direct_effect_inference_simulated_biased)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[1, 1] -3.021 0.256 -3.505 -2.563 0.003 0.002 7099.0 2936.0 1.0
alpha[1, 2] -1.997 0.278 -2.516 -1.488 0.003 0.003 6422.0 3414.0 1.0
alpha[2, 1] -1.371 0.188 -1.733 -1.034 0.002 0.002 7865.0 3270.0 1.0
alpha[2, 2] -0.938 0.114 -1.145 -0.722 0.001 0.001 7740.0 3283.0 1.0
p_accept[1, 1] 0.048 0.011 0.029 0.072 0.000 0.000 7099.0 2936.0 1.0
p_accept[1, 2] 0.123 0.029 0.067 0.176 0.000 0.000 6422.0 3414.0 1.0
p_accept[2, 1] 0.204 0.030 0.148 0.259 0.000 0.000 7865.0 3270.0 1.0
p_accept[2, 2] 0.282 0.023 0.239 0.324 0.000 0.000 7740.0 3283.0 1.0
../_images/b21b5a7e2f065c4faa42c90a6e7c859df726682fa82d2260da98370c366b4c4c.png ../_images/02be9cbdb6d5e5144e4aee07063258d7e5ef015d6c27e941288c6c27f9652c1b.png

For comparison, here are the ground-truth biased admission rates, which we’re able to mostly recover:

# Department x Gender
BIASED_ACCEPTANCE_RATES
array([[0.05, 0.1 ],
       [0.2 , 0.3 ]])

Fit Direct Effect model to UNBIASED simulated admissions data#

direct_effect_model_simulated_unbiased, direct_effect_inference_simulated_unbiased = (
    fit_direct_effect_model_admissions(SIMULATED_UNBIASED_ADMISSIONS)
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(direct_effect_inference_simulated_unbiased)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[1, 1] -2.155 0.175 -2.460 -1.805 0.002 0.002 5264.0 3200.0 1.0
alpha[1, 2] -2.276 0.313 -2.901 -1.722 0.004 0.003 5684.0 2986.0 1.0
alpha[2, 1] -0.618 0.161 -0.919 -0.322 0.002 0.001 7013.0 3158.0 1.0
alpha[2, 2] -0.898 0.114 -1.114 -0.684 0.001 0.001 6859.0 3242.0 1.0
p_accept[1, 1] 0.105 0.016 0.076 0.136 0.000 0.000 5264.0 3200.0 1.0
p_accept[1, 2] 0.096 0.027 0.050 0.148 0.000 0.000 5684.0 2986.0 1.0
p_accept[2, 1] 0.351 0.036 0.284 0.419 0.000 0.000 7013.0 3158.0 1.0
p_accept[2, 2] 0.290 0.023 0.243 0.331 0.000 0.000 6859.0 3242.0 1.0
../_images/08221fe4839e60ee04c4fbdea91b44b4d10a1325e94b0b762d30595727cdddce.png ../_images/0c2eee666100ee3e46206bc624b42d8e8765d61b955f8660e736c325af76e62d.png

For comparison, here are the ground-truth unbiased admission rates, which we’re able to recover:

# Department x Gender
UNBIASED_ACCEPTANCE_RATES
array([[0.1, 0.1],
       [0.3, 0.3]])

4. Analyze the UC Berkeley Admissions data#

Review of the counts dataset#

  • we’ll use these count data to model the log-odds of acceptance for each gender/department

  • note that we’ll be using Binomial regression, which is equivalent to Bernoulli regression but operates on aggregate counts, as opposed to individual binary trials

    • The examples above were actually implemented as Binomial regressions so that we can re-use code and demonstrate general patterns of analysis

    • Either way, you’ll get the same inference from both approaches (see the sketch below)
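
A small sketch of that equivalence: the Bernoulli log-likelihood of the individual trials and the Binomial log-likelihood of their aggregate differ only by a constant that does not depend on \(p\), so they support identical inference (the 10-applicant department below is hypothetical):

# Hypothetical department: 10 applicants, 3 admitted
p = 0.3
outcomes = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

bernoulli_loglike = stats.bernoulli.logpmf(outcomes, p).sum()
binomial_loglike = stats.binom.logpmf(k=3, n=10, p=p)

# The difference is log(10 choose 3), a constant in p
print(binomial_loglike - bernoulli_loglike)  # ~4.79 = log(120)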

ADMISSIONS
dept applicant.gender admit reject applications
1 A male 512 313 825
2 A female 89 19 108
3 B male 353 207 560
4 B female 17 8 25
5 C male 120 205 325
6 C female 202 391 593
7 D male 138 279 417
8 D female 131 244 375
9 E male 53 138 191
10 E female 94 299 393
11 F male 22 351 373
12 F female 24 317 341

Fit Total Effect model to UC Berkeley admissions data#

Don’t forget to look at diagnostics, which we’ll skip here

total_effect_model, total_effect_inference = fit_total_effect_model_admissions(ADMISSIONS)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(total_effect_inference, figsize=(6, 1.5))
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[male] -0.221 0.038 -0.294 -0.151 0.001 0.000 4202.0 2936.0 1.0
alpha[female] -0.828 0.050 -0.917 -0.732 0.001 0.001 3905.0 2831.0 1.0
p_accept[male] 0.445 0.009 0.427 0.462 0.000 0.000 4202.0 2936.0 1.0
p_accept[female] 0.304 0.011 0.286 0.325 0.000 0.000 3905.0 2831.0 1.0
../_images/c27c438f4095bd50af410c6e9c3c954924e9b43ed36cbdd568fa2338d6bc1ce1.png ../_images/26007db02f8220015cd49c95c590e9aa040216e29d6ae0878528c1dd5d59a0f8.png

Total Causal Effect#

# Total Causal Effect
female_p_accept = total_effect_inference.posterior["p_accept"].sel(gender="female")
male_p_accept = total_effect_inference.posterior["p_accept"].sel(gender="male")
contrast = male_p_accept - female_p_accept

plt.subplots(figsize=(8, 3))
az.plot_dist(contrast)
plt.axvline(0, linestyle="--", color="black", label="No difference")
plt.xlabel("gender contrast (acceptance probability)")
plt.ylabel("density")
plt.title(
    "Total Causal Effect (Experienced Discrimination):\npositive values indicate male advantage"
)
plt.legend();
../_images/99b458765f16558672fb5499b0da78adf9a777177c864fb7375054534b147f88.png

Fit Direct Effect model to UC Berkeley admissions data#

direct_effect_model, direct_effect_inference = fit_direct_effect_model_admissions(ADMISSIONS)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
plot_admissions_model_posterior(direct_effect_inference, figsize=(6, 3))
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha[A, male] 0.490 0.072 0.353 0.620 0.001 0.001 5850.0 3067.0 1.0
alpha[A, female] 1.474 0.243 1.029 1.937 0.003 0.002 5579.0 2964.0 1.0
alpha[B, male] 0.531 0.087 0.377 0.703 0.001 0.001 5106.0 2554.0 1.0
alpha[B, female] 0.657 0.395 -0.075 1.387 0.005 0.004 6701.0 3242.0 1.0
alpha[C, male] -0.532 0.115 -0.743 -0.310 0.001 0.001 5940.0 3091.0 1.0
alpha[C, female] -0.660 0.086 -0.831 -0.506 0.001 0.001 5786.0 3051.0 1.0
alpha[D, male] -0.698 0.102 -0.892 -0.513 0.001 0.001 6055.0 3004.0 1.0
alpha[D, female] -0.618 0.108 -0.828 -0.425 0.001 0.001 6215.0 2943.0 1.0
alpha[E, male] -0.935 0.158 -1.239 -0.645 0.002 0.002 5672.0 2865.0 1.0
alpha[E, female] -1.146 0.119 -1.364 -0.928 0.002 0.001 5891.0 2814.0 1.0
alpha[F, male] -2.667 0.205 -3.057 -2.291 0.003 0.002 5224.0 2612.0 1.0
alpha[F, female] -2.499 0.195 -2.863 -2.131 0.003 0.002 6109.0 3036.0 1.0
p_accept[A, male] 0.620 0.017 0.587 0.650 0.000 0.000 5850.0 3067.0 1.0
p_accept[A, female] 0.811 0.037 0.742 0.878 0.000 0.000 5579.0 2964.0 1.0
p_accept[B, male] 0.629 0.020 0.595 0.670 0.000 0.000 5106.0 2554.0 1.0
p_accept[B, female] 0.653 0.086 0.488 0.805 0.001 0.001 6701.0 3242.0 1.0
p_accept[C, male] 0.370 0.027 0.322 0.423 0.000 0.000 5940.0 3091.0 1.0
p_accept[C, female] 0.341 0.019 0.303 0.376 0.000 0.000 5786.0 3051.0 1.0
p_accept[D, male] 0.333 0.023 0.291 0.374 0.000 0.000 6055.0 3004.0 1.0
p_accept[D, female] 0.351 0.025 0.304 0.395 0.000 0.000 6215.0 2943.0 1.0
p_accept[E, male] 0.283 0.032 0.222 0.341 0.000 0.000 5672.0 2865.0 1.0
p_accept[E, female] 0.242 0.022 0.204 0.283 0.000 0.000 5891.0 2814.0 1.0
p_accept[F, male] 0.066 0.012 0.043 0.089 0.000 0.000 5224.0 2612.0 1.0
p_accept[F, female] 0.077 0.014 0.052 0.103 0.000 0.000 6109.0 3036.0 1.0
../_images/5c5f9ff73e139d0eac6e0b45d4a70630054460c6b58ee7283f2105658a31feb6.png ../_images/3010d4b8d85dd77d24a3a3ee76719dffc9f8178a7c2ff8c7dd82daf042b3d746.png
# Fancier plot of department/gender acceptance probability distributions
_, ax = plt.subplots(figsize=(10, 5))
for ii, dept in enumerate(ADMISSIONS.dept.unique()):
    color = f"C{ii}"
    for gend in ADMISSIONS["applicant.gender"].unique():
        label = f"{dept}:{gend}"
        linestyle = "-." if gend == "female" else "-"
        post = direct_effect_inference.posterior["p_accept"].sel(department=dept, gender=gend)
        az.plot_dist(post, label=label, color=color, plot_kwargs={"linestyle": linestyle})
plt.xlabel("acceptance probability")
plt.legend(title="Deptartment:Gender");
../_images/415ead0f5eb46bcfc4e325e994a2b3c7037697f738f32d9ea967b25cf401601a.png

Direct Causal Effect#

# Direct Causal Effect
_, ax = plt.subplots(figsize=(8, 4))

for ii, dept in enumerate(ADMISSIONS.dept.unique()):
    color = f"C{ii}"
    label = f"{dept}"

    female_p_accept = direct_effect_inference.posterior["p_accept"].sel(
        gender="female", department=dept
    )
    male_p_accept = direct_effect_inference.posterior["p_accept"].sel(
        gender="male", department=dept
    )

    contrast = male_p_accept - female_p_accept
    az.plot_dist(contrast, color=color, label=label)

plt.xlabel("gender contrast (acceptance probability)")
plt.ylabel("density")
plt.title(
    "Direct Causal Effect (Institutional Discrimination):\npositive values indicate male advantage"
)
plt.axvline(0, linestyle="--", color="black", label="No difference")
plt.legend(title="Department");
../_images/6eabdde5bbcbad977e684de962381eabfad76709e45c1684722af59be8e99a79.png

Average Direct Effect#

  • The causal effect of gender averaged across departments (marginalizing)

  • Depends on the distribution of applications to each department

  • This is easy to do as a simulation

Post-stratification#

  • Re-weight estimates for the target population

  • allows us to apply a model fit at one university to estimate the causal impact at another university with a different distribution of departments

  • Here, we use the empirical distribution of applications for re-weighting estimates

# Use the empirical distribution of departments -- we'd update this for a different university
applications_per_dept = ADMISSIONS.groupby("dept").sum()["applications"].values
total_applications = applications_per_dept.sum()
department_weight = applications_per_dept / total_applications

female_alpha = direct_effect_inference.posterior.sel(gender="female")["alpha"]
male_alpha = direct_effect_inference.posterior.sel(gender="male")["alpha"]

weighted_female_alpha = female_alpha * department_weight
weighted_male_alpha = male_alpha * department_weight

contrast = weighted_male_alpha - weighted_female_alpha

_, ax = plt.subplots(figsize=(8, 4))
az.plot_dist(contrast)

plt.axvline(0, linestyle="--", color="k", label="No Difference")
plt.xlabel("causal effect of perceived gender")
plt.ylabel("density")
plt.title("Average Direct Causal of Gender (perception),\npositive values indicate male advantage")
plt.xlim([-0.3, 0.3])
plt.legend();
../_images/bf5a715fc85760603d78e9013f8d45cfa7f6feaad8f4fd889e829847579c1444.png
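
To post-stratify to a different target population, we simply swap in that population’s department weights. As a toy example, here is the same contrast under a hypothetical university whose applications are spread uniformly across departments (the uniform weights are assumed, not estimated):

# Hypothetical target university: uniform distribution of applications
uniform_weight = np.ones(len(department_weight)) / len(department_weight)

contrast_uniform = male_alpha * uniform_weight - female_alpha * uniform_weight

_, ax = plt.subplots(figsize=(8, 4))
az.plot_dist(contrast_uniform)
plt.axvline(0, linestyle="--", color="k", label="No Difference")
plt.xlabel("causal effect of perceived gender (uniform departments)")
plt.ylabel("density")
plt.legend();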

To verify the averaging process, we can look at the contrast of the p_accept samples from the posterior, which gives similar results. However, looking at the posterior alone wouldn’t work for making predictions for an out-of-sample university.

direct_posterior = direct_effect_inference.posterior

# Select by gender; note: p_accept already has the inverse link function applied
female_posterior = direct_posterior.sel(gender="female")[
    "p_accept"
]  # shape: (chain x draw x department)
male_posterior = direct_posterior.sel(gender="male")[
    "p_accept"
]  # shape: (chain x draw x department)
contrast = male_posterior - female_posterior  # shape: (chain x draw x department)

_, ax = plt.subplots(figsize=(8, 4))
# marginalize / collapse the contrast across all departments
az.plot_dist(contrast)
plt.axvline(0, linestyle="--", color="k", label="No Difference")
plt.xlim([-0.3, 0.3])
plt.xlabel("causal effect of perceived gender")
plt.ylabel("density")
plt.title("Posterior contrast of $p_{accept}$,\npositive values indicate male advantage")
plt.legend();
../_images/0761bc15a4341cc20369f2c055b92d42af311819f68a4990413aeba9e94ccca1.png

Discrimination?#

Hard to say

  • Big structural effects

  • Distribution of applicants could be due to discrimination

  • Confounds are likely

BONUS: Survival Analysis#

  • Counts are often modeled as time-to-event (e.g. with Exponential or Gamma distributions) – the goal is to estimate the event rate

  • Tricky because ignoring censored data can lead to inferential errors (see the small demo below)

    • Left censored: we don’t know when the timer started

    • Right censored: the observation period ended before the event had time to happen
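
A small demo of that inferential error, with made-up numbers: if the true mean wait is 50 days but we observe only a 60-day window and drop the individuals still waiting, the naive estimate badly underestimates the true mean.

rng = np.random.default_rng(0)

true_mean_wait = 50.0
waits = rng.exponential(true_mean_wait, size=10_000)

# Observation window ends at day 60; longer waits are right-censored.
# Naively keeping only the fully observed events:
window = 60.0
observed_only = waits[waits <= window]

print(f"true mean: {true_mean_wait:.0f}, naive mean: {observed_only.mean():.1f}")  # ~24 days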

Example: Austin Cat Adoption#

  • 20k cats

  • Events: adopted (1) or not (0)

  • Censoring mechanisms

    • death before adoption

    • escape

    • not adopted yet

Goal: determine if Black cats are adopted at a lower rate than non-Black cats.

CATS = utils.load_data("AustinCats")
CATS.head()
id days_to_event date_out out_event date_in in_event breed color intake_age
0 A730601 1 07/08/2016 09:00:00 AM Transfer 07/07/2016 12:11:00 PM Stray Domestic Shorthair Mix Blue Tabby 7
1 A679549 25 06/16/2014 01:54:00 PM Transfer 05/22/2014 03:43:00 PM Stray Domestic Shorthair Mix Black/White 1
2 A683656 4 07/17/2014 04:57:00 PM Adoption 07/13/2014 01:20:00 PM Stray Snowshoe Mix Lynx Point 2
3 A709749 41 09/22/2015 12:49:00 PM Transfer 08/12/2015 06:29:00 PM Stray Domestic Shorthair Mix Calico 12
4 A733551 9 09/01/2016 12:00:00 AM Transfer 08/23/2016 02:35:00 PM Stray Domestic Shorthair Mix Brown Tabby/White 1

Modeling outcome variable: days_to_event#

Two go-to distributions for modeling time-to-event

Exponential Distribution:

  • simpler of the two (single parameter)

  • assumes constant rate

  • maximum entropy distribution amongst the set of non-negative continuous distributions that have the same average rate.

  • assumes a single thing must occur (e.g. a part failure) before the event occurs (machine breakdown)

Gamma Distribution:

  • more complex of the two (two parameters)

  • maximum entropy distribution amongst the set of distributions with the same mean and the same average logarithm

  • assumes multiple things must happen (e.g. several part failures) before the event occurs (machine breakdown); see the small simulation below
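
A small simulation of that last point, with assumed numbers: if a machine breaks down only after 3 parts fail in sequence, each at rate \(\lambda\), the total waiting time is a sum of Exponentials, which is exactly Gamma distributed.

rng = np.random.default_rng(1)

lam, n_parts = 0.1, 3  # assumed per-part failure rate and part count
part_failures = rng.exponential(scale=1 / lam, size=(100_000, n_parts))
machine_breakdowns = part_failures.sum(axis=1)  # ~ Gamma(alpha=3, beta=0.1)

print(machine_breakdowns.mean(), stats.gamma(a=n_parts, scale=1 / lam).mean())  # both ~30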

_, axs = plt.subplots(1, 2, figsize=(10, 4))

days = np.linspace(0, 100, 100)

plt.sca(axs[0])
for scale in [20, 50]:
    expon = stats.expon(loc=0.01, scale=scale)
    utils.plot_line(days, expon.pdf(days), label=f"$\\lambda$=1/{scale}")
plt.xlabel("days")
plt.ylabel("density")
plt.legend()
plt.title("Exponential Distribution")


plt.sca(axs[1])
alpha = 10  # Gamma shape parameter
for scale in [2, 5]:
    gamma = stats.gamma(a=alpha, loc=alpha, scale=scale)
    utils.plot_line(days, gamma.pdf(days), label=f"$\\alpha$={alpha}, $\\beta$={scale}")
plt.xlabel("days")
plt.ylabel("density")
plt.legend()
plt.title("Gamma Distribution");
../_images/16b63aef4e140181238b6520c95502c2655a139a9dd425bdde257647b2e35651.png

Censored and un-censored observations#

  • Observed (uncensored) events contribute the probability density; the Cumulative distribution (CDF) gives the probability that the event occurred by time x

  • Unobserved (censored) data instead require the complement of the CDF (the CCDF); i.e. the probability that the event hasn’t happened yet (see the sketch below)
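
A sketch of the two likelihood contributions for a single cat (the numbers are assumed): an adoption observed at day 30 contributes the Exponential density, while a cat still waiting at day 30 contributes the CCDF, scipy’s `logsf`.

lam = 1 / 50  # assumed adoption rate: a mean wait of 50 days
d = 30.0

loglik_observed = stats.expon(scale=1 / lam).logpdf(d)  # adopted at day 30
loglik_censored = stats.expon(scale=1 / lam).logsf(d)  # still waiting at day 30

print(f"observed: {loglik_observed:.2f}, censored: {loglik_censored:.2f}")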

xs = np.linspace(0, 5, 100)
_, axs = plt.subplots(1, 2, figsize=(10, 5))
plt.sca(axs[0])
cdfs = {}
for lambda_ in [0.5, 1]:
    cdfs[lambda_] = ys = 1 - np.exp(-lambda_ * xs)
    utils.plot_line(xs, ys, label=f"$\\lambda={lambda_}$")

plt.xlabel("x")
plt.ylabel("$1 - \\exp(-x)$")
plt.legend()
plt.title("CDF\np(event happened before or at time x| $\\lambda$)", fontsize=12)

plt.sca(axs[1])
for lambda_ in [0.5, 1]:
    ys = 1 - cdfs[lambda_]
    utils.plot_line(xs, ys, label=f"$\\lambda={lambda_}$")

plt.xlabel("x")
plt.ylabel("$\\exp(-x)$")
plt.legend()
plt.title("CCDF\np(event hasn't happened before or at time x| $\\lambda$)", fontsize=12);
../_images/884b739928a0c012c75e4266f451f73ed32799562971b0b7c9c837d9cee7c562.png

Statistical Model#

\[\begin{split} \begin{align*} D_i | A_i = 1 &\sim \text{Exponential}(\lambda_i) \\ D_i | A_i = 0 &\sim \text{ExponentialCCDF}(\lambda_i) \\ \lambda_i &= \frac{1}{\mu_i} \\ \mu_i &= \alpha_{\text{cat color}[i]} \\ \alpha_{\text{Black, Other}} &\sim \text{Exponential}(\gamma) \end{align*} \end{split}\]
  • \(D_i | A_i = 1\) – observed adoptions

  • \(D_i | A_i = 0\) – not-yet-observed (censored) adoptions

  • \(\alpha_{\text{cat color}[i]}\) – average time-to-adoption for each cat color; the Exponential prior keeps the \(\alpha\)s positive (in place of a log link)

  • \(\lambda_i = \frac{1}{\mu_i}\) – parameterizing via the mean \(\mu_i\) simplifies estimating the average time-to-adoption

Finding a reasonable hyperparameter for \(\alpha\)#

We’ll need to determine a reasonable value for the Exponential prior rate parameter \(\gamma\). To do so, we’ll look at the empirical distribution of time to adoption:

# determine reasonable prior for alpha
plt.subplots(figsize=(6, 3))
CATS[CATS.out_event == "Adoption"].days_to_event.hist(bins=100)
plt.xlim([0, 500]);
../_images/f4d7f8c47f46052747ae4143792e9d8ffd89fced4f2a1ffd8e4a55d3f0942e89.png

Using the empirical histogram above, we see that the majority of the probability mass is between zero and 200 days, so let’s use 50 days as the expected wait time.
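
As a quick sanity check on that choice: an Exponential prior with rate \(\gamma = 1/50\) puts the prior mean of \(\alpha\) at 50 days, on the same scale as the histogram above (a sketch of the prior, not the fitted model):

# Prior sanity check: alpha ~ Exponential(rate=1/50) has mean 50 days
prior_alpha_draws = stats.expon(scale=50).rvs(size=10_000, random_state=1)
print(f"prior mean wait: {prior_alpha_draws.mean():.1f} days")  # ~50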

CAT_COLOR_ID, CAT_COLOR = pd.factorize(CATS.color.apply(lambda x: "Other" if x != "Black" else x))
ADOPTED_ID, ADOPTED = pd.factorize(
    CATS.out_event.apply(lambda x: "Other" if x != "Adoption" else "Adopted")
)
DAYS_TO_ADOPTION = CATS.days_to_event.values.astype(float)
LAMBDA = 50

with pm.Model(coords={"cat_color": CAT_COLOR}) as adoption_model:

    # Censoring: cats that were not adopted are right-censored at their
    # observed days_to_event; adopted cats are fully observed events, so
    # they get an upper bound they can never reach
    is_adopted = ADOPTED[ADOPTED_ID] == "Adopted"
    right_censoring = DAYS_TO_ADOPTION.copy()
    right_censoring[is_adopted] = np.inf

    # Priors
    gamma = 1 / LAMBDA
    alpha = pm.Exponential("alpha", gamma, dims="cat_color")

    # Likelihood: alpha is the average time-to-adoption, so the rate is its reciprocal
    adoption_rate = 1 / alpha[CAT_COLOR_ID]
    pm.Censored(
        "adopted",
        pm.Exponential.dist(lam=adoption_rate),
        lower=None,
        upper=right_censoring,
        observed=DAYS_TO_ADOPTION,
    )
    adoption_inference = pm.sample()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.

Poor black kitties 🐈‍⬛#

It appears that black cats DO take longer to get adopted.

Posterior Summary#

_, ax = plt.subplots(figsize=(4, 2))
az.plot_forest(adoption_inference, var_names=["alpha"], combined=True, hdi_prob=0.89, ax=ax)
plt.xlabel("days to adoption");
../_images/7463a9a3cc421cbb443b994720e1bed8cee6605deb849895b6701f3fae3130d8.png

Posterior distributions#

for ii, cat_color in enumerate(["Black", "Other"]):
    color = "black" if cat_color == "Black" else "C0"
    posterior_alpha = adoption_inference.posterior.sel(cat_color=cat_color)["alpha"]
    az.plot_dist(posterior_alpha, color=color, label=cat_color)
plt.xlabel("waiting time")
plt.ylabel("density")
plt.legend();
../_images/c782345a63e71e4be9174b60c54c5ac3b0c49bb59391b20356de0530fa35b5e1.png
days_until_adoption_ = np.linspace(1, 100)
n_posterior_samples = 25

for cat_color in ["Black", "Other"]:
    color = "black" if cat_color == "Black" else "C0"
    posterior = adoption_inference.posterior.sel(cat_color=cat_color)
    alpha_samples = posterior.alpha.sel(chain=0)[:n_posterior_samples].values
    for ii, alpha in enumerate(alpha_samples):
        label = cat_color if ii == 0 else None
        plt.plot(
            days_until_adoption_,
            np.exp(-days_until_adoption_ / alpha),
            color=color,
            alpha=0.25,
            label=label,
        )

plt.xlabel("days until adoption")
plt.ylabel("fraction")
plt.legend();
../_images/df73d1638b5890cef1285abab12e483f79231d7598ea9d3ee59ef39348747888.png

Authors#

  • Ported to PyMC by Dustin Stansbury (2024)

  • Based on Statistical Rethinking (2023) lectures by Richard McElreath

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray
Last updated: Tue Dec 17 2024

Python implementation: CPython
Python version       : 3.12.5
IPython version      : 8.27.0

pytensor: 2.26.4
aeppl   : not installed
xarray  : 2024.7.0

statsmodels: 0.14.2
numpy      : 1.26.4
pandas     : 2.2.2
matplotlib : 3.9.2
scipy      : 1.14.1
xarray     : 2024.7.0
pymc       : 5.19.1
arviz      : 0.19.0

Watermark: 2.5.0

License notice#

All the notebooks in this example gallery are provided under the MIT License which allows modification, and redistribution for any use provided the copyright and license notices are preserved.

Citing PyMC examples#

To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.

Important

Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.

Also remember to cite the relevant libraries used by your code.

Here is a citation template in BibTeX:

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}

which once rendered could look like: