Reparameterizing the Weibull Accelerated Failure Time Model#

import arviz as az
import numpy as np
import pymc as pm
import pytensor.tensor as pt

print(f"Running on PyMC v{pm.__version__}")

Running on PyMC v5.28.0+58.gf58491a3

Attention

This notebook uses libraries that are not PyMC dependencies, so they need to be installed separately before running it.

# These dependencies need to be installed separately from PyMC
import statsmodels.api as sm

%config InlineBackend.figure_format = 'retina'
# These seeds are for sampling data observations
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
# Set a seed for reproducibility of posterior results
seed: int = sum(map(ord, "aft_weibull"))
rng: np.random.Generator = np.random.default_rng(seed=seed)
az.style.use("arviz-variat")

Dataset#

The previous example notebook on Bayesian parametric survival analysis introduced two different accelerated failure time (AFT) models: Weibull and log-linear. In this notebook, we present three different parameterizations of the Weibull AFT model.

The data set we’ll use is the flchain R data set, which comes from a medical study investigating the effect of serum free light chain (FLC) on lifespan. Read the full documentation of the data by running:

print(sm.datasets.get_rdataset(package='survival', dataname='flchain').__doc__)

# Fetch and clean data
data = (
    sm.datasets.get_rdataset(package="survival", dataname="flchain")
    .data.sample(500)  # Limit ourselves to 500 observations
    .reset_index(drop=True)
)

y = data.futime.values
censored = ~data["death"].values.astype(bool)

y[:5]

array([ 975, 2272,  138, 4262, 4928])

censored[:5]

array([False,  True, False,  True,  True])

Using `pm.Potential`#

We have a unique problem when modelling censored data. Strictly speaking, we don’t have any data for censored values: we only know the number of values that were censored. How can we include this information in our model?

One way to do this is by making use of pm.Potential. The PyMC2 docs explain its usage very well. Essentially, declaring pm.Potential('x', logp) adds the term logp to the model's log-probability.

However, pm.Potential only affects probability-based sampling, which excludes pm.sample_prior_predictive and pm.sample_posterior_predictive. We can overcome these limitations by using pm.Censored instead. We model our right-censored data by passing the censoring bounds as the upper argument of pm.Censored.
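
To make the role of censoring in the likelihood concrete, here is a minimal sketch written outside of PyMC. It assumes the standard right-censoring likelihood for a Weibull model: an observed event contributes the log density, while a censored observation contributes only the log survival function at its censoring time. The function name and the toy data are hypothetical and are not part of the original notebook.

```python
import math


def weibull_censored_loglik(times, is_censored, alpha, beta):
    """Log-likelihood of right-censored Weibull data (shape alpha, scale beta)."""
    total = 0.0
    for t, cens in zip(times, is_censored):
        z = (t / beta) ** alpha
        if cens:
            # Censored: we only know the event happened after t, so add log S(t)
            total += -z
        else:
            # Observed event: add the log density log f(t)
            total += math.log(alpha / beta) + (alpha - 1) * math.log(t / beta) - z
    return total


# Hypothetical toy data: one observed event at t=1 and one censored at t=2
print(weibull_censored_loglik([1.0, 2.0], [False, True], alpha=1.0, beta=1.0))
```

Conceptually, this is the quantity the censored models below evaluate for each draw of the parameters; the models themselves let PyMC build it from the latent Weibull distribution.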

Parameterization 1#

This is an intuitive, straightforward parameterization of the Weibull survival function, and probably the first one that comes to mind.

# normalize the event time between 0 and 1
y_norm = y / np.max(y)

# If censored then observed event time else maximum time
right_censored = [x if x > 0 else np.max(y_norm) for x in y_norm * censored]

with pm.Model() as model_1:
    alpha_sd = 1.0

    mu = pm.Normal("mu", mu=0, sigma=1)
    alpha_raw = pm.Normal("a0", mu=0, sigma=0.1)
    alpha = pm.Deterministic("alpha", pt.exp(alpha_sd * alpha_raw))
    beta = pm.Deterministic("beta", pt.exp(mu / alpha))
    beta_backtransformed = pm.Deterministic("beta_backtransformed", beta * np.max(y))

    latent = pm.Weibull.dist(alpha=alpha, beta=beta)
    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=y_norm)

with model_1:
    idata_param1 = pm.sample(nuts_sampler="numpyro", random_seed=rng)

/home/osvaldo/anaconda3/envs/arviz_1/lib/python3.14/site-packages/pymc/sampling/mcmc.py:832: FutureWarning: The arguments to `from_dict` have changed with the release of arviz 1.0. Please refer to the arviz documentation for more details
  return _sample_external_nuts(

az.plot_trace_dist(idata_param1, var_names=["alpha", "beta"]);

../_images/e0e96fe671f69f79c4044c401b463fe72a5b2dc5f031f8322a5b667720ef43a1.png

az.summary(idata_param1, var_names=["alpha", "beta", "beta_backtransformed"], round_to=2)

	mean	sd	eti89_lb	eti89_ub	ess_bulk	ess_tail	r_hat	mcse_mean	mcse_sd
alpha	0.97	0.06	0.87	1.07	3351.30	2885.05	1.0	0.00	0.00
beta	2.86	0.35	2.36	3.48	2504.32	2343.84	1.0	0.01	0.01
beta_backtransformed	14652.30	1804.09	12126.56	17870.69	2504.32	2343.84	1.0	35.98	28.83
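
The beta_backtransformed row relies on beta being a scale parameter: fitting the model to times divided by a constant and then multiplying beta by that constant describes the same survival curve on the original time scale. The sketch below checks this numerically. The value of t_max is a hypothetical stand-in for np.max(y), t is an arbitrary time, and the helper function is illustrative only.

```python
import math


def weibull_survival(t, alpha, beta):
    """Weibull survival function S(t) = exp(-(t / beta) ** alpha)."""
    return math.exp(-((t / beta) ** alpha))


# Hypothetical values chosen for illustration only
t_max = 5000.0  # stand-in for np.max(y)
alpha, beta = 0.97, 2.86  # posterior means reported in the table above
t = 1200.0  # a time on the original scale

# Survival evaluated on the normalized scale and on the original scale
s_normalized = weibull_survival(t / t_max, alpha, beta)
s_original = weibull_survival(t, alpha, beta * t_max)
print(s_normalized, s_original)
```

The two printed values agree up to floating-point error, which is why the shape alpha needs no back-transformation while beta does.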

Parameterization 2#

Note that, confusingly, the Weibull shape parameter (alpha in Parameterization 1) is now called r, while alpha now denotes a parameter with a normal prior; we maintain this notation to stay faithful to the original implementation in Stan. In this parameterization, we still model the same two Weibull parameters: the shape (now r) and the scale beta.

For more information, see this Stan example model and the corresponding documentation.
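
One way to read this parameterization: with the scale set to exp(-alpha / r), the Weibull hazard simplifies to r * t**(r - 1) * exp(alpha), so alpha acts as a log-hazard intercept. Compared with Parameterization 1, alpha here plays the role of -mu. The sketch below verifies the hazard identity numerically; the parameter values and the helper function are hypothetical.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull(shape, scale): (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameter values for illustration
alpha, r, t = -1.1, 0.95, 0.4

scale = math.exp(-alpha / r)  # same transform as `beta` in model_2
hazard_from_scale = weibull_hazard(t, r, scale)
hazard_direct = r * t ** (r - 1) * math.exp(alpha)
print(hazard_from_scale, hazard_direct)
```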

with pm.Model() as model_2:
    alpha = pm.Normal("alpha", mu=0, sigma=1)
    r = pm.Gamma("r", alpha=2, beta=1)
    beta = pm.Deterministic("beta", pt.exp(-alpha / r))
    beta_backtransformed = pm.Deterministic("beta_backtransformed", beta * np.max(y))

    latent = pm.Weibull.dist(alpha=r, beta=beta)
    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=y_norm)

with model_2:
    idata_param2 = pm.sample(nuts_sampler="numpyro", random_seed=rng)

az.plot_trace_dist(idata_param2, var_names=["r", "beta"]);

../_images/8fccb7c83721637b3541844d734c01c8adacdb8d93a0a4b154f0ed66ced8ad29.png

az.summary(idata_param2, var_names=["r", "beta", "beta_backtransformed"], round_to=2)

	mean	sd	eti89_lb	eti89_ub	ess_bulk	ess_tail	r_hat	mcse_mean	mcse_sd
r	0.95	0.08	0.83	1.08	2634.9	2481.78	1.0	0.00	0.00
beta	2.97	0.45	2.34	3.74	2406.4	2152.16	1.0	0.01	0.01
beta_backtransformed	15217.57	2309.33	12008.07	19184.76	2406.4	2152.16	1.0	47.91	47.51

Parameterization 3#

In this parameterization, we model the log-linear error distribution with a Gumbel distribution instead of modelling the survival function directly. For more information, see this blog post.
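
For reference, the textbook link between the two families is that if T follows a Weibull distribution with shape alpha and scale beta, then log T follows a minimum-type extreme value (Gumbel) distribution with location log(beta) and scale 1 / alpha. The sketch below checks this CDF identity numerically. The parameter values and function names are hypothetical, and the sign convention of the Gumbel distribution in a given library may differ from the minimum-type form assumed here.

```python
import math


def weibull_cdf(t, alpha, beta):
    """CDF of a Weibull with shape alpha and scale beta."""
    return 1.0 - math.exp(-((t / beta) ** alpha))


def gumbel_min_cdf(x, loc, scale):
    """CDF of the minimum-type Gumbel (extreme value) distribution."""
    return 1.0 - math.exp(-math.exp((x - loc) / scale))


# Hypothetical parameter values for illustration
alpha, beta = 1.3, 2.5
pairs = []
for t in (0.5, 1.0, 4.0):
    lhs = weibull_cdf(t, alpha, beta)
    rhs = gumbel_min_cdf(math.log(t), loc=math.log(beta), scale=1.0 / alpha)
    pairs.append((lhs, rhs))
    print(t, lhs, rhs)
```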

logtime = np.log(y)

# If censored then observed log event time else maximum log time
right_censored = [x if x > 0 else np.max(logtime) for x in logtime * censored]

with pm.Model() as model_3:
    s = pm.HalfNormal("s", tau=3.0)
    gamma = pm.Normal("gamma", mu=0, sigma=5)

    latent = pm.Gumbel.dist(mu=gamma, beta=s)
    y_obs = pm.Censored("Censored_likelihood", latent, upper=right_censored, observed=logtime)

with model_3:
    idata_param3 = pm.sample(tune=4000, draws=2000, nuts_sampler="numpyro", random_seed=rng)

az.plot_trace_dist(idata_param3);

../_images/c1a7723b1ed4e9edb94f7c0261d9cffa0f7a5209fb5ee4018c7831f54e1418e9.png

az.summary(idata_param3, round_to=2)

	mean	sd	eti89_lb	eti89_ub	ess_bulk	ess_tail	r_hat	mcse_mean	mcse_sd
gamma	9.48	0.21	9.14	9.83	3328.87	4221.94	1.0	0.0	0.0
s	3.54	0.16	3.29	3.81	3256.12	4251.45	1.0	0.0	0.0

Authors#

Originally collated by Junpeng Lao on Apr 21, 2018. See original code here.
Authored and ported to Jupyter notebook by George Ho on Jul 15, 2018.
Updated for compatibility with PyMC v5 by Chris Fonnesbeck on Jan 16, 2023.
Updated to replace pm.Potential with pm.Censored by Jonathan Dekermanjian on Nov 25, 2024.
Updated by Osvaldo Martin in April 2026.

%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Sat, 25 Apr 2026

Python implementation: CPython
Python version       : 3.14.4
IPython version      : 9.12.0

arviz      : 1.1.0
numpy      : 2.4.4
pymc       : 5.28.0+58.gf58491a3
pytensor   : 2.38.0+133.g80cc113b5
statsmodels: 0.14.6

Watermark: 2.6.0

License notice#

All the notebooks in this example gallery are provided under the MIT License, which allows modification and redistribution for any use provided the copyright and license notices are preserved.

Citing PyMC examples#

To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.

Important

Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.

Also remember to cite the relevant libraries used by your code.

Here is a citation template in bibtex:

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}

