Posted in 2023

Out-Of-Sample Predictions

We want to fit a logistic regression model where there is a multiplicative interaction between two numerical features.
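
A minimal sketch of such a model in PyMC (the feature names x1 and x2 and the simulated data are illustrative assumptions, not taken from the notebook):

```python
import numpy as np
import pymc as pm

# Hypothetical data: two numerical features and a binary outcome
rng = np.random.default_rng(42)
x1, x2 = rng.normal(size=(2, 200))
p_true = 1 / (1 + np.exp(-(x1 + x2 + 0.5 * x1 * x2)))
y = rng.binomial(1, p_true)

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, sigma=1, shape=4)
    # Linear predictor including the multiplicative interaction term x1 * x2
    eta = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x1 * x2
    pm.Bernoulli("y", logit_p=eta, observed=y)
    idata = pm.sample()
```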

Read more ...


Bayesian copula estimation: Describing correlated joint distributions

When we deal with multiple variables (e.g. \(a\) and \(b\)) we often want to describe the joint distribution \(P(a, b)\) parametrically. If we are lucky, this joint distribution might be ‘simple’ in some way. For example, it could be that \(a\) and \(b\) are statistically independent, in which case we can break down the joint distribution into \(P(a, b) = P(a) P(b)\) and so we just need to find appropriate parametric descriptions for \(P(a)\) and \(P(b)\). Even if independence does not hold, it may be that \(P(a, b)\) can be described well by a simple multivariate distribution, such as a multivariate normal distribution.
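
As a concrete illustration of the copula idea (not code from the notebook), a correlated joint distribution with arbitrary marginals can be built by pushing correlated normals through the normal CDF and then through the inverse CDFs of the desired marginals; the specific Gamma and Beta marginals below are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw correlated standard normals (a Gaussian copula with correlation 0.7)
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
z = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=10_000)
u = stats.norm.cdf(z)  # correlated uniforms on [0, 1]

# Push through inverse CDFs to obtain arbitrary marginals that stay correlated
a = stats.gamma(a=2.0).ppf(u[:, 0])
b = stats.beta(a=2.0, b=5.0).ppf(u[:, 1])
```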

Read more ...


Frailty and Survival Regression Models

This notebook uses libraries that are not PyMC dependencies, so they need to be installed separately in order to run it.

Read more ...


GLM: Negative Binomial Regression

This notebook uses libraries that are not PyMC dependencies, so they need to be installed separately in order to run it.

Read more ...


The Besag-York-Mollie Model for Spatial Data

This notebook uses libraries that are not PyMC dependencies, so they need to be installed separately in order to run it.

Read more ...


Faster Sampling with JAX and Numba

PyMC can compile its models to various execution backends through PyTensor, including C, Numba, and JAX.
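
With a JAX backend installed, sampling can be dispatched to an external NUTS implementation. A minimal sketch, assuming PyMC >= 5 and the numpyro package (the toy model is an illustration):

```python
import pymc as pm

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.1, -0.3, 0.2])
    # Dispatch sampling to NumPyro's JAX-based NUTS implementation
    idata = pm.sample(nuts_sampler="numpyro")
```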

Read more ...


Interventional distributions and graph mutation with the do-operator

PyMC is a pivotal component of the open source Bayesian statistics ecosystem. It helps solve real problems across a wide range of industries and academic research areas every day. It has gained this level of utility by being accessible, powerful, and practical for solving Bayesian statistical inference problems.

Read more ...


Gaussian Processes: Latent Variable Implementation

The gp.Latent class is a direct implementation of a Gaussian process without approximation. Given a mean and covariance function, we can place a prior on the function \(f(x)\): \(f(x) \sim \mathcal{GP}(m(x),\, k(x, x'))\).
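
A minimal sketch of placing a GP prior with gp.Latent (the covariance hyperparameters and the inputs X are illustrative assumptions):

```python
import numpy as np
import pymc as pm

X = np.linspace(0, 10, 50)[:, None]  # GP inputs must be a column matrix

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)   # lengthscale
    eta = pm.HalfNormal("eta", sigma=1)      # amplitude
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)  # latent function values evaluated at X
```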

Read more ...


Marginal Likelihood Implementation

The gp.Marginal class implements the more common case of GP regression: the observed data are the sum of a GP and Gaussian noise. gp.Marginal has a marginal_likelihood method, a conditional method, and a predict method. Given a mean and covariance function, the function \(f(x)\) is modeled as \(f(x) \sim \mathcal{GP}(m(x),\, k(x, x'))\).
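
A minimal sketch of GP regression with gp.Marginal (the data X, y and hyperparameters are assumptions; recent PyMC versions name the noise argument sigma, older ones noise):

```python
import numpy as np
import pymc as pm

X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=50)

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)
    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=1)
    # Analytically marginalizes the GP against the Gaussian observation noise
    y_ = gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)
    idata = pm.sample()
```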

Read more ...


Discrete Choice and Random Utility Models

This notebook uses libraries that are not PyMC dependencies, so they need to be installed separately in order to run it.

Read more ...


Regression Models with Ordered Categorical Outcomes

Like many areas of statistics, the language of survey data comes with an overloaded vocabulary. When discussing survey design you will often hear about the contrast between design-based and model-based approaches to (i) sampling strategies and (ii) statistical inference on the associated data. We won’t wade into the details of different sampling strategies, such as simple random sampling, cluster random sampling, or stratified random sampling using population weighting schemes. The literature on each of these is vast, but in this notebook we’ll talk about when and why it’s useful to apply model-driven statistical inference to Likert-scaled survey response data and other kinds of ordered categorical data.
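
Models for ordered categorical outcomes typically combine a linear predictor with latent cutpoints under an ordered logistic likelihood. A minimal sketch (the data, predictor, and priors are illustrative assumptions):

```python
import numpy as np
import pymc as pm

# Hypothetical Likert responses on a 1-5 scale, coded as 0-4
y = np.array([0, 1, 1, 2, 3, 4, 2, 3, 3, 4])
x = np.linspace(-1, 1, len(y))  # a single illustrative predictor

with pm.Model() as model:
    beta = pm.Normal("beta", 0, 1)
    cutpoints = pm.Normal(
        "cutpoints",
        mu=np.linspace(-2, 2, 4),
        sigma=1,
        # note: this transform is named univariate_ordered in some PyMC 5.x releases
        transform=pm.distributions.transforms.ordered,
    )
    pm.OrderedLogistic("y", eta=beta * x, cutpoints=cutpoints, observed=y)
```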

Read more ...


Longitudinal Models of Change

The study of change involves simultaneously analysing the individual trajectories of change and abstracting over the set of individuals studied to extract broader insight about the nature of the change in question. As such, it’s easy to lose sight of the forest for the trees. In this example we’ll demonstrate some of the subtleties of using hierarchical Bayesian models to study change within a population of individuals, moving from the within-individual view to the between-individuals perspective.

Read more ...


Using ModelBuilder class for deploying PyMC models

Many users face difficulty in deploying their PyMC models to production because deploying, saving, and loading a user-created model is not well standardized. One reason is that there is no direct way to save or load a model in PyMC, as there is in scikit-learn or TensorFlow. The new ModelBuilder class aims to improve this workflow by providing a scikit-learn-inspired API to wrap your PyMC models.
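
The intended workflow looks roughly like the sketch below, where LinearModel stands for a hypothetical user-defined ModelBuilder subclass; the fit/save/load/predict method names follow the ModelBuilder API as I understand it, so treat the details as assumptions:

```python
# ModelBuilder lives in the pymc-experimental package
from pymc_experimental.model_builder import ModelBuilder

# LinearModel is a hypothetical ModelBuilder subclass defined elsewhere;
# X_train, y_train, X_test are placeholder data
model = LinearModel()
idata = model.fit(X_train, y_train)   # build the PyMC model and sample

model.save("linear_model.nc")          # persist the model and inference data
loaded = LinearModel.load("linear_model.nc")
predictions = loaded.predict(X_test)
```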

Read more ...


Pathfinder Variational Inference

Pathfinder [Zhang et al., 2021] is a variational inference algorithm that produces samples from the posterior of a Bayesian model. It compares favorably to the widely used ADVI algorithm. On large problems, it should scale better than most MCMC algorithms, including dynamic HMC (i.e. NUTS), at the cost of a more biased estimate of the posterior. For details on the algorithm, see the arXiv preprint.
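
At the time of writing, Pathfinder is exposed through pymc-experimental rather than PyMC itself. A minimal sketch, assuming the pymc_experimental fit interface (the toy model is an illustration):

```python
import pymc as pm
import pymc_experimental as pmx

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.2, -0.1, 0.4])
    # Pathfinder variational inference as exposed by pymc-experimental
    idata = pmx.fit(method="pathfinder")
```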

Read more ...


Bayesian Missing Data Imputation


Read more ...


Multivariate Gaussian Random Walk

This notebook shows how to fit a correlated time series using multivariate Gaussian random walks (GRWs). In particular, we perform a Bayesian regression of the time series data against a model dependent on GRWs.
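
A minimal sketch of a multivariate GRW prior with a learned innovation covariance (the dimensions and priors are illustrative assumptions):

```python
import numpy as np
import pymc as pm

T, D = 100, 3  # time steps, number of correlated series

with pm.Model() as model:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=D, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    # Random walk whose innovations are correlated across the D series
    grw = pm.MvGaussianRandomWalk("grw", mu=np.zeros(D), chol=chol, shape=(T, D))
```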

Read more ...


Rolling Regression

Pairs trading is a famous technique in algorithmic trading that plays two stocks against each other.

Read more ...


Hierarchical Partial Pooling

Suppose you are tasked with estimating baseball batting skills for several players. One such performance metric is batting average. Since players play a different number of games and bat in different positions in the order, each player has a different number of at-bats. However, you want to estimate the skill of all players, including those with a relatively small number of batting opportunities.
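
The classic partial-pooling construction places a shared Beta prior over per-player abilities. A minimal sketch (the hyperprior choices and data are illustrative assumptions):

```python
import numpy as np
import pymc as pm

at_bats = np.array([45, 60, 25, 100])  # hypothetical at-bats per player
hits = np.array([12, 20, 5, 30])       # hypothetical hits per player

with pm.Model() as model:
    phi = pm.Uniform("phi", 0.0, 1.0)             # population-level batting ability
    kappa = pm.Pareto("kappa", alpha=1.5, m=1.0)  # pooling strength
    theta = pm.Beta(
        "theta", alpha=phi * kappa, beta=(1.0 - phi) * kappa, shape=len(hits)
    )  # per-player skill, partially pooled toward phi
    pm.Binomial("y", n=at_bats, p=theta, observed=hits)
```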

Read more ...


Quantile Regression with BART

Usually when doing regression we model the conditional mean of some distribution. Common cases are a Normal distribution for continuous unbounded responses, a Poisson distribution for count data, etc.

Read more ...


DEMetropolis(Z) Sampler Tuning

For continuous variables, the default PyMC sampler (NUTS) requires gradients, which PyMC computes through automatic differentiation. However, in some cases a PyMC model may not be supplied with gradients (for example, when evaluating a numerical model outside of PyMC) and an alternative sampler is necessary. The DEMetropolisZ sampler is an efficient choice for gradient-free inference. The implementation of DEMetropolisZ in PyMC is based on ter Braak and Vrugt [2008], but with a modified tuning scheme. This notebook compares various tuning parameter settings for the sampler, including the drop_tune_fraction parameter introduced in the PyMC implementation.
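
Selecting the sampler is a matter of passing a step method to pm.sample. A minimal sketch (the toy model is an illustration):

```python
import pymc as pm

with pm.Model() as model:  # stand-in for a model without gradients
    x = pm.Normal("x", 0, 1)
    # Gradient-free differential evolution Metropolis with a Z-history
    step = pm.DEMetropolisZ()
    idata = pm.sample(draws=5000, tune=5000, step=step)
```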

Read more ...


DEMetropolis and DEMetropolis(Z) Algorithm Comparisons

For continuous variables, the default PyMC sampler (NUTS) requires gradients, which PyMC computes through automatic differentiation. However, in some cases a PyMC model may not be supplied with gradients (for example, when evaluating a numerical model outside of PyMC) and an alternative sampler is necessary. Differential evolution (DE) Metropolis samplers are an efficient choice for gradient-free inference. This notebook compares the DEMetropolis and the DEMetropolisZ samplers in PyMC to help determine which is the better option for a given problem.

Read more ...


Reparameterizing the Weibull Accelerated Failure Time Model

The previous example notebook on Bayesian parametric survival analysis introduced two different accelerated failure time (AFT) models: Weibull and log-linear. In this notebook, we present three different parameterizations of the Weibull AFT model.
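
For reference, the standard Weibull survival function, written with a shape parameter \(\alpha\) and scale parameter \(\beta\) (generic notation, not necessarily the notebook’s):

\[ S(t) = P(T > t) = \exp\left( -\left( \frac{t}{\beta} \right)^{\alpha} \right) \]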

Read more ...


Bayesian Survival Analysis

Survival analysis studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC.

Read more ...


ODE Lotka-Volterra With Bayesian Inference in Multiple Ways

The purpose of this notebook is to demonstrate how to perform Bayesian inference on a system of ordinary differential equations (ODEs), both with and without gradients. The accuracy and efficiency of different samplers are compared.
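
The Lotka-Volterra system couples prey \(x\) and predator \(y\) populations through the familiar pair of ODEs:

\[ \frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = -\gamma y + \delta x y \]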

Read more ...


Introduction to Variational Inference with PyMC

The most common strategy for computing posterior quantities of Bayesian models is via sampling, particularly Markov chain Monte Carlo (MCMC) algorithms. While sampling algorithms and associated computing have continually improved in performance and efficiency, MCMC methods still scale poorly with data size and become prohibitively expensive for more than a few thousand observations. A more scalable alternative to sampling is variational inference (VI), which reframes the problem of computing the posterior distribution as an optimization problem.
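
In PyMC, VI is available through pm.fit. A minimal ADVI sketch (the toy model is an illustration):

```python
import pymc as pm

with pm.Model() as model:
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu=mu, sigma=1, observed=[0.5, -0.2, 0.3])
    # Optimize the ELBO with automatic differentiation variational inference
    approx = pm.fit(n=30_000, method="advi")

idata = approx.sample(1_000)  # draw samples from the fitted approximation
```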

Read more ...


Empirical Approximation overview

For most models we use MCMC sampling algorithms like Metropolis or NUTS. In PyMC we are used to storing traces of MCMC samples and then doing analysis with them. There is a similar concept for the variational inference submodule in PyMC: Empirical. This type of approximation stores particles for the SVGD sampler. There is no difference between independent SVGD particles and MCMC samples. Empirical acts as a bridge between MCMC sampling output and full-fledged VI utilities like apply_replacements or sample_node. For the interface description, see variational_api_quickstart. Here we will just focus on Empirical and give an overview of things specific to the Empirical approximation.

Read more ...


Hierarchical Binomial Model: Rat Tumor Example

This short tutorial demonstrates how to use PyMC to do inference for the rat tumor example found in chapter 5 of Bayesian Data Analysis 3rd Edition [Gelman et al., 2013]. Readers should already be familiar with the PyMC API.

Read more ...


GLM: Robust Linear Regression


Read more ...


Bayes Factors and Marginal Likelihood

The “Bayesian way” to compare models is to compute the marginal likelihood of each model \(p(y \mid M_k)\), i.e. the probability of the observed data \(y\) given model \(M_k\). This quantity, the marginal likelihood, is just the normalizing constant of Bayes’ theorem. We can see this if we write Bayes’ theorem and make explicit the fact that all inferences are model-dependent.
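
Written out with the model dependence explicit, Bayes’ theorem reads:

\[ p(\theta \mid y, M_k) = \frac{p(y \mid \theta, M_k)\, p(\theta \mid M_k)}{p(y \mid M_k)} \]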

Read more ...


Analysis of An AR(1) Model in PyMC

Consider the following AR(2) process, initialized in the infinite past:

\[ y_t = \rho_0 + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \epsilon_t, \]
where \(\epsilon_t \overset{iid}{\sim} {\cal N}(0,1)\). Suppose you’d like to learn about \(\rho\) from a sample of observations \(Y^T = \{ y_0, y_1,\ldots, y_T \}\).
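
PyMC’s AR distribution can express this process directly. A minimal sketch (the priors and placeholder data are assumptions; with constant=True the first entry of rho plays the role of the intercept \(\rho_0\)):

```python
import numpy as np
import pymc as pm

y_data = np.random.default_rng(7).normal(size=200)  # placeholder series

with pm.Model() as model:
    rho = pm.Normal("rho", 0, 1, shape=3)  # rho_0 (constant), rho_1, rho_2
    pm.AR(
        "y",
        rho=rho,
        sigma=1.0,
        constant=True,  # treat rho[0] as the intercept
        init_dist=pm.Normal.dist(0.0, 10.0),
        observed=y_data,
    )
    idata = pm.sample()
```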

Read more ...


Reliability Statistics and Predictive Calibration


Read more ...


Modeling Heteroscedasticity with BART

In this notebook we show how to use BART to model heteroscedasticity, as described in Section 4.1 of the pymc-bart paper [Quiroga et al., 2022]. We use the marketing data set provided by the R package datarium [Kassambara, 2019]. The idea is to model the contribution of a marketing channel to sales as a function of budget.

Read more ...