Posted in 2023
Out-Of-Sample Predictions
- 04 December 2023
We want to fit a logistic regression model where there is a multiplicative interaction between two numerical features.
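As a rough illustration of what a multiplicative interaction means in a logistic regression, the sketch below evaluates the success probability for two numerical features. The function name and the coefficient names (b0, b1, b2, b12) are assumptions chosen for this example and are not taken from the notebook; nothing is fitted here, the coefficients are simply plugged in.

```python
import math


def interaction_probability(x1, x2, b0, b1, b2, b12):
    """Return P(y = 1 | x1, x2) for a logistic model with an x1 * x2 term.

    All coefficient names are hypothetical, chosen for illustration only.
    """
    # Linear predictor: main effects plus the multiplicative interaction.
    eta = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    # The inverse logit maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-eta))


# Same inputs, with and without the interaction coefficient.
p_with = interaction_probability(1.0, 1.0, b0=0.0, b1=1.0, b2=-1.0, b12=2.0)
p_without = interaction_probability(1.0, 1.0, b0=0.0, b1=1.0, b2=-1.0, b12=0.0)
```

With the interaction coefficient set to zero the two main effects cancel at this point, so any difference between the two probabilities comes from the product term alone.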
Bayesian copula estimation: Describing correlated joint distributions
- 04 December 2023
When we deal with multiple variables (e.g. \(a\) and \(b\)) we often want to describe the joint distribution \(P(a, b)\) parametrically. If we are lucky, then this joint distribution might be ‘simple’ in some way. For example, it could be that \(a\) and \(b\) are statistically independent, in which case we can break down the joint distribution into \(P(a, b) = P(a) P(b)\) and so we just need to find appropriate parametric descriptions for \(P(a)\) and \(P(b)\). Even if this is not appropriate, it may be that \(P(a, b)\) can be described well by a simple multivariate distribution, such as a multivariate normal distribution.
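The independent case can be made concrete with a small discrete example. This is only a sketch: the two marginal tables below are made-up numbers, and the variable names are assumptions for illustration. It builds the joint distribution as the product of the marginals and then recovers one marginal by summing over the other variable.

```python
from itertools import product

# Hypothetical marginal distributions for two discrete variables a and b.
p_a = {0: 0.3, 1: 0.7}
p_b = {"x": 0.4, "y": 0.6}

# Under independence the joint factorises: P(a, b) = P(a) * P(b).
joint = {(a, b): p_a[a] * p_b[b] for a, b in product(p_a, p_b)}

# Summing the joint over b recovers the marginal of a.
marginal_a = {a: sum(joint[(a, b)] for b in p_b) for a in p_a}
```

When the variables are not independent, no such product of marginals reproduces the joint, which is the situation where a multivariate distribution or a copula becomes relevant.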
Frailty and Survival Regression Models
- 04 November 2023
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook.
GLM: Negative Binomial Regression
- 04 September 2023
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook.
The Besag-York-Mollie Model for Spatial Data
- 18 August 2023
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook.
Faster Sampling with JAX and Numba
- 11 July 2023
PyMC can compile its models to various execution backends through PyTensor, including:
Interventional distributions and graph mutation with the do-operator
- 04 July 2023
PyMC is a pivotal component of the open source Bayesian statistics ecosystem. It helps solve real problems across a wide range of industries and academic research areas every day. And it has gained this level of utility by being accessible, powerful, and practically useful at solving Bayesian statistical inference problems.
Gaussian Processes: Latent Variable Implementation
- 06 June 2023
The gp.Latent class is a direct implementation of a Gaussian process without approximation. Given a mean and covariance function, we can place a prior on the function \(f(x)\),
Marginal Likelihood Implementation
- 04 June 2023
The gp.Marginal class implements the more common case of GP regression: the observed data are the sum of a GP and Gaussian noise. gp.Marginal has a marginal_likelihood method, a conditional method, and a predict method. Given a mean and covariance function, the function \(f(x)\) is modeled as,
Discrete Choice and Random Utility Models
- 04 June 2023
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook.
Regression Models with Ordered Categorical Outcomes
- 04 April 2023
Like many areas of statistics, the language of survey data comes with an overloaded vocabulary. When discussing survey design you will often hear about the contrast between design-based and model-based approaches to (i) sampling strategies and (ii) statistical inference on the associated data. We won’t wade into the details of different sampling strategies such as simple random sampling, cluster random sampling or stratified random sampling using population weighting schemes. The literature on each of these is vast, but in this notebook we’ll talk about when and why it’s useful to apply model-driven statistical inference to Likert-scaled survey response data and other kinds of ordered categorical data.
Longitudinal Models of Change
- 04 April 2023
The study of change involves simultaneously analysing the individual trajectories of change and abstracting over the set of individuals studied to extract broader insight about the nature of the change in question. As such it’s easy to lose sight of the forest for the focus on the trees. In this example we’ll demonstrate some of the subtleties of using hierarchical Bayesian models to study the change within a population of individuals, moving from the within-individual view to the between-individual perspective.
Using ModelBuilder class for deploying PyMC models
- 22 February 2023
Many users face difficulty in deploying their PyMC models to production because deploying, saving, and loading a user-created model is not well standardized. One reason is that PyMC, unlike scikit-learn or TensorFlow, has no direct way to save or load a model. The new ModelBuilder class aims to improve this workflow by providing a scikit-learn inspired API to wrap your PyMC models.
Pathfinder Variational Inference
- 05 February 2023
Pathfinder [Zhang et al., 2021] is a variational inference algorithm that produces samples from the posterior of a Bayesian model. It compares favorably to the widely used ADVI algorithm. On large problems, it should scale better than most MCMC algorithms, including dynamic HMC (i.e. NUTS), at the cost of a more biased estimate of the posterior. For details on the algorithm, see the arXiv preprint.
Bayesian Missing Data Imputation
- 04 February 2023
Multivariate Gaussian Random Walk
- 02 February 2023
This notebook shows how to fit a correlated time series using multivariate Gaussian random walks (GRWs). In particular, we perform a Bayesian regression of the time series data against a model dependent on GRWs.
Rolling Regression
- 28 January 2023
Pairs trading is a famous technique in algorithmic trading that plays two stocks against each other.
Hierarchical Partial Pooling
- 28 January 2023
Suppose you are tasked with estimating baseball batting skills for several players. One such performance metric is batting average. Since players play a different number of games and bat in different positions in the order, each player has a different number of at-bats. However, you want to estimate the skill of all players, including those with a relatively small number of batting opportunities.
Quantile Regression with BART
- 25 January 2023
Usually when doing regression we model the conditional mean of some distribution. Common cases are a Normal distribution for continuous unbounded responses, a Poisson distribution for count data, etc.
DEMetropolis(Z) Sampler Tuning
- 18 January 2023
For continuous variables, the default PyMC sampler (NUTS) requires that gradients are computed, which PyMC does through autodifferentiation. However, in some cases, a PyMC model may not be supplied with gradients (for example, by evaluating a numerical model outside of PyMC) and an alternative sampler is necessary. The DEMetropolisZ sampler is an efficient choice for gradient-free inference. The implementation of DEMetropolisZ in PyMC is based on ter Braak and Vrugt [2008] but with a modified tuning scheme. This notebook compares various tuning parameter settings for the sampler, including the drop_tune_fraction parameter which was introduced in PyMC.
DEMetropolis and DEMetropolis(Z) Algorithm Comparisons
- 18 January 2023
For continuous variables, the default PyMC sampler (NUTS) requires that gradients are computed, which PyMC does through autodifferentiation. However, in some cases, a PyMC model may not be supplied with gradients (for example, by evaluating a numerical model outside of PyMC) and an alternative sampler is necessary. Differential evolution (DE) Metropolis samplers are an efficient choice for gradient-free inference. This notebook compares the DEMetropolis and the DEMetropolisZ samplers in PyMC to help determine which is a better option for a given problem.
Reparameterizing the Weibull Accelerated Failure Time Model
- 17 January 2023
The previous example notebook on Bayesian parametric survival analysis introduced two different accelerated failure time (AFT) models: Weibull and log-linear. In this notebook, we present three different parameterizations of the Weibull AFT model.
Bayesian Survival Analysis
- 17 January 2023
Survival analysis studies the distribution of the time to an event. Its applications span many fields across medicine, biology, engineering, and social science. This tutorial shows how to fit and analyze a Bayesian survival model in Python using PyMC.
ODE Lotka-Volterra With Bayesian Inference in Multiple Ways
- 16 January 2023
The purpose of this notebook is to demonstrate how to perform Bayesian inference on a system of ordinary differential equations (ODEs), both with and without gradients. The accuracy and efficiency of different samplers are compared.
Introduction to Variational Inference with PyMC
- 13 January 2023
The most common strategy for computing posterior quantities of Bayesian models is via sampling, particularly Markov chain Monte Carlo (MCMC) algorithms. While sampling algorithms and associated computing have continually improved in performance and efficiency, MCMC methods still scale poorly with data size, and become prohibitive for more than a few thousand observations. A more scalable alternative to sampling is variational inference (VI), which re-frames the problem of computing the posterior distribution as an optimization problem.
Empirical Approximation overview
- 13 January 2023
For most models we use sampling MCMC algorithms like Metropolis or NUTS. In PyMC we are used to storing traces of MCMC samples and then analysing them. There is a similar concept for the variational inference submodule in PyMC: Empirical. This type of approximation stores particles for the SVGD sampler. There is no difference between independent SVGD particles and MCMC samples. Empirical acts as a bridge between MCMC sampling output and full-fledged VI utils like apply_replacements or sample_node. For the interface description, see variational_api_quickstart. Here we will just focus on Empirical and give an overview of specific things for the Empirical approximation.
Hierarchical Binomial Model: Rat Tumor Example
- 10 January 2023
This short tutorial demonstrates how to use PyMC to do inference for the rat tumour example found in chapter 5 of Bayesian Data Analysis 3rd Edition [Gelman et al., 2013]. Readers should already be familiar with the PyMC API.
GLM: Robust Linear Regression
- 10 January 2023
Bayes Factors and Marginal Likelihood
- 10 January 2023
The “Bayesian way” to compare models is to compute the marginal likelihood of each model \(p(y \mid M_k)\), i.e. the probability of the observed data \(y\) given the \(M_k\) model. This quantity, the marginal likelihood, is just the normalizing constant of Bayes’ theorem. We can see this if we write Bayes’ theorem and make explicit the fact that all inferences are model-dependent.
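For reference, a standard way to write this is given below. The symbol \(\theta\) for the parameters of model \(M_k\) is an assumption introduced here for illustration; the notebook may use different notation.

\[
p(\theta \mid y, M_k) = \frac{p(y \mid \theta, M_k)\, p(\theta \mid M_k)}{p(y \mid M_k)},
\qquad
p(y \mid M_k) = \int p(y \mid \theta, M_k)\, p(\theta \mid M_k)\, d\theta
\]

The denominator does not depend on \(\theta\): it is the likelihood averaged over the prior, which is why it acts as the normalizing constant of the posterior.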
Analysis of An AR(1) Model in PyMC
- 07 January 2023
Consider the following AR(2) process, initialized in the infinite past:
Reliability Statistics and Predictive Calibration
- 04 January 2023
Modeling Heteroscedasticity with BART
- 04 January 2023
In this notebook we show how to use BART to model heteroscedasticity as described in Section 4.1 of pymc-bart’s paper [Quiroga et al., 2022]. We use the marketing data set provided by the R package datarium [Kassambara, 2019]. The idea is to model a marketing channel contribution to sales as a function of budget.