# Posts in intermediate

## Kronecker Structured Covariances

- 20 October 2022
- Category: intermediate

PyMC contains implementations for models that have Kronecker structured covariances. This patterned structure enables Gaussian process models to work on much larger datasets. Kronecker structure can be exploited when

## Interrupted time series analysis

- 20 October 2022
- Category: intermediate

This notebook focuses on how to conduct a simple Bayesian interrupted time series analysis. This is useful in quasi-experimental settings where an intervention was applied to all treatment units.

## Forecasting with Structural AR Timeseries

- 20 October 2022
- Category: intermediate

Bayesian structural timeseries models are an interesting way to learn about the structure inherent in any observed timeseries data. They also give us the ability to project forward the implied predictive distribution, granting us another view on forecasting problems. We can treat the learned characteristics of the timeseries data observed to date as informative about the structure of the unrealised future state of the same measure.

## Gaussian Processes: Latent Variable Implementation

- 28 September 2022
- Category: reference, intermediate

The `gp.Latent` class is a direct implementation of a Gaussian process without approximation. Given a mean and covariance function, we can place a prior on the function \(f(x)\),

## Difference in differences

- 20 September 2022
- Category: intermediate

This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebook provides a high-level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both The Effect [Huntington-Klein, 2021] and Causal Inference: The Mixtape [Cunningham, 2021] have chapters devoted to difference in differences.

## Model Averaging

- 20 August 2022
- Category: intermediate

When confronted with more than one model we have several options. One of them is to perform model selection, using for example a given Information Criterion, as exemplified in the PyMC examples Model comparison and GLM: Model Selection. Model selection is appealing for its simplicity, but we are discarding information about the uncertainty in our models. This is somewhat similar to computing the full posterior and then keeping only a point estimate like the posterior mean; we may become overconfident about what we really know. You can also browse the blog/tag/model-comparison tag to find related posts.

## Counterfactual inference: calculating excess deaths due to COVID-19

- 20 July 2022
- Category: intermediate

Causal reasoning and counterfactual thinking are really interesting but complex topics! Nevertheless, we can make headway into understanding the ideas through relatively simple examples. This notebook focuses on the concepts and the practical implementation of Bayesian causal reasoning using PyMC.

## Rolling Regression

- 20 June 2022
- Category: intermediate

Pairs trading is a famous technique in algorithmic trading that plays two stocks against each other.

## Probabilistic Matrix Factorization for Making Personalized Recommendations

- 03 June 2022
- Category: intermediate

So you are browsing for something to watch on Netflix and just not liking the suggestions. You just know you can do better. All you need to do is collect some ratings data from yourself and friends and build a recommendation algorithm. This notebook will guide you in doing just that!

## Modeling spatial point patterns with a marked log-Gaussian Cox process

- 31 May 2022
- Category: intermediate

The log-Gaussian Cox process (LGCP) is a probabilistic model of point patterns typically observed in space or time. It has two main components. First, an underlying *intensity* field \(\lambda(s)\) of positive real values is modeled over the entire domain \(X\) using an exponentially-transformed Gaussian process which constrains \(\lambda\) to be positive. Then, this intensity field is used to parameterize a Poisson point process which represents a stochastic mechanism for placing points in space. Some phenomena amenable to this representation include the incidence of cancer cases across a county, or the spatiotemporal locations of crime events in a city. Both spatial and temporal dimensions can be handled equivalently within this framework, though this tutorial only addresses data in two spatial dimensions.
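The two components described above can be sketched as a small simulation. This is only an illustrative sketch in plain Python, not the notebook's PyMC model: it assumes a piecewise-constant intensity on a 5 by 5 grid of unit cells, an exponential covariance function, and a baseline log-intensity of `log(2)`, and it ignores marks entirely. All function names and parameter values here are hypothetical choices made for the illustration.

```python
import math
import random


def exp_cov(points, variance=1.0, lengthscale=2.0, jitter=1e-9):
    """Exponential covariance matrix between 2D points (an assumed kernel choice)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            K[i][j] = variance * math.exp(-dist / lengthscale)
        K[i][i] += jitter  # small diagonal term for numerical stability
    return K


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_poisson(rate):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, random.random()
    while p > threshold:
        k += 1
        p *= random.random()
    return k


random.seed(1)
side = 5
cell_area = 1.0
centers = [(x + 0.5, y + 0.5) for x in range(side) for y in range(side)]

# Step 1: draw a Gaussian process f at the cell centers and exponentiate it,
# which guarantees a positive intensity in every cell.
K = exp_cov(centers)
L = cholesky(K)
z = [random.gauss(0.0, 1.0) for _ in centers]
f = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(centers))]
baseline = math.log(2.0)
intensity = [math.exp(baseline + fi) for fi in f]

# Step 2: use the intensity to parameterize a Poisson point process.
# Each cell gets a Poisson count, and points are placed uniformly within the cell.
counts = [sample_poisson(lam * cell_area) for lam in intensity]
points = []
for (cx, cy), n in zip(centers, counts):
    for _ in range(n):
        points.append((cx + random.uniform(-0.5, 0.5), cy + random.uniform(-0.5, 0.5)))
```

Treating the intensity as constant within each cell is a discretization chosen for brevity; a finer grid would approximate the continuous field more closely at higher computational cost.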

## Variational Inference: Bayesian Neural Networks

- 30 May 2022
- Category: intermediate

**Probabilistic Programming**, **Deep Learning** and “**Big Data**” are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.

## GLM: Hierarchical Linear Regression

- 20 May 2022
- Category: intermediate

(c) 2016 by Danne Elbers, Thomas Wiecki

## Censored Data Models

- 20 May 2022
- Category: intermediate, how-to

This example notebook on Bayesian survival analysis touches on the point of censored data. *Censoring* is a form of missing-data problem, in which observations greater than a certain threshold are clipped down to that threshold, or observations less than a certain threshold are clipped up to that threshold, or both. These are called right, left and interval censoring, respectively. In this example notebook we consider interval censoring.
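The clipping behaviour described above can be illustrated with a short simulation. This is a minimal sketch in plain Python rather than the notebook's PyMC code: the `censor` helper, the thresholds of -1 and 1, and the simulated normal data are all assumptions made for illustration.

```python
import random


def censor(values, lower=None, upper=None):
    """Clip each value into [lower, upper]; a bound of None leaves that side uncensored."""
    censored = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # clipped up to the lower threshold
        if upper is not None and v > upper:
            v = upper  # clipped down to the upper threshold
        censored.append(v)
    return censored


random.seed(0)
raw = [random.gauss(0.0, 2.0) for _ in range(1000)]

right = censor(raw, upper=1.0)             # values above 1 recorded as 1
left = censor(raw, lower=-1.0)             # values below -1 recorded as -1
both = censor(raw, lower=-1.0, upper=1.0)  # clipped on both sides

# Censored observations pile up exactly at the thresholds.
n_at_upper = sum(v == 1.0 for v in both)
n_at_lower = sum(v == -1.0 for v in both)
```

The pile-up of observations at the thresholds is what a censored-data model has to account for: a value recorded at a bound only tells us the true value lies at or beyond it.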

## Gaussian Process for CO2 at Mauna Loa

- 20 April 2022
- Category: intermediate


## Air passengers - Prophet-like model

- 20 April 2022
- Category: intermediate

We’re going to look at the “air passengers” dataset, which tracks the monthly totals of US airline passengers from 1949 to 1960. We could fit this using the Prophet model [Taylor and Letham, 2018] (indeed, this dataset is one of the examples they provide in their documentation), but instead we’ll make our own Prophet-like model in PyMC3. This will make it a lot easier to inspect the model’s components and to do prior predictive checks (an integral component of the Bayesian workflow [Gelman *et al.*, 2020]).

## NBA Foul Analysis with Item Response Theory

- 17 April 2022
- Category: intermediate, tutorial

This tutorial shows an application of Bayesian Item Response Theory [Fox, 2010] to NBA basketball foul calls data using PyMC. Based on Austin Rochford’s blogpost NBA Foul Calls and Bayesian Item Response Theory.

## Model building and expansion for golf putting

- 02 April 2022
- Category: intermediate, how-to


## Mean and Covariance Functions

- 22 March 2022
- Category: intermediate, reference

A large set of mean and covariance functions is available in PyMC. It is relatively easy to define custom mean and covariance functions. Since PyMC uses Aesara, their gradients do not need to be defined by the user.

## A Hierarchical model for Rugby prediction

- 19 March 2022
- Category: intermediate, how-to


## A Primer on Bayesian Methods for Multilevel Modeling

- 27 February 2022
- Category: intermediate

Hierarchical or multilevel modeling is a generalization of regression modeling. *Multilevel models* are regression models in which the constituent model parameters are given **probability models**. This implies that model parameters are allowed to **vary by group**. Observational units are often naturally **clustered**. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.

## GLM: Model Selection

- 08 January 2022
- Category: intermediate

A fairly minimal reproducible example of Model Selection using WAIC and LOO as currently implemented in PyMC3.

## Bayesian Additive Regression Trees: Introduction

- 21 December 2021
- Category: intermediate, explanation

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as:

## GLM: Robust Regression using Custom Likelihood for Outlier Classification

- 17 November 2021
- Category: intermediate


## Hierarchical Binomial Model: Rat Tumor Example

- 11 November 2021
- Category: intermediate

This short tutorial demonstrates how to use PyMC to do inference for the rat tumour example found in chapter 5 of *Bayesian Data Analysis 3rd Edition* [Gelman *et al.*, 2013]. Readers should already be familiar with the PyMC API.

## Estimating parameters of a distribution from awkwardly binned data

- 23 October 2021
- Category: intermediate

Let us say that we are interested in inferring the properties of a population. This could be anything from the distribution of age, or income, or body mass index, or a whole range of different possible measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.

## Hierarchical Partial Pooling

- 07 October 2021
- Category: intermediate

Suppose you are tasked with estimating baseball batting skills for several players. One such performance metric is batting average. Since players play a different number of games and bat in different positions in the order, each player has a different number of at-bats. However, you want to estimate the skill of all players, including those with a relatively small number of batting opportunities.

## GLM: Mini-batch ADVI on hierarchical regression model

- 23 September 2021
- Category: intermediate

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batch, we should take care of that.

## Marginalized Gaussian Mixture Model

- 18 September 2021
- Category: intermediate

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.

## Diagnosing Biased Inference with Divergences

- 20 February 2018
- Category: intermediate

This notebook is a PyMC3 port of Michael Betancourt’s post on mc-stan. For a detailed explanation of the underlying mechanism, please check the original post, Diagnosing Biased Inference with Divergences, and Betancourt’s excellent paper, A Conceptual Introduction to Hamiltonian Monte Carlo.