Posts in intermediate

Rolling Regression

Pairs trading is a famous technique in algorithmic trading that plays two stocks against each other.

Read more ...


Probabilistic Matrix Factorization for Making Personalized Recommendations

So you are browsing for something to watch on Netflix and just not liking the suggestions. You just know you can do better. All you need to do is collect some ratings data from yourself and friends and build a recommendation algorithm. This notebook will guide you in doing just that!

Read more ...


Variational Inference: Bayesian Neural Networks

Probabilistic Programming (PP), Deep Learning and “Big Data” are among the biggest topics in machine learning. Within PP, a lot of innovation is focused on making things scale using Variational Inference. In this example, I will show how to use Variational Inference in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.

Read more ...


Censored Data Models

This example notebook on Bayesian survival analysis touches on the point of censored data. Censoring is a form of missing-data problem, in which observations greater than a certain threshold are clipped down to that threshold, or observations less than a certain threshold are clipped up to that threshold, or both. These are called right, left and interval censoring, respectively. In this example notebook we consider interval censoring.
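The clipping described above can be sketched in a few lines of plain Python. This is only an illustration of how censored observations arise, not the notebook's model; the helper name `censor`, the thresholds, and the sample values are all hypothetical.

```python
def censor(values, lower=None, upper=None):
    """Clip values into [lower, upper]; a bound of None leaves that side open."""
    censored = []
    for v in values:
        if lower is not None and v < lower:
            v = lower  # clipped up to the lower threshold
        if upper is not None and v > upper:
            v = upper  # clipped down to the upper threshold
        censored.append(v)
    return censored


# Hypothetical raw measurements.
raw = [0.5, 2.0, 3.5, 7.0, 9.5]

right_censored = censor(raw, upper=7.0)            # large values clipped down
left_censored = censor(raw, lower=2.0)             # small values clipped up
both_censored = censor(raw, lower=2.0, upper=7.0)  # clipped on both sides
```

After censoring, any value sitting exactly at a threshold only tells us that the true observation was at or beyond that threshold, which is why such data need a dedicated likelihood.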

Read more ...


Gaussian Process for CO2 at Mauna Loa

Read more ...


Air passengers - Prophet-like model

We’re going to look at the “air passengers” dataset, which tracks the monthly totals of US airline passengers from 1949 to 1960. We could fit this using the Prophet model [Taylor and Letham, 2018] (indeed, this dataset is one of the examples they provide in their documentation), but instead we’ll make our own Prophet-like model in PyMC3. This will make it a lot easier to inspect the model’s components and to do prior predictive checks (an integral component of the Bayesian workflow [Gelman et al., 2020]).

Read more ...


NBA Foul Analysis with Item Response Theory

This tutorial shows an application of Bayesian Item Response Theory [Fox, 2010] to NBA foul call data using PyMC. It is based on Austin Rochford’s blog post NBA Foul Calls and Bayesian Item Response Theory.

Read more ...


Model building and expansion for golf putting

Read more ...


A Hierarchical model for Rugby prediction

Read more ...


A Primer on Bayesian Methods for Multilevel Modeling

Hierarchical or multilevel modeling is a generalization of regression modeling. Multilevel models are regression models in which the constituent model parameters are given probability models. This implies that model parameters are allowed to vary by group. Observational units are often naturally clustered. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.

Read more ...


GLM: Model Selection

A fairly minimal reproducible example of model selection using WAIC and LOO, as currently implemented in PyMC3.

Read more ...


Bayesian Additive Regression Trees: Introduction

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as:
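For reference, the sum-of-trees form commonly used in the BART literature is sketched below. The notation is an assumption here, since the notebook's own formula is not shown on this page: \(m\) is the number of trees, \(T_j\) the structure of the \(j\)-th tree, \(M_j\) its leaf values, and \(\epsilon\) a noise term.

\[
Y = \sum_{j=1}^{m} g_j(X; T_j, M_j) + \epsilon
\]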

Read more ...


GLM: Robust Regression using Custom Likelihood for Outlier Classification

Read more ...


Hierarchical Binomial Model: Rat Tumor Example

This short tutorial demonstrates how to use PyMC to do inference for the rat tumour example found in chapter 5 of Bayesian Data Analysis 3rd Edition [Gelman et al., 2013]. Readers should already be familiar with the PyMC API.

Read more ...


Estimating parameters of a distribution from awkwardly binned data

Let us say that we are interested in inferring the properties of a population. This could be the distribution of age, income, body mass index, or any of a whole range of other measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.

Read more ...


Hierarchical Partial Pooling

Suppose you are tasked with estimating baseball batting skills for several players. One such performance metric is batting average. Since players play a different number of games and bat in different positions in the order, each player has a different number of at-bats. However, you want to estimate the skill of all players, including those with a relatively small number of batting opportunities.

Read more ...


GLM: Mini-batch ADVI on hierarchical regression model

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batches, we must therefore subsample the independent variables together with the observations they belong to.
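The point about independent variables can be illustrated with a minimal plain-Python sketch: each mini-batch must draw the same row indices from the predictors and from the observed outcomes, so every `y` stays paired with its own `x`. The helper `paired_minibatch` and the toy data are hypothetical and are not part of the PyMC API.

```python
import random


def paired_minibatch(x, y, batch_size, rng):
    """Draw one mini-batch using the SAME random indices for x and y."""
    idx = rng.sample(range(len(y)), batch_size)
    return [x[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy regression data: y is an exact linear function of x.
rng = random.Random(0)
x = [float(i) for i in range(100)]
y = [2.0 * xi + 1.0 for xi in x]

x_batch, y_batch = paired_minibatch(x, y, 10, rng)

# Because the indices are shared, the x-y relationship survives subsampling.
still_paired = all(yb == 2.0 * xb + 1.0 for xb, yb in zip(x_batch, y_batch))
```

Sampling `x` and `y` with independent indices would break this pairing and the regression would be fit to mismatched rows.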

Read more ...


Marginalized Gaussian Mixture Model

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.

Read more ...