Posted in 2021

Bayesian Additive Regression Trees: Introduction

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as \(Y = \sum_{j=1}^{m} g_j(X; T_j, M_j) + \epsilon\), where each \(g_j\) is a regression tree with structure \(T_j\) and leaf values \(M_j\), and \(\epsilon\) is Gaussian noise.
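
A rough sketch of fitting such a model in PyMC, assuming the separate pymc-bart add-on package and entirely made-up data (the notebook itself develops a fuller example):

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb  # assumed: the pymc-bart add-on package

# made-up data: a noisy nonlinear function of a single covariate
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200)[:, None]
Y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)

with pm.Model():
    sigma = pm.HalfNormal("sigma", 1)
    mu = pmb.BART("mu", X, Y, m=50)  # a sum of m = 50 regression trees
    pm.Normal("y", mu=mu, sigma=sigma, observed=Y)
    idata = pm.sample()
```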

Read more ...


Using a “black box” likelihood function (numpy)

This notebook is part of a set of two twin notebooks that perform the exact same task; this one uses numpy, whereas the other one uses Cython.
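
The core trick, sketched below for an assumed straight-line model (the data and helper names are illustrative), is to wrap the plain-numpy log-likelihood in a pytensor Op so that PyMC can call it as a black box:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph.op import Op

def my_loglike(theta, x, data, sigma):
    # plain-numpy Gaussian log-likelihood of a straight-line model
    m, c = theta
    return -0.5 * np.sum(((data - (m * x + c)) / sigma) ** 2)

class LogLike(Op):
    itypes = [pt.dvector]  # expects a vector of parameters
    otypes = [pt.dscalar]  # returns a scalar log-likelihood

    def __init__(self, x, data, sigma):
        self.x, self.data, self.sigma = x, data, sigma

    def perform(self, node, inputs, outputs):
        (theta,) = inputs
        outputs[0][0] = np.array(my_loglike(theta, self.x, self.data, self.sigma))

x = np.linspace(0, 9, 10)
data = 0.5 * x + 3 + np.random.default_rng(1).normal(0, 1, 10)
loglike_op = LogLike(x, data, 1.0)

with pm.Model():
    m = pm.Uniform("m", -10, 10)
    c = pm.Uniform("c", -10, 10)
    theta = pt.as_tensor_variable([m, c])
    pm.Potential("likelihood", loglike_op(theta))
    # the Op defines no gradient, so use a gradient-free sampler
    idata = pm.sample(step=pm.Slice())
```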

Read more ...


Using Data Containers

After building the statistical model of your dreams, you’re going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data are used as an exogenous input, called X in linear regression models, where mu = X @ beta. Other data are “observed” examples of the endogenous outputs of your model, called y in regression models, and are used as input to the likelihood function implied by your model. These data, whether exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy ndarrays, pandas Series and DataFrames, and even pytensor TensorVariables.
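
A minimal sketch of both kinds of data entering a linear regression, assuming a recent PyMC in which pm.Data creates a mutable container that pm.set_data can swap out later:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

with pm.Model() as model:
    X = pm.Data("X", X_train)  # exogenous input
    y = pm.Data("y", y_train)  # endogenous, observed output
    beta = pm.Normal("beta", 0, 1, shape=3)
    sigma = pm.HalfNormal("sigma", 1)
    mu = X @ beta
    # shape=mu.shape lets the likelihood resize when the data are swapped
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y, shape=mu.shape)
    idata = pm.sample()

# out-of-sample prediction: swap the container contents and resample
with model:
    pm.set_data({"X": rng.normal(size=(10, 3)), "y": np.zeros(10)})
    preds = pm.sample_posterior_predictive(idata, predictions=True)
```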

Read more ...


GLM: Robust Regression using Custom Likelihood for Outlier Classification

Robust regression with outlier detection in PyMC, using the Hogg 2010 Signal vs Noise method.

Read more ...


Estimating parameters of a distribution from awkwardly binned data

Let us say that we are interested in inferring the properties of a population. This could be the distribution of age, income, body mass index, or any of a whole range of possible measures. In completing this task, we often come across situations where we have multiple datasets, each of which can inform our beliefs about the overall population.
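
One way to approach such a problem, sketched here with made-up bin edges and counts, is to give each binned dataset a multinomial likelihood whose bin probabilities are differences of the latent distribution’s CDF at the bin edges:

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# made-up example: measurements reported only as counts in coarse bins
edges = np.array([20.0, 25.0, 30.0, 35.0])  # interior bin edges
counts = np.array([10, 35, 120, 70, 15])    # five bins, open-ended tails

with pm.Model():
    mu = pm.Normal("mu", 30, 10)
    sigma = pm.HalfNormal("sigma", 10)
    # probability mass in each bin = differences of the normal CDF at the edges
    cdf = pm.math.exp(pm.logcdf(pm.Normal.dist(mu, sigma), edges))
    p = pt.concatenate([cdf[:1], pt.extra_ops.diff(cdf), 1 - cdf[-1:]])
    pm.Multinomial("counts", n=counts.sum(), p=p, observed=counts)
    idata = pm.sample()
```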

Read more ...


Sequential Monte Carlo

Sampling from distributions with multiple peaks using standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in one of the modes. A Sequential Monte Carlo (SMC) sampler is a way to ameliorate this problem.
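
A minimal sketch of SMC on a deliberately bimodal target (the two-Gaussian density below is illustrative):

```python
import pymc as pm
import pytensor.tensor as pt

def two_gaussians_logp(x):
    # log-density of an equal-weight mixture of N(-1, 0.1) and N(+1, 0.1)
    logp1 = -0.5 * ((x + 1.0) / 0.1) ** 2
    logp2 = -0.5 * ((x - 1.0) / 0.1) ** 2
    return pm.math.logsumexp(pt.stack([logp1, logp2]))

with pm.Model():
    x = pm.Uniform("x", lower=-2, upper=2)
    pm.Potential("loglike", two_gaussians_logp(x))
    # SMC tempers from the prior to the bimodal target, so the particles
    # populate both modes instead of getting stuck in one
    idata = pm.sample_smc(draws=2000)
```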

Read more ...


GLM: Mini-batch ADVI on hierarchical regression model

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function but are not random variables. When using mini-batches, we need to take care of these variables as well.
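
A hedged sketch of the idea, assuming a recent PyMC in which pm.Minibatch can slice several arrays jointly: the inputs X must be mini-batched together with the observations y, and total_size tells the likelihood how to rescale to the full dataset:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
X_np = rng.normal(size=(10_000, 3))
y_np = X_np @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=10_000)

# slice inputs and outputs with the *same* random minibatch
X_mb, y_mb = pm.Minibatch(X_np, y_np, batch_size=128)

with pm.Model():
    beta = pm.Normal("beta", 0, 1, shape=3)
    sigma = pm.HalfNormal("sigma", 1)
    mu = X_mb @ beta
    # total_size rescales the minibatch log-likelihood to the full data
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_mb, total_size=10_000)
    approx = pm.fit(10_000, method="advi")
```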

Read more ...


Marginalized Gaussian Mixture Model

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.
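
A minimal sketch of the marginalized formulation on made-up two-component data; NormalMixture sums over the per-point component labels analytically, so no discrete variables need to be sampled:

```python
import numpy as np
import pymc as pm

# made-up data drawn from two subpopulations
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

with pm.Model():
    w = pm.Dirichlet("w", a=np.ones(2))
    # the ordered transform keeps the component means identifiable
    mu = pm.Normal(
        "mu", 0, 10, shape=2,
        transform=pm.distributions.transforms.ordered,
        initval=np.array([-1.0, 1.0]),
    )
    sigma = pm.HalfNormal("sigma", 5, shape=2)
    pm.NormalMixture("obs", w=w, mu=mu, sigma=sigma, observed=data)
    idata = pm.sample()
```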

Read more ...


Dirichlet process mixtures for density estimation

The Dirichlet process is a flexible probability distribution over the space of distributions. Most generally, a probability distribution, \(P\), on a set \(\Omega\) is a [measure](https://en.wikipedia.org/wiki/Measure_%28mathematics%29) that assigns measure one to the entire space (\(P(\Omega) = 1\)). A Dirichlet process \(P \sim \textrm{DP}(\alpha, P_0)\) is a measure with the property that, for every finite disjoint partition \(S_1, \ldots, S_n\) of \(\Omega\),

\[(P(S_1), \ldots, P(S_n)) \sim \textrm{Dir}(\alpha P_0(S_1), \ldots, \alpha P_0(S_n)).\]
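
In practice such models are often fitted via a truncated stick-breaking construction; a rough sketch of a DP mixture of normals on placeholder data (the truncation level K = 20 is an arbitrary choice):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

K = 20  # truncation level of the stick-breaking approximation

def stick_breaking(beta):
    # turn stick-breaking fractions into mixture weights
    remaining = pt.concatenate([[1.0], pt.extra_ops.cumprod(1.0 - beta)[:-1]])
    return beta * remaining

data = np.random.default_rng(7).normal(size=500)  # placeholder dataset

with pm.Model():
    alpha = pm.Gamma("alpha", 1.0, 1.0)  # DP concentration parameter
    beta = pm.Beta("beta", 1.0, alpha, shape=K)
    w = pm.Deterministic("w", stick_breaking(beta))
    mu = pm.Normal("mu", 0.0, 5.0, shape=K)
    sigma = pm.HalfNormal("sigma", 1.0, shape=K)
    pm.NormalMixture("obs", w=w, mu=mu, sigma=sigma, observed=data)
    idata = pm.sample()
```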

Read more ...


Introduction to Bayesian A/B Testing

This notebook demonstrates how to implement a Bayesian analysis of an A/B test. We implement the models discussed in VWO’s Bayesian A/B Testing Whitepaper [Stucchio, 2015] and discuss the effect of different prior choices for these models. This notebook does not discuss related topics such as how to choose a prior, early stopping, or power analysis.
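
For intuition, a conversion-rate model of the kind discussed there might look like the following minimal sketch, with made-up counts and independent Beta(1, 1) priors on each variant:

```python
import numpy as np
import pymc as pm

# made-up trial: visitors and conversions for variants A and B
visitors = np.array([1000, 1000])
conversions = np.array([120, 145])

with pm.Model():
    p = pm.Beta("p", alpha=1.0, beta=1.0, shape=2)  # conversion rates
    pm.Binomial("obs", n=visitors, p=p, observed=conversions)
    uplift = pm.Deterministic("uplift", p[1] / p[0] - 1.0)
    idata = pm.sample()

# posterior probability that variant B converts better than A
print(float((idata.posterior["uplift"] > 0).mean()))
```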

Read more ...