# Posted in 2021

## Bayesian Additive Regression Trees: Introduction

- 21 December 2021

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as:
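The excerpt breaks off at the model statement. The usual sum-of-trees form (a sketch of the standard BART formulation, not taken from the post itself) is:

\[
Y = \sum_{j=1}^{m} g_j(X;\, T_j,\, M_j) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2),
\]

where each \(g_j\) is a regression tree with structure \(T_j\) and leaf parameters \(M_j\), and \(m\) is the number of trees in the sum.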

## Using a “black box” likelihood function (numpy)

- 16 December 2021

This notebook is part of a pair of twin notebooks that perform the exact same task; this one uses numpy, whereas the other uses Cython.
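The kind of "black box" involved is just an ordinary numpy function returning a log-likelihood, with no gradient information exposed. As a minimal sketch (the function name and the straight-line model are illustrative, not the notebook's exact code):

```python
import numpy as np

def gaussian_loglike(theta, x, data, sigma):
    """Gaussian log-likelihood for a straight-line model y = m*x + c.

    `theta` packs the slope and intercept. This is a hypothetical
    stand-in for whatever opaque numpy function gets wrapped; nothing
    here is differentiable by the sampler, which is the whole point.
    """
    m, c = theta
    model = m * x + c
    return -0.5 * np.sum(((data - model) / sigma) ** 2)

x = np.linspace(0.0, 9.0, 10)
data = 0.5 * x + 2.0  # noise-free data generated with theta = (0.5, 2.0)
print(gaussian_loglike((0.5, 2.0), x, data, sigma=1.0))  # 0.0 at the true parameters
```

Because only the scalar log-likelihood value is available, such a function has to be wrapped in a custom Op before a PyMC sampler can call it.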

## Using Data Containers

- 16 December 2021

After building the statistical model of your dreams, you’re going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are “observed” examples of the endogenous outputs of your model, called `y` in regression models, and are used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`.
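The exogenous/endogenous split can be sketched in plain numpy (the coefficients and noise scale below are illustrative values, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exogenous inputs X and some true coefficients beta (illustrative).
X = rng.normal(size=(100, 3))
beta = np.array([1.0, -2.0, 0.5])

# The linear predictor from the excerpt: mu = X @ beta.
mu = X @ beta

# Endogenous "observed" outputs y: mu plus Gaussian noise. These are
# what a likelihood term would condition on.
y = mu + rng.normal(scale=0.1, size=100)
print(y.shape)  # (100,)
```

In a PyMC model, `X` would typically go into a data container so it can be swapped out for prediction, while `y` is passed as `observed` to the likelihood.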

## GLM: Robust Regression using Custom Likelihood for Outlier Classification

- 17 November 2021

Robust regression with outlier detection in PyMC, using the Hogg 2010 signal-vs-noise method.

## Estimating parameters of a distribution from awkwardly binned data

- 23 October 2021

Let us say that we are interested in inferring the properties of a population. This could be the distribution of age, income, body mass index, or any of a whole range of other possible measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.
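The "awkward" part is that each dataset may arrive already binned, with mismatched cut points. A small numpy sketch of that situation (the population, sample sizes, and bin edges are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Two samples from the same hypothetical population (e.g. BMI values).
sample_a = rng.normal(loc=25.0, scale=4.0, size=1000)
sample_b = rng.normal(loc=25.0, scale=4.0, size=1000)

# Each dataset is reported only as counts over its own bin edges,
# and the two sets of edges do not line up.
edges_a = np.array([0.0, 18.5, 25.0, 30.0, 100.0])
edges_b = np.array([0.0, 20.0, 24.0, 28.0, 32.0, 100.0])

counts_a, _ = np.histogram(sample_a, bins=edges_a)
counts_b, _ = np.histogram(sample_b, bins=edges_b)
print(counts_a, counts_b)
```

The inference task is then to recover the underlying distribution's parameters from `counts_a` and `counts_b` alone, without access to the raw samples.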

## Sequential Monte Carlo

- 19 October 2021

Sampling from multimodal distributions with standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in one of the modes. A Sequential Monte Carlo (SMC) sampler is a way to ameliorate this problem.
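A minimal sketch of the kind of target that traps a single chain: an equal-weight mixture of two well-separated Gaussians (the means and scale below are illustrative choices, not from the notebook):

```python
import numpy as np

def bimodal_logpdf(x, mu1=-5.0, mu2=5.0, sigma=1.0):
    """Log-density of an equal-weight mixture of two Gaussians.

    With modes this far apart, the region between them has vanishing
    density, so a local MCMC proposal almost never crosses it.
    """
    log_norm = -0.5 * np.log(2.0 * np.pi * sigma**2)
    comp1 = log_norm - 0.5 * ((x - mu1) / sigma) ** 2
    comp2 = log_norm - 0.5 * ((x - mu2) / sigma) ** 2
    return np.logaddexp(comp1, comp2) + np.log(0.5)

vals = bimodal_logpdf(np.array([-5.0, 0.0, 5.0]))
print(vals)  # high at both modes, far lower in the valley between them
```

SMC sidesteps the problem by tempering from an easy distribution toward this target with a population of particles, rather than relying on one chain to hop the valley.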

## GLM: Mini-batch ADVI on hierarchical regression model

- 23 September 2021

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function but are not random variables. When using mini-batches, this must be taken into account.
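Concretely, the independent variables must stay paired with their outputs when batching. A hypothetical helper sketching that constraint in plain numpy (the batch size and data are illustrative):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield random mini-batches, keeping rows of X paired with y.

    Because X holds fixed inputs rather than random variables, the
    only requirement is that each batch slices X and y together.
    """
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -1.0])
sizes = [len(yb) for _, yb in minibatches(X, y, 32, rng)]
print(sizes)  # [32, 32, 32, 4]
```

PyMC's own mini-batch machinery handles this pairing for you; the sketch just shows the invariant that has to hold.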

## Marginalized Gaussian Mixture Model

- 18 September 2021

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.
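The flavour of toy dataset meant here can be generated in a few lines of numpy; the weights, means, and scales below are illustrative, not the notebook's actual values:

```python
import numpy as np

rng = np.random.default_rng(1)

# A two-component Gaussian mixture: each point first picks a latent
# subpopulation, then draws from that component's Gaussian.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
scales = np.array([0.5, 1.0])

component = rng.choice(2, size=500, p=weights)  # latent labels
data = rng.normal(loc=means[component], scale=scales[component])
print(data.shape)  # (500,)
```

"Marginalized" refers to summing the latent `component` labels out of the likelihood analytically, rather than sampling them.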

## Dirichlet process mixtures for density estimation

- 16 September 2021

The Dirichlet process is a flexible probability distribution over the space of distributions. Most generally, a probability distribution, \(P\), on a set \(\Omega\) is a [measure](https://en.wikipedia.org/wiki/Measure_%28mathematics%29) that assigns measure one to the entire space (\(P(\Omega) = 1\)). A Dirichlet process \(P \sim \textrm{DP}(\alpha, P_0)\) is a measure that has the property that, for every finite disjoint partition \(S_1, \ldots, S_n\) of \(\Omega\),
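The excerpt cuts off before the defining property. The standard statement (sketched here from the usual Dirichlet process definition, with \(P_0\) the base measure and \(\alpha\) the concentration parameter) is:

\[
(P(S_1), \ldots, P(S_n)) \sim \textrm{Dir}(\alpha P_0(S_1), \ldots, \alpha P_0(S_n)).
\]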

## Introduction to Bayesian A/B Testing

- 23 May 2021

This notebook demonstrates how to implement a Bayesian analysis of an A/B test. We implement the models discussed in VWO’s Bayesian A/B Testing Whitepaper [Stucchio, 2015], and discuss the effect of different prior choices for these models. This notebook does *not* discuss other related topics like how to choose a prior, early stopping, and power analysis.
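The core computation in a Bayesian A/B test of conversion rates reduces to comparing two Beta posteriors. A minimal numpy sketch, assuming flat Beta(1, 1) priors and made-up conversion counts:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative conversion data for two variants (counts are invented).
trials_a, successes_a = 1000, 50
trials_b, successes_b = 1000, 65

# With a Beta(1, 1) prior and a binomial likelihood, the posterior over
# each conversion rate is Beta(successes + 1, failures + 1), so we can
# draw posterior samples directly instead of running MCMC.
rate_a = rng.beta(successes_a + 1, trials_a - successes_a + 1, size=100_000)
rate_b = rng.beta(successes_b + 1, trials_b - successes_b + 1, size=100_000)

# Monte Carlo estimate of P(variant B converts better than A).
prob_b_better = np.mean(rate_b > rate_a)
print(round(prob_b_better, 2))
```

The notebook's models add structure on top of this (e.g. informative priors and revenue outcomes), but the posterior-comparison step is the same idea.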