# Posted in 2021

## Bayesian Additive Regression Trees: Introduction

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates $$X$$ and we want to use them to model $$Y$$, a BART model (omitting the priors) can be represented as:

## Using a “black box” likelihood function (numpy)

This notebook is part of a set of two twin notebooks that perform the exact same task: this one uses numpy, whereas the other one uses Cython.

## Using Data Containers

After building the statistical model of your dreams, you’re going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called X in linear regression models, where mu = X @ beta. Other data are “observed” examples of the endogenous outputs of your model, called y in regression models, and are used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy ndarrays, pandas Series and DataFrame, and even pytensor TensorVariables.
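To make the two roles concrete, here is a minimal numpy-only sketch (not PyMC code) of an exogenous design matrix `X`, the linear predictor `mu = X @ beta`, and an endogenous `y`. The coefficient values, noise scale, and sample size are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exogenous input: a design matrix with an intercept column and one predictor.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical "true" coefficients, picked only for this illustration.
beta = np.array([1.0, 2.5])

# Linear predictor, as in the text: mu = X @ beta.
mu = X @ beta

# Endogenous output: the observed y that a likelihood would be evaluated on.
y = mu + rng.normal(scale=0.5, size=n)

# Ordinary least squares as a quick sanity check that y carries information about beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In a model, `X` would enter as an input while `y` would be passed as the observed values of the likelihood.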

## GLM: Robust Regression using Custom Likelihood for Outlier Classification

Using PyMC for Robust Regression with Outlier Detection using the Hogg 2010 Signal vs Noise method.
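As a rough illustration of the signal vs noise idea, each point can be treated as coming either from the regression line (signal) or from a broad background distribution (noise). The sketch below is plain numpy, uses the marginalized two-component form, and is not the notebook's implementation; the function names, parameter names, and data values are all hypothetical.

```python
import numpy as np

def normal_logpdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def outlier_probabilities(x, y, sigma_y, slope, intercept,
                          frac_out, mean_out, var_out):
    """Per-point probability of belonging to the outlier (noise) component."""
    # Inlier component: points scatter around the straight line.
    log_in = np.log1p(-frac_out) + normal_logpdf(
        y, intercept + slope * x, sigma_y ** 2)
    # Outlier component: a broad normal, widened by the measurement error.
    log_out = np.log(frac_out) + normal_logpdf(
        y, mean_out, var_out + sigma_y ** 2)
    # Normalise the two weighted densities to get a probability per point.
    return np.exp(log_out - np.logaddexp(log_in, log_out))

# Hypothetical data: three points on the line y = x and one gross outlier.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 30.0])
sigma_y = np.ones_like(y)

p_out = outlier_probabilities(x, y, sigma_y, slope=1.0, intercept=0.0,
                              frac_out=0.1, mean_out=10.0, var_out=100.0)
```

Here the parameters are fixed by hand; in a Bayesian treatment they would be given priors and inferred along with the outlier assignments.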

## Estimating parameters of a distribution from awkwardly binned data

Let us say that we are interested in inferring the properties of a population. This could be anything from the distribution of age, or income, or body mass index, or a whole range of different possible measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.

## Sequential Monte Carlo

• 19 October 2021

Sampling from distributions with multiple peaks with standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in one of the modes. A Sequential Monte Carlo (SMC) sampler is a way to ameliorate this problem.
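The difficulty can be reproduced with a few lines of numpy. This sketch runs a plain random-walk Metropolis chain (not SMC) on a hypothetical two-peaked target; the peak locations, step size, and chain length are assumptions chosen so that the peaks are well separated.

```python
import numpy as np

def log_target(x):
    # Unnormalised equal mixture of N(-10, 1) and N(10, 1).
    return np.logaddexp(-0.5 * (x + 10.0) ** 2, -0.5 * (x - 10.0) ** 2)

rng = np.random.default_rng(1)

x = 10.0                    # start the chain inside the right-hand peak
samples = np.empty(2000)
for i in range(samples.size):
    proposal = x + rng.normal(scale=0.5)
    # Metropolis accept/reject step for a symmetric proposal.
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    samples[i] = x

# Fraction of draws that landed in the left-hand peak.
frac_left = np.mean(samples < 0.0)
```

Although half of the target's mass sits around -10, a chain started at 10 with small steps has to cross a region of negligible density to get there, so `frac_left` stays far from the correct value of 0.5.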

## GLM: Mini-batch ADVI on hierarchical regression model

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batch, we should take care of that.
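One reading of this caveat is that the independent variables have to be sub-sampled with the same row indices as the observations, and the mini-batch log-likelihood rescaled to the full data size. The numpy sketch below illustrates that reading only; the data, parameter values, and function name are hypothetical, and it is not PyMC's mini-batch machinery.

```python
import numpy as np

rng = np.random.default_rng(0)

n, batch_size = 1000, 100
x = rng.normal(size=n)                              # independent variable
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # observations

def gaussian_loglike(y, x, intercept, slope, sigma):
    """Sum of normal log densities of y around the line intercept + slope * x."""
    resid = y - (intercept + slope * x)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma ** 2)
                  - 0.5 * (resid / sigma) ** 2)

# Draw ONE set of row indices and apply it to both x and y,
# so every observation stays paired with its own covariate value.
idx = rng.choice(n, size=batch_size, replace=False)
x_batch, y_batch = x[idx], y[idx]

full = gaussian_loglike(y, x, 1.0, 2.0, 0.5)
# Rescale the mini-batch term so it estimates the full-data log-likelihood.
estimate = (n / batch_size) * gaussian_loglike(y_batch, x_batch, 1.0, 2.0, 0.5)
```

Sampling `x` and `y` with different indices would break the pairing between each observation and its covariate and give a meaningless likelihood.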

## Marginalized Gaussian Mixture Model

• 18 September 2021

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.
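A toy data set of this kind can be simulated in a few lines of numpy. The weights, means, and standard deviations below are hypothetical values chosen for illustration and are not the ones used in the original post.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000
# Hypothetical mixture parameters for three subpopulations.
weights = np.array([0.3, 0.5, 0.2])
means = np.array([-3.0, 0.0, 4.0])
sds = np.array([0.5, 1.0, 0.8])

# Latent subpopulation label for each observation.
component = rng.choice(len(weights), size=n, p=weights)

# Each observation is drawn from the normal of its own subpopulation.
x = rng.normal(loc=means[component], scale=sds[component])
```

A marginalized mixture model works with the weighted sum of the component densities directly, so labels like `component` never need to be sampled.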

## Dirichlet process mixtures for density estimation

The Dirichlet process is a flexible probability distribution over the space of distributions. Most generally, a probability distribution, $$P$$, on a set $$\Omega$$ is a [measure](https://en.wikipedia.org/wiki/Measure_(mathematics)) that assigns measure one to the entire space ($$P(\Omega) = 1$$). A Dirichlet process $$P \sim \textrm{DP}(\alpha, P_0)$$ is a measure that has the property that, for every finite disjoint partition $$S_1, \ldots, S_n$$ of $$\Omega$$,