Posted in 2021

Bayesian Additive Regression Trees: Introduction

21 December 2021

Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as:

Read more ...

Using a “black box” likelihood function

16 December 2021

There is a related example that discusses how to use a likelihood implemented in JAX.

Read more ...

Using Data Containers

16 December 2021

After building the statistical model of your dreams, you’re going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called X in linear regression models, where mu = X @ beta. Other data are “observed” examples of the endogenous outputs of your model, called y in regression models, and are used as input to the likelihood function implied by your model. These data, whether exogenous or endogenous, can be included in your model as a wide variety of datatypes, including numpy ndarrays, pandas Series and DataFrame, and even pytensor TensorVariables.
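To make the two roles of data concrete, here is a minimal pure-Python sketch of the linear predictor mu = X @ beta. It deliberately avoids PyMC: the values of X, beta, and y are made up for illustration, and linear_predictor is a hypothetical helper, not a PyMC function.

```python
# Illustrative values only; none of these numbers come from the original post.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]  # exogenous inputs: intercept column plus one covariate
beta = [2.0, 3.0]                          # coefficients a model would normally infer
y = [3.4, 6.6, 9.4]                        # "observed" endogenous outputs


def linear_predictor(X, beta):
    """Compute mu = X @ beta for a list-of-rows matrix and a coefficient list."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


mu = linear_predictor(X, beta)

# The likelihood compares the observed y with mu; here we only look at the residuals.
residuals = [y_i - mu_i for y_i, mu_i in zip(y, mu)]
print(mu, residuals)
```

In this sketch X only enters through mu, while y is what a likelihood would be evaluated against, which is the distinction the paragraph above draws.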

Read more ...

GLM: Robust Regression using Custom Likelihood for Outlier Classification

17 November 2021

Using PyMC for Robust Regression with Outlier Detection using the Hogg 2010 Signal vs Noise method.

Read more ...

Estimating parameters of a distribution from awkwardly binned data

23 October 2021

Let us say that we are interested in inferring the properties of a population. This could be the distribution of age, income, body mass index, or any of a whole range of other measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.

Read more ...

Sequential Monte Carlo

19 October 2021

Sampling from distributions with multiple peaks using standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in one of the modes. A Sequential Monte Carlo (SMC) sampler is a way to ameliorate this problem.
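The difficulty can be illustrated with a small standard-library sketch. It is not SMC and not PyMC code: it runs a plain random-walk Metropolis sampler on an assumed target, an equal-weight mixture of two unit-variance Gaussians centred at -10 and +10. The target, step size, and function names are all choices made for this illustration.

```python
import math
import random


def log_density(x, separation=10.0):
    """Unnormalised log density of an equal mixture of N(-separation, 1) and N(+separation, 1)."""
    a = -0.5 * (x + separation) ** 2
    b = -0.5 * (x - separation) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def random_walk_metropolis(n_steps, start, step_size, seed):
    """Random-walk Metropolis with Gaussian proposals; returns the list of visited states."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return samples


samples = random_walk_metropolis(n_steps=5000, start=-10.0, step_size=0.5, seed=0)
fraction_right = sum(s > 0 for s in samples) / len(samples)
print(f"fraction of draws in the right-hand mode: {fraction_right}")
```

Half of the target's mass sits around +10, so a well-mixing sampler would report a fraction near 0.5. With small steps and modes this far apart, the chain is expected to stay in the mode where it started.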

Read more ...

GLM: Mini-batch ADVI on hierarchical regression model

23 September 2021

Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batches, we need to account for this, so that each batch pairs the independent variables with the matching observations.

Read more ...

Marginalized Gaussian Mixture Model

18 September 2021

Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.
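As a rough sketch of what such data look like, the snippet below simulates draws from a Gaussian mixture using only the standard library. The weights, means, and standard deviations are assumptions chosen for illustration, not the values used in the post. It also includes a log density that sums over components, which is the sense in which the discrete component labels are marginalized out.

```python
import math
import random


def sample_mixture(n, weights, means, sds, seed=0):
    """Draw n points: pick a component by weight, then draw from that Gaussian."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        labels.append(k)
        data.append(rng.gauss(means[k], sds[k]))
    return data, labels


def mixture_logpdf(x, weights, means, sds):
    """Log density with the component label summed out (log-sum-exp over components)."""
    terms = [
        math.log(w) - math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu) / sd) ** 2
        for w, mu, sd in zip(weights, means, sds)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical mixture parameters, chosen only to give visibly separate subpopulations.
weights, means, sds = [0.35, 0.40, 0.25], [-3.0, 0.0, 4.0], [0.5, 1.0, 0.75]
data, labels = sample_mixture(1000, weights, means, sds)
total_loglik = sum(mixture_logpdf(x, weights, means, sds) for x in data)
print(len(data), total_loglik)
```

The labels are returned only so the subpopulations can be inspected; the log density never uses them.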

Read more ...

Dirichlet process mixtures for density estimation

16 September 2021

The Dirichlet process is a flexible probability distribution over the space of distributions. Most generally, a probability distribution, \(P\), on a set \(\Omega\) is a [measure](https://en.wikipedia.org/wiki/Measure_(mathematics%29) that assigns measure one to the entire space (\(P(\Omega) = 1\)). A Dirichlet process \(P \sim \textrm{DP}(\alpha, P_0)\) is a measure that has the property that, for every finite disjoint partition \(S_1, \ldots, S_n\) of \(\Omega\),

Read more ...

Introduction to Bayesian A/B Testing

23 May 2021

This notebook demonstrates how to implement a Bayesian analysis of an A/B test. We implement the models discussed in VWO’s Bayesian A/B Testing Whitepaper [Stucchio, 2015], and discuss the effect of different prior choices for these models. This notebook does not discuss other related topics like how to choose a prior, early stopping, and power analysis.
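The excerpt does not spell out the models, so the following is only a sketch under a stated assumption: conversions are Bernoulli trials with a Beta prior on each variant's conversion rate, which gives a closed-form Beta posterior. The visitor counts, the two priors, and the helper names are hypothetical, and the code uses the standard library rather than PyMC.

```python
import random


def beta_posterior(alpha_prior, beta_prior, conversions, visitors):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + successes, b + failures)."""
    return alpha_prior + conversions, beta_prior + (visitors - conversions)


def prob_b_beats_a(post_a, post_b, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) from independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(*post_b) > rng.betavariate(*post_a) for _ in range(n_draws))
    return wins / n_draws


# Hypothetical data: 20/200 conversions for variant A, 30/200 for variant B.
data_a = dict(conversions=20, visitors=200)
data_b = dict(conversions=30, visitors=200)

# Two illustrative priors: a flat Beta(1, 1) and a strong Beta(50, 450) centred on a 10% rate.
results = {}
for name, (a0, b0) in {"weak": (1, 1), "strong": (50, 450)}.items():
    post_a = beta_posterior(a0, b0, **data_a)
    post_b = beta_posterior(a0, b0, **data_b)
    results[name] = prob_b_beats_a(post_a, post_b)

print(results)
```

Comparing the two entries of results shows how a strong prior pulls both posteriors toward the same rate and so makes the evidence for B look weaker than it does under the flat prior.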

Read more ...

Heteroskedastic Gaussian Processes

05 May 2021

We can typically divide the sources of uncertainty in our models into two categories. “Aleatoric” uncertainty (from the Latin word for dice or randomness) arises from the intrinsic variability of our system. “Epistemic” uncertainty (from the Greek word for knowledge) arises from how our observations are placed throughout the domain of interest.

Read more ...
