Posts by Benjamin T. Vincent

Bayesian copula estimation: Describing correlated joint distributions

When we deal with multiple variables (e.g. \(a\) and \(b\)) we often want to describe the joint distribution \(P(a, b)\) parametrically. If we are lucky, then this joint distribution might be ‘simple’ in some way. For example, it could be that \(a\) and \(b\) are statistically independent, in which case we can break down the joint distribution into \(P(a, b) = P(a) P(b)\) and so we just need to find appropriate parametric descriptions for \(P(a)\) and \(P(b)\). Even if this is not appropriate, it may be that \(P(a, b)\) could be described well by a simple multivariate distribution, such as a multivariate normal distribution for example.

Read more ...


Interventional distributions and graph mutation with the do-operator

PyMC is a pivotal component of the open source Bayesian statistics ecosystem. It helps solve real problems across a wide range of industries and academic research areas every day. And it has gained this level of utility by being accessible, powerful, and practically useful at solving Bayesian statistical inference problems.

Read more ...


Interrupted time series analysis

This notebook focuses on how to conduct a simple Bayesian interrupted time series analysis. This is useful in quasi-experimental settings where an intervention was applied to all treatment units.

Read more ...


Difference in differences

This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebooks provides a high level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both The Effect [Huntington-Klein, 2021] and Causal Inference: The Mixtape [Cunningham, 2021] have chapters devoted to difference in differences.

Read more ...


Bayesian regression with truncated or censored data

The notebook provides an example of how to conduct linear regression when your outcome variable is either censored or truncated.

Read more ...


Counterfactual inference: calculating excess deaths due to COVID-19

Causal reasoning and counterfactual thinking are really interesting but complex topics! Nevertheless, we can make headway into understanding the ideas through relatively simple examples. This notebook focuses on the concepts and the practical implementation of Bayesian causal reasoning using PyMC.

Read more ...


Regression discontinuity design analysis

Quasi experiments involve experimental interventions and quantitative measures. However, quasi-experiments do not involve random assignment of units (e.g. cells, people, companies, schools, states) to test or control groups. This inability to conduct random assignment poses problems when making causal claims as it makes it harder to argue that any difference between a control and test group are because of an intervention and not because of a confounding factor.

Read more ...


Simpson’s paradox and mixed models

This notebook covers:

Read more ...


Bayesian moderation analysis

This notebook covers Bayesian moderation analysis. This is appropriate when we believe that one predictor variable (the moderator) may influence the linear relationship between another predictor variable and an outcome. Here we look at an example where we look at the relationship between hours of training and muscle mass, where it may be that age (the moderating variable) affects this relationship.

Read more ...


Binomial regression

This notebook covers the logic behind Binomial regression, a specific instance of Generalized Linear Modelling. The example is kept very simple, with a single predictor variable.

Read more ...


Bayesian mediation analysis

This notebook covers Bayesian mediation analysis. This is useful when we want to explore possible mediating pathways between a predictor and an outcome variable.

Read more ...


Estimating parameters of a distribution from awkwardly binned data

Let us say that we are interested in inferring the properties of a population. This could be anything from the distribution of age, or income, or body mass index, or a whole range of different possible measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.

Read more ...