Posts in beginner

GLM: Negative Binomial Regression

This notebook closely follows the GLM Poisson regression example by Jonathan Sedar (which is in turn inspired by a project by Ian Osvald) except the data here is negative binomially distributed instead of Poisson distributed.

Read more ...


Splines

Often, the model we want to fit is not a perfect line between some \(x\) and \(y\). Instead, the parameters of the model are expected to vary over \(x\). There are multiple ways to handle this situation, one of which is to fit a spline. The spline is effectively multiple individual lines, each fit to a different section of \(x\), that are tied together at their boundaries, often called knots.

Read more ...


Bayes Factors and Marginal Likelihood

The “Bayesian way” to compare models is to compute the marginal likelihood of each model \(p(y \mid M_k)\), i.e. the probability of the observed data \(y\) given the \(M_k\) model. This quantity, the marginal likelihood, is just the normalizing constant of Bayes’ theorem. We can see this if we write Bayes’ theorem and make explicit the fact that all inferences are model-dependant.

Read more ...


Sampler Statistics

When checking for convergence or when debugging a badly behaving sampler, it is often helpful to take a closer look at what the sampler is doing. For this purpose some samplers export statistics for each generated sample.

Read more ...


General API quickstart

Models in PyMC are centered around the Model class. It has references to all random variables (RVs) and computes the model logp and its gradients. Usually, you would instantiate it as part of a with context:

Read more ...


Approximate Bayesian Computation

Approximate Bayesian Computation methods (also called likelihood free inference methods), are a group of techniques developed for inferring posterior distributions in cases where the likelihood function is intractable or costly to evaluate. This does not mean that the likelihood function is not part of the analysis, it just the we are approximating the likelihood, and hence the name of the ABC methods.

Read more ...


Regression discontinuity design analysis

Quasi experiments involve experimental interventions and quantitative measures. However, quasi-experiments do not involve random assignment of units (e.g. cells, people, companies, schools, states) to test or control groups. This inability to conduct random assignment poses problems when making causal claims as it makes it harder to argue that any difference between a control and test group are because of an intervention and not because of a confounding factor.

Read more ...


Gaussian Mixture Model

A mixture model allows us to make inferences about the component contributors to a distribution of data. More specifically, a Gaussian Mixture Model allows us to make inferences about the means and standard deviations of a specified number of underlying component Gaussian distributions.

Read more ...


Simpson’s paradox and mixed models

This notebook covers:

Read more ...


Bayesian moderation analysis

This notebook covers Bayesian moderation analysis. This is appropriate when we believe that one predictor variable (the moderator) may influence the linear relationship between another predictor variable and an outcome. Here we look at an example where we look at the relationship between hours of training and muscle mass, where it may be that age (the moderating variable) affects this relationship.

Read more ...


Binomial regression

This notebook covers the logic behind Binomial regression, a specific instance of Generalized Linear Modelling. The example is kept very simple, with a single predictor variable.

Read more ...


Bayesian mediation analysis

This notebook covers Bayesian mediation analysis. This is useful when we want to explore possible mediating pathways between a predictor and an outcome variable.

Read more ...


Lasso regression with block updating

Sometimes, it is very useful to update a set of parameters together. For example, variables that are highly correlated are often good to update together. In PyMC block updating is simple. This will be demonstrated using the parameter step of pymc.sample.

Read more ...


Bayesian regression with truncated or censored data

The notebook provides an example of how to conduct linear regression when your outcome variable is either censored or truncated.

Read more ...


Bayesian Estimation Supersedes the T-Test

Non-consecutive header level increase; H1 to H3 [myst.header]

Read more ...


Using shared variables (Data container adaptation)

The pymc.Data container class wraps the theano shared variable class and lets the model be aware of its inputs and outputs. This allows one to change the value of an observed variable to predict or refit on new data. All variables of this class must be declared inside a model context and specify a name for them.

Read more ...


Using a “black box” likelihood function (numpy)

This notebook in part of a set of two twin notebooks that perform the exact same task, this one uses numpy whereas this other one uses Cython

Read more ...


Sequential Monte Carlo

Sampling from distributions with multiple peaks with standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in either of the minima. A Sequential Monte Carlo sampler (SMC) is a way to ameliorate this problem.

Read more ...


Multivariate Gaussian Random Walk

This notebook shows how to fit a correlated time series using multivariate Gaussian random walks (GRWs). In particular, we perform a Bayesian regression of the time series data against a model dependent on GRWs.

Read more ...