Posted in 2024

Simpson’s paradox

Simpson’s paradox describes a situation where a relationship between two variables, for example a negative one, holds within each of several groups, but disappears or even reverses sign when the groups’ data are combined. The gif below (from the Simpson’s paradox Wikipedia page) demonstrates this very nicely.
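
As a minimal simulated sketch of the effect (hypothetical numbers, not taken from the notebook), the snippet below builds two groups whose within-group slopes are negative while the pooled slope is positive:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Two groups: within each, y falls with x (slope about -1), but group B
# sits higher in both x and y, so pooling the data flips the trend.
x_a = rng.uniform(0, 1, 100)
x_b = rng.uniform(2, 3, 100)
df = pd.DataFrame({
    "x": np.concatenate([x_a, x_b]),
    "y": np.concatenate([1 - x_a, 5 - x_b]) + rng.normal(0, 0.1, 200),
    "group": ["A"] * 100 + ["B"] * 100,
})

# Within-group slopes are negative; the pooled slope is positive.
for name, sub in df.groupby("group"):
    print(name, np.polyfit(sub["x"], sub["y"], 1)[0])
print("pooled", np.polyfit(df["x"], df["y"], 1)[0])
```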

Read more ...


Confirmatory Factor Analysis and Structural Equation Models in Psychometrics

“Evidently, the notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments…the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly” - Pearl, Probabilistic Reasoning in Intelligent Systems [1985]

Read more ...


The prevalence of malaria in the Gambia

Read more ...


Model Averaging

When confronted with more than one model we have several options. One of them is to perform model selection, as exemplified by the PyMC examples Model comparison and GLM: Model Selection; it is usually a good idea to also include posterior predictive checks when deciding which model to keep. Discarding all models except one is equivalent to affirming that, among the evaluated models, one is correct (under some criterion) with probability 1 and the rest are incorrect. In most cases this is an overstatement that ignores the uncertainty we have in our models. It is somewhat like computing the full posterior and then keeping only a point estimate such as the posterior mean: we may become overconfident about what we really know. You can also browse the blog/tag/model-comparison tag to find related posts.
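
As a rough sketch of the averaging alternative, one can compute stacking weights over the candidate models instead of keeping a single winner. The toy comparison below (made-up data and model names, not the notebook's) uses ArviZ:

```python
import arviz as az
import numpy as np
import pymc as pm

# Hypothetical data; the point is the workflow, not the models.
x = np.linspace(0, 1, 50)
y = 2 * x + np.random.default_rng(0).normal(0, 0.2, 50)

def fit(order):
    """Fit a polynomial regression of the given order."""
    with pm.Model():
        beta = pm.Normal("beta", 0, 10, shape=order + 1)
        mu = sum(beta[i] * x**i for i in range(order + 1))
        sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("obs", mu, sigma, observed=y)
        return pm.sample(idata_kwargs={"log_likelihood": True})

# Stacking weights express uncertainty across models instead of
# assigning probability 1 to a single "best" one.
comparison = az.compare({"linear": fit(1), "quadratic": fit(2)},
                        method="stacking")
print(comparison[["rank", "weight"]])
```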

Read more ...


Time Series Models Derived From a Generative Graph

In this notebook, we show how to build and fit a time series model starting from a generative graph. In particular, we explain how to use scan to loop efficiently inside a PyMC model.
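
As a taste of the mechanics, here is a minimal sketch of an AR(1) recursion written with scan at the PyTensor level, outside of a full PyMC model (the parameter values are made up):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

rho = pt.dscalar("rho")  # autoregressive coefficient
x0 = pt.dscalar("x0")    # initial state
eps = pt.dvector("eps")  # one innovation per time step

def step(eps_t, x_prev, rho):
    # x_t = rho * x_{t-1} + eps_t
    return rho * x_prev + eps_t

trajectory, _ = pytensor.scan(
    fn=step,
    sequences=[eps],
    outputs_info=[x0],
    non_sequences=[rho],
)

f = pytensor.function([x0, eps, rho], trajectory)
print(f(0.0, np.ones(5), 0.5))  # [1.0, 1.5, 1.75, 1.875, 1.9375]
```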

Read more ...


Gaussian Processes: HSGP Advanced Usage

The Hilbert Space Gaussian processes approximation is a low-rank GP approximation that is particularly well-suited to usage in probabilistic programming languages like PyMC. It approximates the GP using a pre-computed and fixed set of basis functions that don’t depend on the form of the covariance kernel or its hyperparameters. It’s a parametric approximation, so prediction in PyMC can be done as one would with a linear model via pm.Data or pm.set_data. You don’t need to define the .conditional distribution that non-parametric GPs rely on. This makes it much easier to integrate an HSGP, instead of a GP, into your existing PyMC model. Additionally, unlike many other GP approximations, HSGPs can be used anywhere within a model and with any likelihood function.
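
For a flavor of that flexibility, here is a hedged sketch (made-up data and hyperpriors) in which an HSGP prior feeds the rate of a Poisson likelihood:

```python
import numpy as np
import pymc as pm

# Hypothetical count data observed at one-dimensional inputs.
X = np.linspace(0, 10, 100)[:, None]
y = np.random.default_rng(1).poisson(3, size=100)

with pm.Model():
    ell = pm.InverseGamma("ell", mu=2.0, sigma=1.0)
    eta = pm.Exponential("eta", 1.0)
    cov = eta**2 * pm.gp.cov.Matern52(1, ls=ell)

    # m basis functions and boundary factor c define the approximation.
    # Because it is parametric, f can sit anywhere in the model, here
    # inside a non-Gaussian likelihood.
    gp = pm.gp.HSGP(m=[25], c=1.5, cov_func=cov)
    f = gp.prior("f", X=X)
    pm.Poisson("obs", mu=pm.math.exp(f), observed=y)
```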

Read more ...


Gaussian Processes: HSGP Reference & First Steps

The Hilbert Space Gaussian processes approximation is a low-rank GP approximation that is particularly well-suited to usage in probabilistic programming languages like PyMC. It approximates the GP using a pre-computed and fixed set of basis functions that don’t depend on the form of the covariance kernel or its hyperparameters. It’s a parametric approximation, so prediction in PyMC can be done as one would with a linear model via pm.Data or pm.set_data. You don’t need to define the .conditional distribution that non-parametric GPs rely on. This makes it much easier to integrate an HSGP, instead of a GP, into your existing PyMC model. Additionally, unlike many other GP approximations, HSGPs can be used anywhere within a model and with any likelihood function.
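
A minimal sketch of that linear-model-style prediction workflow, assuming made-up training data and fixed hyperparameters:

```python
import numpy as np
import pymc as pm

X_train = np.linspace(0, 10, 80)[:, None]
y_train = np.sin(X_train).ravel() + np.random.default_rng(2).normal(0, 0.3, 80)

with pm.Model() as model:
    X = pm.Data("X", X_train)
    gp = pm.gp.HSGP(m=[30], c=1.5, cov_func=pm.gp.cov.ExpQuad(1, ls=1.0))
    f = gp.prior("f", X=X)
    sigma = pm.HalfNormal("sigma", 1.0)
    # shape follows X so the likelihood resizes with new inputs.
    pm.Normal("obs", mu=f, sigma=sigma, observed=y_train, shape=X.shape[0])
    idata = pm.sample()

# Prediction works like a linear model: swap in new inputs with
# pm.set_data; no .conditional distribution is needed.
X_new = np.linspace(10, 12, 20)[:, None]
with model:
    pm.set_data({"X": X_new})
    preds = pm.sample_posterior_predictive(idata, var_names=["obs"])
```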

Read more ...


Categorical regression

In this example, we will model outcomes with more than two categories.
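
A minimal sketch of such a model, with hypothetical data and a softmax link over three categories (in practice one category's coefficients are often pinned to zero for identifiability):

```python
import numpy as np
import pymc as pm

# Hypothetical data: one predictor, three outcome categories.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.integers(0, 3, size=200)

with pm.Model():
    # One intercept and slope per category; softmax turns the linear
    # predictors into probabilities that sum to one per observation.
    alpha = pm.Normal("alpha", 0, 1, shape=3)
    beta = pm.Normal("beta", 0, 1, shape=3)
    eta = alpha + beta * x[:, None]        # shape (200, 3)
    p = pm.math.softmax(eta, axis=-1)
    pm.Categorical("obs", p=p, observed=y)
    idata = pm.sample()
```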

Read more ...


Automatic marginalization of discrete variables

PyMC is very amenable to sampling models with discrete latent variables. But if you insist on using the NUTS sampler exclusively, you will need to get rid of your discrete variables somehow. The best way to do this is by marginalizing them out: you then benefit from the Rao-Blackwell theorem and get lower-variance estimates of your parameters.
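
The notebook covers tooling that automates this; as a hand-rolled flavor of the underlying idea, the built-in pm.Mixture below sums a two-component assignment out of the likelihood analytically (hypothetical data):

```python
import numpy as np
import pymc as pm

# Two overlapping Gaussian clusters; the component label of each
# observation is a discrete latent variable we never sample.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])

with pm.Model():
    w = pm.Dirichlet("w", a=np.ones(2))
    mu = pm.Normal("mu", 0, 5, shape=2)
    sigma = pm.HalfNormal("sigma", 1, shape=2)
    # The component index is marginalized:
    # log p(y) = logsumexp_k [log w_k + log N(y | mu_k, sigma_k)],
    # leaving only continuous parameters for NUTS.
    pm.Mixture("obs", w=w,
               comp_dists=pm.Normal.dist(mu=mu, sigma=sigma),
               observed=y)
    idata = pm.sample()
```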

Read more ...


Bayesian Non-parametric Causal Inference

There are few claims stronger than the assertion of a causal relationship, and few claims more contestable. A naive world model, rich with tenuous connections and non sequitur implications, is characteristic of conspiracy theory and idiocy. On the other hand, a refined and detailed knowledge of cause and effect, characterised by clear expectations, plausible connections and compelling counterfactuals, will steer you well through the blooming, buzzing confusion of the world.

Read more ...


Baby Births Modelling with HSGPs

This notebook provides an example of using the Hilbert Space Gaussian Process (HSGP) technique, introduced in [Solin and Särkkä, 2020], in the context of time series modeling. This technique has proven successful in speeding up models with Gaussian process components.

Read more ...