Posted in 2024
Bayesian Hypothesis Testing - an introduction
- 06 December 2024
Bayesian hypothesis testing provides a flexible and intuitive way to assess whether parameters differ from specified values. Unlike classical methods focusing on p-values, Bayesian methods let us directly compute probabilities of hypotheses and quantify the strength of evidence in various ways.
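As a minimal sketch of computing the probability of a hypothesis directly, assume a coin-flip style experiment with a uniform Beta(1, 1) prior, so that the posterior is Beta(1 + successes, 1 + failures). The data (14 successes in 20 trials), the threshold of 0.5, and the function name `prob_theta_greater` are illustrative assumptions, not taken from the post.

```python
import random


def prob_theta_greater(successes, failures, threshold=0.5, draws=20_000, seed=0):
    """Monte Carlo estimate of P(theta > threshold | data) under a Beta(1, 1) prior."""
    rng = random.Random(seed)
    # Conjugate update: Beta(1, 1) prior + binomial data -> Beta(1 + s, 1 + f) posterior.
    samples = [rng.betavariate(1 + successes, 1 + failures) for _ in range(draws)]
    return sum(s > threshold for s in samples) / draws


# Hypothetical data: 14 successes and 6 failures.
p_h1 = prob_theta_greater(successes=14, failures=6)
posterior_odds = p_h1 / (1 - p_h1)  # odds of theta > 0.5 versus theta <= 0.5
print(f"P(theta > 0.5 | data) = {p_h1:.3f}, posterior odds = {posterior_odds:.1f}")
```

The result is a statement about the hypothesis itself, which is the contrast with a p-value that the excerpt draws.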
GLM-missing-values-in-covariates
- 09 November 2024
Minimal Reproducible Example: Workflow to handle missing data in multiple covariates (numeric predictor features)
GLM-ordinal-features
- 27 October 2024
Here we use an ordinal exogenous predictor feature within a model:
Simpson’s paradox
- 06 September 2024
Simpson’s Paradox describes a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the Simpson’s Paradox Wikipedia page) demonstrates this very nicely.
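The reversal can be reproduced with a few numbers. The sketch below uses hypothetical toy data (two groups of three points each) and a hand-written least-squares slope; none of it comes from the post itself.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: two groups, each with a perfectly negative trend.
group_a = ([1, 2, 3], [3, 2, 1])
group_b = ([6, 7, 8], [8, 7, 6])

slope_a = ols_slope(*group_a)
slope_b = ols_slope(*group_b)
# Pooling ignores group membership, so the shift between groups dominates the fit.
slope_pooled = ols_slope(group_a[0] + group_b[0], group_a[1] + group_b[1])

print(slope_a, slope_b, slope_pooled)
```

Both within-group slopes are negative while the pooled slope is positive, because group B sits higher on both axes than group A.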
Confirmatory Factor Analysis and Structural Equation Models in Psychometrics
- 06 September 2024
“Evidently, the notions of relevance and dependence are far more basic to human reasoning than the numerical values attached to probability judgments…the language used for representing probabilistic information should allow assertions about dependency relationships to be expressed qualitatively, directly, and explicitly” - Pearl in Probabilistic Reasoning in Intelligent Systems
The prevalence of malaria in the Gambia
- 24 August 2024
Model Averaging
- 06 August 2024
When confronted with more than one model we have several options. One of them is to perform model selection, as exemplified by the PyMC examples Model comparison and GLM: Model Selection; it is usually a good idea to also include posterior predictive checks when deciding which model to keep. Discarding all models except one is equivalent to affirming that, among the evaluated models, one is correct (under some criteria) with probability 1 and the rest are incorrect. In most cases this will be an overstatement that ignores the uncertainty we have in our models. This is somewhat similar to computing the full posterior and then keeping only a point estimate such as the posterior mean; we may become overconfident about what we really know. You can also browse the blog/tag/model-comparison tag to find related posts.
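One way to avoid putting probability 1 on a single model is to weight each model and average their predictions. The sketch below uses Akaike-type weights, proportional to exp(-0.5 * difference in information criterion); this is only one possible weighting scheme and is not necessarily the one the post uses. The model names, information-criterion values, and point predictions are hypothetical.

```python
import math


def akaike_type_weights(ic_values):
    """Turn information-criterion values (deviance scale, lower is better) into weights."""
    best = min(ic_values)
    # Subtracting the best value keeps the exponentials numerically stable.
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical information-criterion values for three candidate models.
ic = {"model_0": 210.4, "model_1": 208.1, "model_2": 215.9}
weights = dict(zip(ic, akaike_type_weights(list(ic.values()))))

# Hypothetical point predictions from each model, combined by weight.
predictions = {"model_0": 3.2, "model_1": 2.9, "model_2": 3.8}
averaged = sum(weights[name] * predictions[name] for name in ic)
print(weights, averaged)
```

Model selection corresponds to the special case where one weight is 1 and the others are 0.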
Gaussian Processes: HSGP Advanced Usage
- 28 June 2024
The Hilbert Space Gaussian process (HSGP) approximation is a low-rank GP approximation that is particularly well-suited to usage in probabilistic programming languages like PyMC. It approximates the GP using a pre-computed and fixed set of basis functions that don’t depend on the form of the covariance kernel or its hyperparameters. It’s a parametric approximation, so prediction in PyMC can be done as one would with a linear model via pm.Data or pm.set_data. You don’t need to define the .conditional distribution that non-parametric GPs rely on. This makes it much easier to integrate an HSGP, instead of a GP, into your existing PyMC model. Additionally, unlike many other GP approximations, HSGPs can be used anywhere within a model and with any likelihood function.
Gaussian Processes: HSGP Reference & First Steps
- 10 June 2024
The Hilbert Space Gaussian process (HSGP) approximation is a low-rank GP approximation that is particularly well-suited to usage in probabilistic programming languages like PyMC. It approximates the GP using a pre-computed and fixed set of basis functions that don’t depend on the form of the covariance kernel or its hyperparameters. It’s a parametric approximation, so prediction in PyMC can be done as one would with a linear model via pm.Data or pm.set_data. You don’t need to define the .conditional distribution that non-parametric GPs rely on. This makes it much easier to integrate an HSGP, instead of a GP, into your existing PyMC model. Additionally, unlike many other GP approximations, HSGPs can be used anywhere within a model and with any likelihood function.
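To make the "fixed basis functions" idea concrete, here is a plain-Python sketch of the approximation for a one-dimensional squared-exponential kernel. It assumes the standard construction with sine eigenfunctions on the interval [-L, L], weighted by the kernel's spectral density; it does not use or reproduce the PyMC API, and the values of m, L, and the lengthscale are illustrative choices.

```python
import math


def basis(x, j, L):
    """j-th Laplacian eigenfunction on [-L, L]; independent of the kernel."""
    return math.sqrt(1.0 / L) * math.sin(j * math.pi * (x + L) / (2.0 * L))


def expquad_spectral_density(omega, ell, sigma=1.0):
    """Spectral density of the 1-D squared-exponential kernel."""
    return sigma**2 * math.sqrt(2.0 * math.pi) * ell * math.exp(-0.5 * (ell * omega) ** 2)


def hsgp_kernel(x1, x2, ell, m=30, L=5.0):
    """Low-rank approximation of k(x1, x2) using m basis functions."""
    total = 0.0
    for j in range(1, m + 1):
        sqrt_lam = j * math.pi / (2.0 * L)  # square root of the j-th eigenvalue
        total += expquad_spectral_density(sqrt_lam, ell) * basis(x1, j, L) * basis(x2, j, L)
    return total


exact = math.exp(-0.5 * 0.5**2)  # squared-exponential kernel at distance 0.5, ell = 1
approx = hsgp_kernel(0.0, 0.5, ell=1.0)
print(exact, approx)
```

Only `expquad_spectral_density` depends on the kernel and its hyperparameters; the `basis` values can be computed once per input, which is what makes the model behave like a linear regression on those features.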
Categorical regression
- 06 May 2024
In this example, we will model outcomes with more than two categories.
Automatic marginalization of discrete variables
- 20 January 2024
PyMC is very amenable to sampling models with discrete latent variables. But if you insist on using the NUTS sampler exclusively, you will need to get rid of your discrete variables somehow. The best way to do this is by marginalizing them out, as then you benefit from the Rao-Blackwell theorem and get a lower-variance estimate of your parameters.
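Marginalizing a discrete variable means summing it out of the likelihood, so that only continuous parameters remain for a gradient-based sampler. The sketch below does this by hand for a hypothetical two-component Gaussian mixture; it illustrates the idea only and is not the automatic mechanism the post describes. All names and numbers are assumptions.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def marginal_logp(y, weights, mus, sigma):
    """log p(y) = log sum_z p(z) p(y | z), computed with a stable log-sum-exp."""
    terms = [math.log(w) + normal_logpdf(y, mu, sigma) for w, mu in zip(weights, mus)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical mixture: the discrete label z is summed out and never sampled.
logp = marginal_logp(y=0.3, weights=[0.7, 0.3], mus=[0.0, 2.0], sigma=1.0)
print(logp)
```

The resulting log-density is a smooth function of the weights, means, and scale, which is why it can be handed to NUTS.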
Categories and Curves
- 14 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
The Garden of Forking Data
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Social Networks
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Ordered Categories
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Multilevel Models
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Multilevel Adventures
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Modeling Events
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Missing Data
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Measurement and Misclassification
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Markov Chain Monte Carlo
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Horoscopes
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Good & Bad Controls
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Geocentric Models
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Generalized Linear Madness
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Gaussian Processes
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Fitting Over & Under
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Elemental Confounds
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Counts and Hidden Confounds
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Correlated Features
- 07 January 2024
This notebook is part of the PyMC port of the Statistical Rethinking 2023 lecture series by Richard McElreath.
Bayesian Non-parametric Causal Inference
- 06 January 2024
There are few claims stronger than the assertion of a causal relationship and few claims more contestable. A naive world model, rich with tenuous connections and non sequitur implications, is characteristic of conspiracy theory and idiocy. On the other hand, a refined and detailed knowledge of cause and effect, characterised by clear expectations, plausible connections and compelling counterfactuals, will steer you well through the buzzing, blooming confusion of the world.
Baby Births Modelling with HSGPs
- 06 January 2024
This notebook provides an example of using the Hilbert Space Gaussian Process (HSGP) technique, introduced in [], in the context of time series modeling. This technique has proven successful in speeding up models with Gaussian process components.