Probabilistic Python: An Introduction to Bayesian Modeling with PyMC

PyData London 2022

Introduction

Bayesian statistical methods offer a powerful set of tools to tackle a wide variety of data science problems. In addition, the Bayesian approach generates results that are easy to interpret and automatically account for uncertainty in quantities that we wish to estimate and predict. Historically, computational challenges have been a barrier, particularly to new users, but there now exists a mature set of probabilistic programming tools that are both capable and easy to learn. We will use the newest release of PyMC (version 4) in this tutorial, but the concepts and approaches that will be taught are portable to any probabilistic programming framework.

This tutorial is intended for practicing and aspiring data scientists and analysts looking to learn how to apply Bayesian statistics and probabilistic programming to their work. It will provide learners with a high-level understanding of Bayesian statistical methods and their potential for use in a variety of applications. They will also gain hands-on experience applying these methods using PyMC, including specifying, fitting, and checking models on a couple of real-world datasets.

As this is an introductory tutorial, no direct experience with PyMC or Bayesian statistics will be required. However, to benefit maximally from the tutorial, learners should have some familiarity with basic statistics (things like regression and estimation) and with core components of the scientific Python stack (e.g. NumPy, pandas and Jupyter).

As the goal of the tutorial is to get new users up and running with Bayesian methods, the content is light on theory and focuses on the implementation of models, though some statistical background will be provided for context and clarity. Since PyMC is a high-level statistical package, it is easy to gloss over important details of the underlying algorithms. Therefore, the tutorial begins by solving a simple model using only NumPy and SciPy functions before diving into PyMC. As a capstone to the tutorial, learners will be introduced to “The Bayesian Workflow” to reiterate the important steps in the process, along with useful tips and tricks.
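To give a flavor of that NumPy/SciPy warm-up, the sketch below works a beta-binomial model "by hand" using conjugacy. The data (6 successes in 9 trials) and the flat Beta(1, 1) prior are illustrative assumptions, not the example actually used in the tutorial.

```python
# A minimal "Bayes by hand" sketch using only NumPy and SciPy: a binomial
# likelihood with a conjugate Beta prior yields a Beta posterior in closed form.
import numpy as np
from scipy import stats

n, k = 9, 6                      # hypothetical data: 6 successes in 9 trials
alpha_prior, beta_prior = 1, 1   # flat Beta(1, 1) prior on the success probability

# Conjugacy: Beta prior + binomial likelihood => Beta posterior
alpha_post = alpha_prior + k
beta_post = beta_prior + n - k
posterior = stats.beta(alpha_post, beta_post)

print(f"posterior mean: {posterior.mean():.3f}")
print(f"94% equal-tailed credible interval: {posterior.ppf([0.03, 0.97]).round(3)}")
```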

About the Speaker

Chris is the Principal Quantitative Analyst in Baseball Research & Development for the Philadelphia Phillies. He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada and received his Ph.D. from the University of Georgia.

Video



Timestamps

00:00 Welcome!

0:08 Introduction

1:19 Probabilistic programming

1:53 Stochastic language “primitives”

3:06 Bayesian inference

3:27 What is Bayes?

3:57 Inverse probability

4:21 Stochastic programs

4:39 Why Bayes

5:13 The Bayes formula

6:51 Prior distribution

8:12 Likelihood function

8:29 Normal distribution

8:53 Binomial distribution

9:14 Poisson distribution

9:32 Infer values for latent variables

9:47 Probabilistic programming abstracts the inference procedure

9:54 Posterior distribution

10:56 Bayes by hand

12:18 Conjugacy

16:43 Probabilistic programming in Python

17:24 PyMC and its features

19:15 Question: Among the different probabilistic programming libraries, is there a difference in what they have to offer?

20:39 Question: How can one know which likelihood distribution to choose?

21:35 Question: Is there a methodology used to specify the likelihood distribution?

22:30 Example: Building models in PyMC (see the sketch after this list)

27:31 Stochastic and deterministic variables

37:11 Observed random variables

41:00 Question: To what extent are the features of PyMC supported if compiled in different backends?

41:47 Markov Chain Monte Carlo and Bayesian approximation

43:04 Markov chains

44:19 Reversible Markov chains

45:06 Metropolis sampling

48:00 Hamiltonian Monte Carlo

49:10 Hamiltonian dynamics

50:49 No-U-Turn Sampler (NUTS)

52:11 Question: How do you know the number of leapfrog steps to take?

52:55 Example: Markov Chain Monte Carlo in PyMC

1:13:30 Divergences and how to deal with them

1:15:08 Bayesian Fraction of Missing Information

1:16:25 Potential Scale Reduction

1:17:57 Goodness of fit

1:22:40 Intuitive Bayes course

1:23:09 Question: Do bookmakers use PyMC or Bayesian methods?

1:23:53 Question: How does it work if you have different samplers for different variables?

1:25:09 Question: What route should one take in case of data with many discrete variables and many possible values?

1:25:39 Question: Is there a natural way to use PyMC over a cluster of CPUs?
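For readers skimming the timestamps, the sketch below illustrates the kind of model covered in the “Building models in PyMC”, “Stochastic and deterministic variables”, “Observed random variables”, and “Markov Chain Monte Carlo in PyMC” segments: a small linear regression with stochastic priors, a deterministic expected value, an observed likelihood, and posterior sampling with NUTS. The simulated data and the specific model structure are illustrative assumptions, not the dataset used in the tutorial.

```python
# A minimal PyMC model sketch: stochastic, deterministic, and observed
# variables, sampled with the default NUTS sampler.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)  # simulated observations

with pm.Model() as model:
    # Stochastic variables: priors on the regression parameters
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    slope = pm.Normal("slope", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Deterministic variable: the expected value of y
    mu = pm.Deterministic("mu", intercept + slope * x)

    # Observed random variable: the likelihood, conditioned on the data
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    # Draw posterior samples with NUTS (PyMC's default sampler)
    idata = pm.sample(1000, tune=1000, random_seed=42)

# Summary diagnostics, including R-hat (potential scale reduction)
print(az.summary(idata, var_names=["intercept", "slope", "sigma"]))
```

The `az.summary` call at the end reports the R-hat (potential scale reduction) and effective sample size diagnostics touched on in the later diagnostics segments of the video.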

PyMC institutional partners

PyMC Labs

Intuitive Bayes course

Connect with PyMC

Connect with PyMC via: