# ODE Lotka-Volterra With Bayesian Inference in Multiple Ways#

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor
import pytensor.tensor as pt

from pymc.ode import DifferentialEquation
from pytensor.compile.ops import as_op
from scipy.integrate import odeint
from scipy.optimize import least_squares

print(f"Running on PyMC v{pm.__version__}")

Running on PyMC v5.0.1+5.ga7f361bd

%load_ext watermark
az.style.use("arviz-darkgrid")
rng = np.random.default_rng(1234)


## Purpose#

The purpose of this notebook is to demonstrate how to perform Bayesian inference on a system of ordinary differential equations (ODEs), both with and without gradients. The accuracy and efficiency of different samplers are compared.

We will first present the Lotka-Volterra predator-prey ODE model and example data. Next, we will solve the ODE using scipy.odeint and (non-Bayesian) least squares optimization. Then, we will perform Bayesian inference in PyMC using non-gradient-based samplers. Finally, we will use gradient-based samplers and compare results.

### Key Conclusions#

Based on the experiments in this notebook, the simplest and most efficient method for performing Bayesian inference on the Lotka-Volterra equations was to specify the ODE system in Scipy, wrap the function as a Pytensor op, and use a Differential Evolution Metropolis (DEMetropolis) sampler in PyMC.

## Background#

### Motivation#

Ordinary differential equation (ODE) models are used in a variety of science and engineering domains to model the time evolution of physical variables. Bayesian inference is a natural choice for estimating the values and uncertainty of model parameters given experimental data. However, ODEs can be challenging to specify and solve in the Bayesian setting, so this notebook steps through multiple methods for solving an ODE inference problem using PyMC. The Lotka-Volterra model used in this example has often been used for benchmarking Bayesian inference methods (e.g., in this Stan case study, and in Chapter 16 of Statistical Rethinking [1]).

### Lotka-Volterra Predator-Prey Model#

The Lotka-Volterra model describes the interaction between a predator and a prey species. The ODE is given by:

$$\begin{aligned} \frac{dx}{dt} &= \alpha x - \beta x y \\ \frac{dy}{dt} &= -\gamma y + \delta x y \end{aligned}$$

The state vector $X(t) = [x(t), y(t)]$ comprises the densities of the prey and the predator species, respectively. Parameters $\boldsymbol{\theta} = [\alpha, \beta, \gamma, \delta, x(0), y(0)]$ are the unknowns that we wish to infer from experimental observations. $x(0)$ and $y(0)$ are the initial values of the states needed to solve the ODE, and $\alpha$, $\beta$, $\gamma$, and $\delta$ are unknown model parameters which represent the following:

• $\alpha$ is the growth rate of the prey when there is no predator.

• $\beta$ is the death rate of the prey due to predation.

• $\gamma$ is the death rate of the predator when there is no prey.

• $\delta$ is the growth rate of the predator in the presence of prey.
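
As a quick sanity check on these interpretations, setting both derivatives to zero recovers the nontrivial coexistence equilibrium around which the populations cycle:

$$x^* = \frac{\gamma}{\delta}, \qquad y^* = \frac{\alpha}{\beta}$$

Plugging the least squares estimates found below into these expressions gives values near the middle of the observed oscillations.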

### The Hudson’s Bay Company data#

The Lotka-Volterra predator-prey model has been used to successfully explain the dynamics of natural populations of predators and prey, such as the lynx and snowshoe hare data of the Hudson's Bay Company. Since the dataset is small, we will hand-enter the values.

# fmt: off
data = pd.DataFrame(dict(
    year = np.arange(1900., 1921., 1),
    lynx = np.array([4.0, 6.1, 9.8, 35.2, 59.4, 41.7, 19.0, 13.0, 8.3, 9.1, 7.4,
                     8.0, 12.3, 19.5, 45.7, 51.1, 29.7, 15.8, 9.7, 10.1, 8.6]),
    hare = np.array([30.0, 47.2, 70.2, 77.4, 36.3, 20.6, 18.1, 21.4, 22.0, 25.4,
                     27.1, 40.3, 57.0, 76.6, 52.3, 19.5, 11.2, 7.6, 14.6, 16.2, 24.7])))
# fmt: on

data.head()

     year  lynx  hare
0  1900.0   4.0  30.0
1  1901.0   6.1  47.2
2  1902.0   9.8  70.2
3  1903.0  35.2  77.4
4  1904.0  59.4  36.3

# plot data function for reuse later
def plot_data(ax, lw=2, title="Hudson's Bay Company Data"):
    ax.plot(data.year, data.lynx, color="b", lw=lw, marker="o", markersize=12, label="Lynx (Data)")
    ax.plot(data.year, data.hare, color="g", lw=lw, marker="+", markersize=14, label="Hare (Data)")
    ax.legend(fontsize=14, loc="center left", bbox_to_anchor=(1, 0.5))
    ax.set_xlim([1900, 1920])
    ax.set_ylim(0)
    ax.set_xlabel("Year", fontsize=14)
    ax.set_ylabel("Pelts (Thousands)", fontsize=14)
    ax.set_xticks(data.year.astype(int))
    ax.set_xticklabels(ax.get_xticks(), rotation=45)
    ax.set_title(title, fontsize=16)
    return ax

fig, ax = plt.subplots(figsize=(7, 4))
plot_data(ax);


### Problem Statement#

The purpose of this analysis is to estimate, with uncertainty, the parameters for the Lotka-Volterra model for the Hudson’s Bay Company data from 1900 to 1920.

## Scipy odeint#

Here, we make a Python function that represents the right-hand side of the ODE equations with the call signature needed for the odeint function. Note that Scipy's solve_ivp could also be used, but the older odeint function was faster in speed tests and is therefore used in this notebook; a sketch of the equivalent solve_ivp call follows the odeint version below.

# define the right hand side of the ODE equations in the Scipy odeint signature
def rhs(X, t, theta):
    # unpack the state and parameters
    x, y = X
    alpha, beta, gamma, delta, xt0, yt0 = theta
    # Lotka-Volterra equations
    dx_dt = alpha * x - beta * x * y
    dy_dt = -gamma * y + delta * x * y
    return [dx_dt, dy_dt]
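
For reference, here is a hedged sketch of the equivalent call using the newer solve_ivp (not used further in this notebook). solve_ivp expects fun(t, y), the reverse of odeint's func(y, t), and the theta values below are just the illustrative ones used in the next cell.

# optional alternative: the same forward solve via Scipy's newer solve_ivp
from scipy.integrate import solve_ivp

theta_example = [0.52, 0.026, 0.84, 0.026, 34.0, 5.9]
sol = solve_ivp(
    fun=lambda t, X: rhs(X, t, theta_example),  # adapt odeint's (y, t) signature
    t_span=(1900, 1921),
    y0=theta_example[-2:],
    t_eval=np.arange(1900, 1921, 0.01),
)
x_y_ivp = sol.y.T  # transpose so the shape matches odeint's (n_times, n_states)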


To get a feel for the model and make sure the equations are working correctly, let's run the model once with reasonable values for $\theta$ and plot the results.

# plot model function
def plot_model(
    ax,
    x_y,
    time=np.arange(1900, 1921, 0.01),
    alpha=1,
    lw=3,
    title="Hudson's Bay Company Data and\nExample Model Run",
):
    ax.plot(time, x_y[:, 1], color="b", alpha=alpha, lw=lw, label="Lynx (Model)")
    ax.plot(time, x_y[:, 0], color="g", alpha=alpha, lw=lw, label="Hare (Model)")
    ax.legend(fontsize=14, loc="center left", bbox_to_anchor=(1, 0.5))
    ax.set_title(title, fontsize=16)
    return ax

# note theta = alpha, beta, gamma, delta, xt0, yt0
theta = [0.52, 0.026, 0.84, 0.026, 34.0, 5.9]
time = np.arange(1900, 1921, 0.01)

# call Scipy's odeint function
x_y = odeint(func=rhs, y0=theta[-2:], t=time, args=(theta,))

# plot
fig, ax = plt.subplots(figsize=(7, 4))
plot_data(ax, lw=0)
plot_model(ax, x_y);


Looks like the odeint function is working as expected.

## Least Squares Solution#

Now, we can solve the ODE using least squares. Make a function that calculates the residual error.

# function that calculates residuals based on a given theta
def ode_model_resid(theta):
    return (
        data[["hare", "lynx"]] - odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta,))
    ).values.flatten()


Feed the residual error function to the Scipy least_squares solver.

# calculate least squares using the Scipy solver
results = least_squares(ode_model_resid, x0=theta)

# put the results in a dataframe for presentation and convenience
df = pd.DataFrame()
parameter_names = ["alpha", "beta", "gamma", "delta", "h0", "l0"]
df["Parameter"] = parameter_names
df["Least Squares Solution"] = results.x
df.round(2)

  Parameter  Least Squares Solution
0     alpha                    0.48
1      beta                    0.02
2     gamma                    0.93
3     delta                    0.03
4        h0                   34.91
5        l0                    3.86

Plot the least squares solution alongside the data.

time = np.arange(1900, 1921, 0.01)
theta = results.x
x_y = odeint(func=rhs, y0=theta[-2:], t=time, args=(theta,))
fig, ax = plt.subplots(figsize=(7, 4))
plot_data(ax, lw=0)
plot_model(ax, x_y, title="Least Squares Solution");


Looks right. If we didn’t care about uncertainty, then we would be done. But we do care about uncertainty, so let’s move on to Bayesian inference.

## PyMC Model Specification for Gradient-Free Bayesian Inference#

Like other Numpy or Scipy-based functions, the scipy.integrate.odeint function cannot be used directly in a PyMC model because PyMC needs to know the variable input and output types to compile. Therefore, we use a Pytensor wrapper to give the variable types to PyMC. Then the function can be used in PyMC in conjunction with gradient-free samplers.

### Convert Python Function to a Pytensor Operator using @as_op decorator#

We tell PyMC the input variable types and the output variable types using the @as_op decorator. odeint returns Numpy arrays, but we tell PyMC that they are Pytensor double float tensors for this purpose.

# decorator with input and output types as Pytensor double float tensors
@as_op(itypes=[pt.dvector], otypes=[pt.dmatrix])
def pytensor_forward_model_matrix(theta):
    return odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta,))


### PyMC Model#

Now, we can specify the PyMC model using the ODE solver. For priors, we will use the results from the least squares calculation (results.x) to assign priors that start in the right range. These are empirically derived, weakly informative priors. We also make them positive-only for this problem.

We will use a normal likelihood on untransformed data (i.e., not log transformed) to best fit the peaks of the data.

theta = results.x  # least squares solution used to inform the priors
with pm.Model() as model:
    # Priors
    alpha = pm.TruncatedNormal("alpha", mu=theta[0], sigma=theta[0], lower=0, initval=theta[0])
    beta = pm.TruncatedNormal("beta", mu=theta[1], sigma=theta[1], lower=0, initval=theta[1])
    gamma = pm.TruncatedNormal("gamma", mu=theta[2], sigma=theta[2], lower=0, initval=theta[2])
    delta = pm.TruncatedNormal("delta", mu=theta[3], sigma=theta[3], lower=0, initval=theta[3])
    xt0 = pm.TruncatedNormal("xto", mu=theta[4], sigma=theta[4], lower=0, initval=theta[4])
    yt0 = pm.TruncatedNormal("yto", mu=theta[5], sigma=theta[5], lower=0, initval=theta[5])
    sigma = pm.TruncatedNormal("sigma", mu=10, sigma=10, lower=0)

    # ODE solution function
    ode_solution = pytensor_forward_model_matrix(
        pm.math.stack([alpha, beta, gamma, delta, xt0, yt0])
    )

    # Likelihood
    pm.Normal("Y_obs", mu=ode_solution, sigma=sigma, observed=data[["hare", "lynx"]].values)

pm.model_to_graphviz(model=model)


### Plotting Functions#

A couple of plotting functions that we will reuse below.

def plot_model_trace(ax, trace_df, row_idx, lw=1, alpha=0.2):
    cols = ["alpha", "beta", "gamma", "delta", "xto", "yto"]
    row = trace_df.iloc[row_idx, :][cols]

    # alpha, beta, gamma, delta, Xt0, Yt0
    time = np.arange(1900, 1921, 0.01)
    theta = row
    x_y = odeint(func=rhs, y0=theta[-2:], t=time, args=(theta,))
    plot_model(ax, x_y, time=time, lw=lw, alpha=alpha)

def plot_inference(
    ax,
    trace,
    num_samples=25,
    title="Hudson's Bay Company Data and\nInference Model Runs",
    plot_model_kwargs=dict(lw=1, alpha=0.2),
):
    trace_df = az.extract(trace, num_samples=num_samples).to_dataframe()
    plot_data(ax, lw=0)
    for row_idx in range(num_samples):
        plot_model_trace(ax, trace_df, row_idx, **plot_model_kwargs)
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[:2], labels[:2], loc="center left", bbox_to_anchor=(1, 0.5))
    ax.set_title(title, fontsize=16)


Good gradient-free samplers open up the range of models that can be fit within PyMC. There are five options for gradient-free samplers in PyMC that are applicable to this problem:

• Slice - the default gradient-free sampler

• DEMetropolisZ - a differential evolution Metropolis sampler that uses the past to inform sampling jumps

• DEMetropolis - a differential evolution Metropolis sampler

• Metropolis - the vanilla Metropolis sampler

• SMC - Sequential Monte Carlo

Let’s give them a shot.

A few notes on running these inferences. For each sampler, the number of tuning steps and draws has been reduced so that the inference runs in a reasonable amount of time (on the order of minutes). In some cases this is not a sufficient number of draws for a good inference, but it works for demonstration purposes. In addition, multicore processing was not working for the Pytensor op function on all machines, so inference is performed on one core.

### Slice Sampler#

# Variable list to give to the sample step parameter
vars_list = list(model.values_to_rvs.keys())[:-1]

# Specify the sampler
sampler = "Slice Sampler"
tune = draws = 500

# Inference!
with model:
    trace_slice = pm.sample(step=[pm.Slice(vars_list)], tune=tune, draws=draws, cores=1)
trace = trace_slice
az.summary(trace)

Sequential sampling (2 chains in 1 job)
CompoundStep
>Slice: [alpha]
>Slice: [beta]
>Slice: [gamma]
>Slice: [delta]
>Slice: [xto]
>Slice: [yto]
>Slice: [sigma]

100.00% [1000/1000 04:12<00:00 Sampling chain 0, 0 divergences]
100.00% [1000/1000 04:19<00:00 Sampling chain 1, 0 divergences]
Sampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 511 seconds.

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.473 0.041 0.418 0.563 0.013 0.009 13.0 25.0 1.09
beta 0.025 0.002 0.022 0.029 0.000 0.000 20.0 27.0 1.04
gamma 0.952 0.081 0.792 1.088 0.026 0.019 11.0 30.0 1.11
delta 0.028 0.002 0.023 0.032 0.001 0.001 11.0 31.0 1.13
xto 35.038 1.705 31.835 38.170 0.312 0.223 30.0 51.0 1.02
yto 3.774 0.665 2.697 5.091 0.232 0.174 10.0 29.0 1.15
sigma 4.242 0.526 3.345 5.240 0.021 0.015 647.0 684.0 1.00
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
The Slice sampler was slow and resulted in a low effective sample size. Despite this, the results are starting to look reasonable!

### DEMetropolisZ Sampler#

sampler = "DEMetropolisZ"
tune = draws = 5000
with model:
    trace_DEMZ = pm.sample(step=[pm.DEMetropolisZ(vars_list)], tune=tune, draws=draws, cores=1)
trace = trace_DEMZ
az.summary(trace)

Sequential sampling (2 chains in 1 job)
DEMetropolisZ: [alpha, beta, gamma, delta, xto, yto, sigma]

100.00% [10000/10000 01:43<00:00 Sampling chain 0, 0 divergences]
100.00% [10000/10000 02:00<00:00 Sampling chain 1, 0 divergences]
Sampling 2 chains for 5_000 tune and 5_000 draw iterations (10_000 + 10_000 draws total) took 224 seconds.

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.483 0.039 0.410 0.558 0.002 0.002 265.0 421.0 1.00
beta 0.025 0.002 0.022 0.028 0.000 0.000 288.0 425.0 1.00
gamma 0.927 0.080 0.785 1.082 0.005 0.003 247.0 373.0 1.00
delta 0.028 0.002 0.023 0.032 0.000 0.000 242.0 482.0 1.00
xto 34.886 1.660 31.807 37.972 0.104 0.074 257.0 333.0 1.01
yto 3.990 0.647 2.825 5.219 0.046 0.033 201.0 528.0 1.01
sigma 4.223 0.518 3.385 5.182 0.036 0.026 218.0 357.0 1.01
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference\n{sampler} Sampler")


Notes:
DEMetropolisZ sampled much more quickly than the Slice sampler and therefore had a higher ESS per minute spent sampling. The parameter estimates are similar. A “final” inference would still need to beef up the number of samples.

### DEMetropolis Sampler#

In these experiments, the DEMetropolis sampler was not accepting the tune argument and required at least 8 chains. We will go with 8 chains and truncate the trace following inference to remove the tuning steps (i.e., the “burn-in”).

sampler = "DEMetropolis"
chains = 8
draws = 3000
with model:
    trace_DEM = pm.sample(step=[pm.DEMetropolis(vars_list)], draws=draws, chains=chains, cores=1)
trace = trace_DEM
az.summary(trace)

Population sampling (8 chains)
DEMetropolis: [alpha, beta, gamma, delta, xto, yto, sigma]
Chains are not parallelized. You can enable this by passing pm.sample(cores=n), where n > 1.

100.00% [4000/4000 07:01<00:00]
Sampling 8 chains for 0 tune and 4_000 draw iterations (0 + 32_000 draws total) took 424 seconds.

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.510 0.091 0.404 0.615 0.008 0.006 151.0 143.0 1.03
beta 0.026 0.003 0.022 0.033 0.000 0.000 78.0 149.0 1.07
gamma 0.898 0.105 0.698 1.090 0.008 0.006 232.0 147.0 1.02
delta 0.027 0.003 0.021 0.032 0.000 0.000 244.0 150.0 1.02
xto 33.897 2.738 29.605 37.940 0.293 0.213 99.0 145.0 1.05
yto 4.003 1.003 2.531 5.309 0.062 0.044 384.0 265.0 1.01
sigma 4.121 0.851 3.145 5.557 0.038 0.028 598.0 304.0 1.01
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");


Truncate the trace based on a visual check of the trace plot (discard the first ~1500 draws as burn-in).

# truncate the trace
burn = 1500
trace_DEM = trace_DEM.sel(draw=slice(burn, None), groups="posterior")
trace = trace_DEM
az.summary(trace)

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.486 0.030 0.432 0.546 0.002 0.001 324.0 1095.0 1.01
beta 0.025 0.001 0.023 0.028 0.000 0.000 516.0 1256.0 1.01
gamma 0.922 0.063 0.794 1.034 0.003 0.002 371.0 1325.0 1.01
delta 0.027 0.002 0.024 0.031 0.000 0.000 407.0 1161.0 1.01
xto 34.754 1.286 32.276 37.084 0.065 0.046 392.0 771.0 1.02
yto 3.898 0.516 2.929 4.860 0.028 0.020 365.0 1046.0 1.01
sigma 4.049 0.430 3.314 4.878 0.018 0.013 593.0 1386.0 1.01
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler} - Burn-in Removed");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
After cleaning up the burn-in portion of the trace, the ESS is high, given the duration of sampling.

### Metropolis Sampler#

sampler = "Metropolis"
tune = draws = 2000
with model:
    trace_M = pm.sample(step=[pm.Metropolis(vars_list)], tune=tune, draws=draws, cores=1)
trace = trace_M
az.summary(trace)

Sequential sampling (2 chains in 1 job)
CompoundStep
>Metropolis: [alpha]
>Metropolis: [beta]
>Metropolis: [gamma]
>Metropolis: [delta]
>Metropolis: [xto]
>Metropolis: [yto]
>Metropolis: [sigma]

100.00% [4000/4000 04:30<00:00 Sampling chain 0, 0 divergences]
C:\Users\greg\.conda\envs\pymc-dev\Lib\site-packages\scipy\integrate\_odepack_py.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)

100.00% [4000/4000 04:47<00:00 Sampling chain 1, 0 divergences]
C:\Users\greg\.conda\envs\pymc-dev\Lib\site-packages\scipy\integrate\_odepack_py.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)
Sampling 2 chains for 2_000 tune and 2_000 draw iterations (4_000 + 4_000 draws total) took 558 seconds.

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.481 0.017 0.455 0.521 0.006 0.004 12.0 22.0 1.24
beta 0.025 0.001 0.023 0.027 0.000 0.000 31.0 33.0 1.04
gamma 0.924 0.035 0.848 0.985 0.011 0.008 10.0 20.0 1.35
delta 0.027 0.001 0.025 0.029 0.000 0.000 12.0 20.0 1.26
xto 35.226 1.437 32.150 37.390 0.246 0.176 35.0 29.0 1.03
yto 3.898 0.483 3.016 4.886 0.104 0.074 22.0 39.0 1.06
sigma 4.225 0.542 3.413 5.298 0.032 0.023 330.0 440.0 1.01
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
The old-school Metropolis sampler is less reliable and slower than the DEMetropolis samplers. It is not recommended.

### SMC Sampler#

The Sequential Monte Carlo (SMC) sampler is often presented as a likelihood-free method; however, it can also be used with a likelihood function. First, we will demonstrate its use with a likelihood function based on the model above, and then we will demonstrate its use with a distance function and compare the results.

#### SMC with a Likelihood Function#

sampler = "SMC with Likelihood"
draws = 500
with model:
    trace_SMC_like = pm.sample_smc(draws=draws, progressbar=True, cores=1)
trace = trace_SMC_like
az.summary(trace)

Initializing SMC sampler...
Sampling 2 chains in 1 job

100.00% [200/200 00:00<? Chain: 2/2 Stage: 9 Beta: 1.000]

C:\Users\greg\.conda\envs\pymc-dev\Lib\site-packages\scipy\integrate\_odepack_py.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)
We recommend running at least 4 chains for robust computation of convergence diagnostics

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.457 0.061 0.345 0.568 0.024 0.018 7.0 241.0 1.22
beta 0.024 0.003 0.018 0.029 0.001 0.001 14.0 145.0 1.10
gamma 0.996 0.149 0.762 1.290 0.059 0.044 7.0 167.0 1.21
delta 0.029 0.004 0.022 0.038 0.002 0.001 8.0 83.0 1.19
xto 35.620 2.676 31.053 41.035 0.421 0.329 60.0 260.0 1.05
yto 3.695 0.957 1.959 5.570 0.238 0.171 15.0 98.0 1.09
sigma 6.303 1.892 3.701 8.789 1.288 1.074 3.0 104.0 1.83
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
With this number of samples and this tuning scheme, the SMC algorithm results in wider uncertainty bounds compared with the other samplers.

#### SMC Using pm.Simulator Epsilon=1#

As outlined in the SMC tutorial on PyMC.io, the SMC sampler is often combined with a pm.Simulator function rather than a Pytensor op. Here is a rewrite of the PyMC-odeint model for the SMC sampler.

The simulator function needs to have the correct signature (e.g., accept an rng argument first).

# simulator function based on the signature rng, parameters, size.
def simulator_forward_model(rng, alpha, beta, gamma, delta, xt0, yt0, size=None):
    theta = alpha, beta, gamma, delta, xt0, yt0
    return odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta,))


Here is the model with the simulator function. This specification of SMC does not use a likelihood function but rather specifies a distance metric epsilon between the model and observed values.

with pm.Model() as model:
    # Specify prior distributions for model parameters
    alpha = pm.TruncatedNormal("alpha", mu=theta[0], sigma=theta[0], lower=0, initval=theta[0])
    beta = pm.TruncatedNormal("beta", mu=theta[1], sigma=theta[1], lower=0, initval=theta[1])
    gamma = pm.TruncatedNormal("gamma", mu=theta[2], sigma=theta[2], lower=0, initval=theta[2])
    delta = pm.TruncatedNormal("delta", mu=theta[3], sigma=theta[3], lower=0, initval=theta[3])
    xt0 = pm.TruncatedNormal("xto", mu=theta[4], sigma=theta[4], lower=0, initval=theta[4])
    yt0 = pm.TruncatedNormal("yto", mu=theta[5], sigma=theta[5], lower=0, initval=theta[5])

    # ode_solution
    pm.Simulator(
        "Y_obs",
        simulator_forward_model,
        params=(alpha, beta, gamma, delta, xt0, yt0),
        epsilon=1,
        observed=data[["hare", "lynx"]].values,
    )


Run the inference. Note that the progress bar was throwing an error, so it is turned off.

sampler = "SMC_epsilon=1"
draws = 300
with model:
    trace_SMC_e1 = pm.sample_smc(draws=draws, progressbar=False)
trace = trace_SMC_e1
az.summary(trace)

Initializing SMC sampler...
Sampling 4 chains in 4 jobs

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.481 0.009 0.463 0.497 0.000 0.000 1119.0 1064.0 1.00
beta 0.025 0.000 0.024 0.026 0.000 0.000 1174.0 1158.0 1.01
gamma 0.926 0.019 0.894 0.964 0.001 0.000 1163.0 1157.0 1.00
delta 0.028 0.001 0.027 0.029 0.000 0.000 1108.0 1202.0 1.00
xto 34.903 0.389 34.090 35.550 0.012 0.008 1115.0 1134.0 1.00
yto 3.865 0.156 3.568 4.143 0.005 0.003 1176.0 1238.0 1.00
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
This is interesting. The SMC sampler underestimates uncertainty compared to the other samplers. What is going on? Remember that the SMC sampler is using distance rather than a likelihood function. So when epsilon is small, the parameter estimates are essentially converging toward the least squares estimate. Let’s try again with a larger epsilon.

#### SMC with Epsilon = 10#

with pm.Model() as model:
    # Priors
    alpha = pm.TruncatedNormal("alpha", mu=theta[0], sigma=theta[0], lower=0, initval=theta[0])
    beta = pm.TruncatedNormal("beta", mu=theta[1], sigma=theta[1], lower=0, initval=theta[1])
    gamma = pm.TruncatedNormal("gamma", mu=theta[2], sigma=theta[2], lower=0, initval=theta[2])
    delta = pm.TruncatedNormal("delta", mu=theta[3], sigma=theta[3], lower=0, initval=theta[3])
    xt0 = pm.TruncatedNormal("xto", mu=theta[4], sigma=theta[4], lower=0, initval=theta[4])
    yt0 = pm.TruncatedNormal("yto", mu=theta[5], sigma=theta[5], lower=0, initval=theta[5])

    # ode_solution
    pm.Simulator(
        "Y_obs",
        simulator_forward_model,
        params=(alpha, beta, gamma, delta, xt0, yt0),
        epsilon=10,
        observed=data[["hare", "lynx"]].values,
    )

sampler = "SMC epsilon=10"
draws = 300
with model:
    trace_SMC_e10 = pm.sample_smc(draws=draws, progressbar=False)
trace = trace_SMC_e10
az.summary(trace)

Initializing SMC sampler...
Sampling 4 chains in 4 jobs

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.498 0.088 0.344 0.671 0.003 0.002 1028.0 1120.0 1.0
beta 0.026 0.004 0.018 0.034 0.000 0.000 1048.0 1174.0 1.0
gamma 0.926 0.176 0.613 1.253 0.005 0.004 1065.0 1086.0 1.0
delta 0.028 0.005 0.018 0.037 0.000 0.000 1005.0 1073.0 1.0
xto 34.654 3.830 28.050 41.971 0.126 0.091 924.0 1202.0 1.0
yto 4.122 1.366 1.789 6.947 0.042 0.030 1055.0 1109.0 1.0
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
Now, we see that the SMC sampler with a larger epsilon overestimates parameter uncertainty compared to the other samplers. So which value of epsilon is “right”? In a sense, the likelihood function used in the other model specifications finds the right value for the model error (sigma), balancing the uncertainty in the model parameters against the noise in the data. This specification of SMC does not have that feature, since epsilon is specified by the user rather than discovered by the model.

### Posterior Correlations#

As an aside, it is worth pointing out that the posterior parameter space is a difficult geometry for sampling.

az.plot_pair(trace_DEM, figsize=(8, 6), scatter_kwargs=dict(alpha=0.01), marginals=True)
plt.suptitle("Pair Plot Showing Posterior Correlations", size=18);


The major observation here is that the posterior shape is quite difficult for a sampler to handle, with positive correlations, negative correlations, crescent shapes, and large variations in scale. This contributes to the slow sampling (in addition to the computational overhead of solving the ODE thousands of times). It is also fun to look at for understanding how the model parameters impact each other.

## PyMC Model Specification for Gradient-Based Bayesian Inference#

The PyMC default NUTS sampler can only be used if gradients are supplied to the sampler. In this section, we will solve the system of ODEs within PyMC in two different ways that supply the sampler with gradients. The first is the built-in pymc.ode.DifferentialEquation solver, and the second is to forward simulate using pytensor.scan, which allows looping. Note that there may be better and faster ways to perform Bayesian inference with ODEs using gradients, such as the sunode project, and diffrax, which relies on JAX.

### PyMC ODE Module#

pymc.ode uses scipy.odeint under the hood to estimate a solution and then estimates the gradient through finite differences.
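
To build some intuition for that statement, here is a sketch of the finite-difference idea (illustrative only, not pymc's internal code; the step size h is an arbitrary choice):

# central finite-difference estimate of the solution's sensitivity to alpha
h = 1e-6
theta_plus = np.array(theta, dtype=float)
theta_minus = np.array(theta, dtype=float)
theta_plus[0] += h   # nudge alpha up
theta_minus[0] -= h  # nudge alpha down
sol_plus = odeint(func=rhs, y0=theta_plus[-2:], t=data.year, args=(theta_plus,))
sol_minus = odeint(func=rhs, y0=theta_minus[-2:], t=data.year, args=(theta_minus,))
dsol_dalpha = (sol_plus - sol_minus) / (2 * h)  # shape (n_times, n_states)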

The pymc.ode API is similar to scipy.odeint. The right-hand-side equations are put in a function and written as if y and p are vectors, as follows. (Even when your model has one state and/or one parameter, you should explicitly write y[0] and/or p[0].)

def rhs_pymcode(y, t, p):
    dX_dt = p[0] * y[0] - p[1] * y[0] * y[1]
    dY_dt = -p[2] * y[1] + p[3] * y[0] * y[1]
    return [dX_dt, dY_dt]


DifferentialEquation takes as arguments:

• func: A function specifying the differential equation (i.e. $f(\mathbf{y}, t, \mathbf{p})$),

• times: An array of times at which data was observed,

• n_states: The dimension of $f(\mathbf{y}, t, \mathbf{p})$ (the number of states),

• n_theta: The dimension of $\mathbf{p}$ (the number of parameters),

• t0: Optional time to which the initial condition belongs,

as follows:

ode_model = DifferentialEquation(
    func=rhs_pymcode, times=data.year.values, n_states=2, n_theta=4, t0=data.year.values[0]
)


Once the ODE is specified, we can use it in our PyMC model.

#### NUTS Inference#

pymc.ode is quite slow, so for demonstration purposes, we will only draw a few samples.

with pm.Model() as model:
    # Priors
    alpha = pm.TruncatedNormal("alpha", mu=theta[0], sigma=theta[0], lower=0, initval=theta[0])
    beta = pm.TruncatedNormal("beta", mu=theta[1], sigma=theta[1], lower=0, initval=theta[1])
    gamma = pm.TruncatedNormal("gamma", mu=theta[2], sigma=theta[2], lower=0, initval=theta[2])
    delta = pm.TruncatedNormal("delta", mu=theta[3], sigma=theta[3], lower=0, initval=theta[3])
    xt0 = pm.TruncatedNormal("xto", mu=theta[4], sigma=theta[4], lower=0, initval=theta[4])
    yt0 = pm.TruncatedNormal("yto", mu=theta[5], sigma=theta[5], lower=0, initval=theta[5])
    sigma = pm.TruncatedNormal("sigma", mu=10, sigma=10, lower=0)

    # ode_solution
    ode_solution = ode_model(y0=[xt0, yt0], theta=[alpha, beta, gamma, delta])

    # Likelihood
    pm.Normal("Y_obs", mu=ode_solution, sigma=sigma, observed=data[["hare", "lynx"]].values)

sampler = "NUTS PyMC ODE"
tune = draws = 15
with model:
    trace_pymc_ode = pm.sample(tune=tune, draws=draws)

Only 15 samples in chain.
Auto-assigning NUTS sampler...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, gamma, delta, xto, yto, sigma]

100.00% [120/120 07:12<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 15 tune and 15 draw iterations (60 + 60 draws total) took 476 seconds.
C:\Users\greg\AppData\Local\Temp\ipykernel_5620\3816606753.py:4: UserWarning: The number of samples is too small to check convergence reliably.
trace_pymc_ode = pm.sample(tune=tune, draws=draws)

trace = trace_pymc_ode
az.summary(trace)

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.479 0.195 0.203 0.823 0.073 0.054 8.0 16.0 2.79
beta 0.035 0.012 0.023 0.053 0.004 0.003 8.0 19.0 2.31
gamma 1.795 1.708 0.558 4.740 0.645 0.477 8.0 19.0 2.87
delta 0.022 0.007 0.010 0.031 0.003 0.002 9.0 18.0 1.86
xto 24.557 13.732 2.427 36.930 5.166 3.816 9.0 40.0 2.15
yto 5.252 1.280 3.270 6.862 0.435 0.329 9.0 61.0 1.98
sigma 183.718 339.955 2.345 1142.468 123.738 91.119 8.0 16.0 2.62
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");

fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
The NUTS sampler is starting to find the correct posterior, but would need a whole lot more time to make a good inference.

### Simulate with Pytensor Scan#

Finally, we can write the system of ODEs as a forward simulation solver within PyMC. The way to write for-loops in PyMC is with pytensor.scan. Gradients are then supplied to the sampler via automatic differentiation. A minimal standalone example of scan is sketched below before we build the full simulation model.
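
As a warm-up, here is a minimal standalone pytensor.scan sketch (the names and values are illustrative): it repeats one forward Euler step of the exponential growth equation dx/dt = a * x, threading the state through the loop.

# minimal pytensor.scan example: 100 forward Euler steps of dx/dt = a * x
x0 = pt.dscalar("x0")
a = pt.dscalar("a")
dt = 0.01

def euler_step(x, a):
    # scan passes the recurrent state first, then the non_sequences
    return x + a * x * dt

trajectory, updates = pytensor.scan(
    fn=euler_step,
    outputs_info=[x0],  # initial state
    non_sequences=[a],  # fixed parameter
    n_steps=100,
)
step_fn = pytensor.function([x0, a], trajectory[-1])
print(step_fn(1.0, 0.5))  # ~1.647, close to exp(0.5), since 100 steps of 0.01 cover t=1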

First, we should test that the time steps are sufficiently small to get a reasonable estimate.

#### Check Time Steps#

Create a function that accepts different numbers of time steps for testing. The function also demonstrates how pytensor.scan is used.

# Lotka-Volterra forward simulation model using scan
def lv_scan_simulation_model(theta, steps_year=100, years=21):
    # variables to control time steps
    n_steps = years * steps_year
    dt = 1 / steps_year

    # PyMC model
    with pm.Model() as model:
        # Priors (these are static for testing)
        alpha = theta[0]
        beta = theta[1]
        gamma = theta[2]
        delta = theta[3]
        xt0 = theta[4]
        yt0 = theta[5]

        # Lotka-Volterra calculation function
        ## Similar to the right-hand-side functions used earlier
        ## but with dt applied to the equations
        def ode_update_function(x, y, alpha, beta, gamma, delta):
            x_new = x + (alpha * x - beta * x * y) * dt
            y_new = y + (-gamma * y + delta * x * y) * dt
            return x_new, y_new

        # Pytensor scan looping function
        ## The function argument names are not intuitive in this context!
        result, updates = pytensor.scan(
            fn=ode_update_function,  # function
            outputs_info=[xt0, yt0],  # initial conditions
            non_sequences=[alpha, beta, gamma, delta],  # parameters
            n_steps=n_steps,  # number of loops
        )

        # Put the results together and track the result
        pm.Deterministic("result", pm.math.stack([result[0], result[1]], axis=1))

    return model


Run the simulation for various time steps and plot the results.

fig, ax = plt.subplots(figsize=(7, 4))

steps_years = [12, 100, 1000, 10000]
for steps_year in steps_years:
    time = np.arange(1900, 1921, 1 / steps_year)
    model = lv_scan_simulation_model(theta, steps_year=steps_year)
    with model:
        prior = pm.sample_prior_predictive(1)
    ax.plot(time, prior.prior.result[0][0].values, label=str(steps_year) + " steps/year")
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ax.set_title("Lotka-Volterra Forward Simulation Model with different step sizes");

Sampling: []
Sampling: []
Sampling: []
Sampling: []


Notice how the lower-resolution simulations are less accurate over time; the global error of this forward Euler scheme shrinks roughly in proportion to the step size. Based on this check, 100 time steps per year is sufficiently accurate, while 12 steps per year shows too much “numerical diffusion” over 20 years of simulation.

#### Inference Using NUTS#

Now that we are OK with 100 time steps per year, we write the model with indexing to align the simulation output with the annual observations.

def lv_scan_inference_model(theta, steps_year=100, years=21):
    # variables to control time steps
    n_steps = years * steps_year
    dt = 1 / steps_year

    # variables to control indexing to get annual values
    segment = [True] + [False] * (steps_year - 1)
    boolist_idxs = []
    for _ in range(years):
        boolist_idxs += segment

    # PyMC model
    with pm.Model() as model:
        # Priors
        alpha = pm.TruncatedNormal("alpha", mu=theta[0], sigma=theta[0], lower=0, initval=theta[0])
        beta = pm.TruncatedNormal("beta", mu=theta[1], sigma=theta[1], lower=0, initval=theta[1])
        gamma = pm.TruncatedNormal("gamma", mu=theta[2], sigma=theta[2], lower=0, initval=theta[2])
        delta = pm.TruncatedNormal("delta", mu=theta[3], sigma=theta[3], lower=0, initval=theta[3])
        xt0 = pm.TruncatedNormal("xto", mu=theta[4], sigma=theta[4], lower=0, initval=theta[4])
        yt0 = pm.TruncatedNormal("yto", mu=theta[5], sigma=theta[5], lower=0, initval=theta[5])
        sigma = pm.TruncatedNormal("sigma", mu=10, sigma=10, lower=0)

        # Lotka-Volterra calculation function
        def ode_update_function(x, y, alpha, beta, gamma, delta):
            x_new = x + (alpha * x - beta * x * y) * dt
            y_new = y + (-gamma * y + delta * x * y) * dt
            return x_new, y_new

        # Pytensor scan is a looping function
        result, updates = pytensor.scan(
            fn=ode_update_function,  # function
            outputs_info=[xt0, yt0],  # initial conditions
            non_sequences=[alpha, beta, gamma, delta],  # parameters
            n_steps=n_steps,  # number of loops
        )

        # Put the results together
        final_result = pm.math.stack([result[0], result[1]], axis=1)
        # Filter the results down to annual values
        annual_value = final_result[np.array(boolist_idxs), :]

        # Likelihood function
        pm.Normal("Y_obs", mu=annual_value, sigma=sigma, observed=data[["hare", "lynx"]].values)
    return model


This is also quite slow, so we will just pull a few samples for demonstration purposes.

steps_year = 100
model = lv_scan_inference_model(theta, steps_year=steps_year)
sampler = "NUTS Pytensor Scan"
tune = draws = 50
with model:
    trace_scan = pm.sample(tune=tune, draws=draws)

Only 50 samples in chain.
Auto-assigning NUTS sampler...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, gamma, delta, xto, yto, sigma]

100.00% [400/400 15:40<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 50 tune and 50 draw iterations (200 + 200 draws total) took 993 seconds.
C:\Users\greg\AppData\Local\Temp\ipykernel_5620\1292993890.py:6: UserWarning: The number of samples is too small to check convergence reliably.
trace_scan = pm.sample(tune=tune, draws=draws)

trace = trace_scan
az.summary(trace)

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
alpha 0.511 0.170 0.392 0.574 0.043 0.031 36.0 13.0 1.07
beta 0.028 0.017 0.020 0.030 0.004 0.003 48.0 13.0 1.05
gamma 0.920 0.134 0.788 1.144 0.030 0.021 33.0 13.0 1.08
delta 0.027 0.004 0.024 0.034 0.001 0.001 30.0 13.0 1.08
xto 34.175 3.533 30.303 38.891 0.796 0.571 70.0 16.0 1.04
yto 4.048 1.016 2.729 5.465 0.140 0.101 50.0 41.0 1.07
sigma 6.480 14.881 3.185 5.676 2.663 1.901 36.0 16.0 1.08
az.plot_trace(trace, figsize=(7, 7))
plt.suptitle(f"Trace Plot {sampler}");


fig, ax = plt.subplots(figsize=(7, 4))
plot_inference(ax, trace, title=f"Data and Inference Model Runs\n{sampler} Sampler");


Notes:
The sampler is faster than the pymc.ode implementation, but still slower than Scipy's odeint combined with gradient-free inference methods.

## Summary#

Let’s compare inference results among these different methods. Recall that, in order to run this notebook in a reasonable amount of time, we have an insufficient number of samples for many inference methods. For a fair comparison, we would need to bump up the number of samples and run the notebook for longer. Regardless, let’s take a look.

# Make lists with variable for looping
var_names = [str(s).split("_")[0] for s in list(model.values_to_rvs.keys())[:-1]]
# Make lists with model results and model names for plotting
inference_results = [
    trace_slice,
    trace_DEMZ,
    trace_DEM,
    trace_M,
    trace_SMC_like,
    trace_SMC_e1,
    trace_SMC_e10,
    trace_pymc_ode,
    trace_scan,
]
model_names = [
    "Slice Sampler",
    "DEMetropolisZ",
    "DEMetropolis",
    "Metropolis",
    "SMC with Likelihood",
    "SMC e=1",
    "SMC e=10",
    "PyMC ODE NUTS",
    "Pytensor Scan NUTS",
]

# Loop through variable names
for var_name in var_names:
    axes = az.plot_forest(
        inference_results,
        model_names=model_names,
        var_names=var_name,
        kind="forestplot",
        legend=False,
        combined=True,
        figsize=(7, 3),
    )
    axes[0].set_title(f"Marginal Probability: {var_name}")
    # Clean up ytick labels
    ylabels = axes[0].get_yticklabels()
    new_ylabels = []
    for label in ylabels:
        txt = label.get_text()
        txt = txt.replace(": " + var_name, "")
        label.set_text(txt)
        new_ylabels.append(label)
    axes[0].set_yticklabels(new_ylabels)

plt.show();


Notes:
If we ran the samplers for long enough to get good inferences, we would expect them to converge on the same posterior probability distributions, with the exception of SMC when using a distance parameter (SMC e=1 and SMC e=10). When using a distance parameter, the model does not include a likelihood function and therefore represents a different posterior probability distribution than the other models.

### Key Conclusions#

We performed Bayesian inference on a system of ODEs in 4 main ways:

• Scipy odeint wrapped in a Pytensor op and sampled with non-gradient-based samplers (comparing 5 different samplers).

• Scipy odeint wrapped in a pm.Simulator function and sampled with a non-likelihood-based sequential Monte Carlo (SMC) sampler.

• PyMC ode.DifferentialEquation sampled with NUTS.

• Forward simulation using pytensor.scan and sampled with NUTS.

The “winner” for this problem was the Scipy odeint solver with a differential evolution (DE) Metropolis sampler. The improved efficiency of the NUTS sampler did not make up for the inefficiency of the slower gradient-enabled ODE solvers. Sticking with Scipy and DEMetropolis is also the simplest workflow for a scientist with a working numeric model and a desire to perform Bayesian inference. Just wrapping the numeric model in a Pytensor op and plugging it into a PyMC model can get you a long way!
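
To recap the recommended workflow in one condensed sketch (reusing rhs, data, and results from above; illustrative rather than a tuned inference):

# condensed sketch: Scipy ODE solver + Pytensor op + differential evolution Metropolis
@as_op(itypes=[pt.dvector], otypes=[pt.dmatrix])
def forward_model(theta):
    return odeint(func=rhs, y0=theta[-2:], t=data.year, args=(theta,))

with pm.Model():
    names = ["alpha", "beta", "gamma", "delta", "xto", "yto"]
    priors = [
        pm.TruncatedNormal(name, mu=mu, sigma=mu, lower=0, initval=mu)
        for name, mu in zip(names, results.x)
    ]
    sigma = pm.TruncatedNormal("sigma", mu=10, sigma=10, lower=0)
    pm.Normal(
        "Y_obs",
        mu=forward_model(pm.math.stack(priors)),
        sigma=sigma,
        observed=data[["hare", "lynx"]].values,
    )
    idata = pm.sample(step=pm.DEMetropolisZ(), tune=5000, draws=5000, cores=1)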

## Authors#

Organized and rewritten by Greg Brunkhorst from multiple legacy PyMC.io example notebooks by Sanmitra Ghosh, Demetri Pananos, and the PyMC Team (Approximate Bayesian Computation).

## References#

[1] Richard McElreath. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC, 2018.

## Watermark#

%watermark -n -u -v -iv -w

Last updated: Wed Jan 18 2023

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.7.0

numpy     : 1.24.0
pytensor  : 2.9.1
pymc      : 5.0.1+5.ga7f361bd
matplotlib: 3.6.2
pandas    : 1.5.2
arviz     : 0.14.0

Watermark: 2.3.1


All the notebooks in this example gallery are provided under the MIT License, which allows modification and redistribution for any use provided the copyright and license notices are preserved.

## Citing PyMC examples#

To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.

Important

Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.

Also remember to cite the relevant libraries used by your code.

Here is a citation template in BibTeX:

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}


which once rendered could look like:

• Greg Brunkhorst. “ODE Lotka-Volterra With Bayesian Inference in Multiple Ways”. In: PyMC Examples. Ed. by PyMC Team. DOI: 10.5281/zenodo.5654871