# About Conditional Autoregressive models in PyMC#

This notebook explains the design principles behind the Conditional Autoregressive (CAR) distribution as implemented in PyMC. For a simple tutorial of why and how to use the CAR distribution, see this notebook.

```
import arviz as az
import numpy as np
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import theano
import theano.tensor as tt
from pymc3.distributions import continuous, distribution
from theano import scan, shared
```

```
%config InlineBackend.figure_format = 'retina'
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
floatX = "float32"
```

As a probabilistic language, there are some fundamental differences between `PyMC3`

and other alternatives such as `WinBUGS`

, `JAGS`

, and `Stan`

. In this notebook, I will summarise some heuristics and intuition I got over the past two years using `PyMC3`

. I will outline some thinking in how I approach a modelling problem using `PyMC3`

, and how thinking in linear algebra solves most of the programming problems. I hope this notebook will shed some light onto the design and features of `PyMC3`

, and similar languages that are built on linear algebra packages with a static world view (e.g., Edward, which is based on Tensorflow).

For more resources comparing between PyMC3 codes and other probabilistic languages:

PyMC3 port of “Doing Bayesian Data Analysis” - PyMC3 vs WinBUGS/JAGS/Stan

PyMC3 port of “Bayesian Cognitive Modeling” - PyMC3 vs WinBUGS/JAGS/Stan

## Background information#

Suppose we want to implement a Conditional Autoregressive (CAR) model with examples in WinBUGS/PyMC2 and Stan.

For the sake of brevity, I will not go into the details of the CAR model. The essential idea is autocorrelation, which is informally “correlation with itself”. In a CAR model, the probability of values estimated at any given location \(y_i\) are conditional on some neighboring values \(y_j, _{j \neq i}\) (in another word, correlated/covariated with these values):

where \(\sigma_i^{2}\) is a spatially varying covariance parameter, and \(b_{ii} = 0\).

Here we will demonstrate the implementation of a CAR model using a canonical example: the lip cancer risk data in Scotland between 1975 and 1980. The original data is from (Kemp et al. 1985). This dataset includes observed lip cancer case counts at 56 spatial units in Scotland, with the expected number of cases as intercept, and an area-specific continuous variable coded for the proportion of the population employed in agriculture, fishing, or forestry (AFF). We want to model how lip cancer rates (`O`

below) relate to AFF (`aff`

below), as exposure to sunlight is a risk factor.

Setting up the data:

```
# Read the data from file containing columns: NAME, CANCER, CEXP, AFF, ADJ, WEIGHTS
df_scot_cancer = pd.read_csv(pm.get_data("scotland_lips_cancer.csv"))
# name of the counties
county = df_scot_cancer["NAME"].values
# observed
O = df_scot_cancer["CANCER"].values
N = len(O)
# expected (E) rates, based on the age of the local population
E = df_scot_cancer["CEXP"].values
logE = np.log(E)
# proportion of the population engaged in agriculture, forestry, or fishing (AFF)
aff = df_scot_cancer["AFF"].values / 10.0
# Spatial adjacency information: column (ADJ) contains list entries which are preprocessed to obtain adj as list of lists
adj = (
df_scot_cancer["ADJ"].apply(lambda x: [int(val) for val in x.strip("][").split(",")]).to_list()
)
# Change to Python indexing (i.e. -1)
for i in range(len(adj)):
for j in range(len(adj[i])):
adj[i][j] = adj[i][j] - 1
# spatial weight: column (WEIGHTS) contains list entries which are preprocessed to obtain weights as list of lists
weights = (
df_scot_cancer["WEIGHTS"]
.apply(lambda x: [int(val) for val in x.strip("][").split(",")])
.to_list()
)
Wplus = np.asarray([sum(w) for w in weights])
```

## A WinBUGS/PyMC2 implementation#

The classical `WinBUGS`

implementation (more information here):

```
model
{
for (i in 1 : regions) {
O[i] ~ dpois(mu[i])
log(mu[i]) <- log(E[i]) + beta0 + beta1*aff[i]/10 + phi[i] + theta[i]
theta[i] ~ dnorm(0.0,tau.h)
}
phi[1:regions] ~ car.normal(adj[], weights[], Wplus[], tau.c)
beta0 ~ dnorm(0.0, 1.0E-5) # vague prior on grand intercept
beta1 ~ dnorm(0.0, 1.0E-5) # vague prior on covariate effect
tau.h ~ dgamma(3.2761, 1.81)
tau.c ~ dgamma(1.0, 1.0)
sd.h <- sd(theta[]) # marginal SD of heterogeneity effects
sd.c <- sd(phi[]) # marginal SD of clustering (spatial) effects
alpha <- sd.c / (sd.h + sd.c)
}
```

The main challenge to porting this model to `PyMC3`

is the `car.normal`

function in `WinBUGS`

. It is a likelihood function that conditions each realization on some neighbour realization (a smoothed property). In `PyMC2`

, it could be implemented as a custom likelihood function (a `@stochastic`

node) `mu_phi`

:

We can just define `mu_phi`

similarly and wrap it in a `pymc3.DensityDist`

, however, doing so usually results in a very slow model (both in compiling and sampling). In general, porting pymc2 code into pymc3 (or even generally porting `WinBUGS`

, `JAGS`

, or `Stan`

code into `PyMC3`

) that use a `for`

loops tend to perform poorly in `theano`

, the backend of `PyMC3`

.

The underlying mechanism in `PyMC3`

is very different compared to `PyMC2`

, using `for`

loops to generate RV or stacking multiple RV with arguments such as `[pm.Binomial('obs%'%i, p[i], n) for i in range(K)]`

generate unnecessary large number of nodes in `theano`

graph, which then slows down compilation appreciably.

The easiest way is to move the loop out of `pm.Model`

. And usually is not difficult to do. For example, in `Stan`

you can have a `transformed data{}`

block; in `PyMC3`

you just need to compute it before defining your Model.

If it is absolutely necessary to use a `for`

loop, you can use a theano loop (i.e., `theano.scan`

), which you can find some introduction on the theano website and see a usecase in PyMC3 timeseries distribution.

## PyMC3 implementation using `theano.scan`

#

So lets try to implement the CAR model using `theano.scan`

. First we create a `theano`

function with `theano.scan`

and check if it really works by comparing its result to the for-loop.

```
value = np.asarray(
np.random.randn(
N,
),
dtype=theano.config.floatX,
)
maxwz = max([sum(w) for w in weights])
N = len(weights)
wmat = np.zeros((N, maxwz))
amat = np.zeros((N, maxwz), dtype="int32")
for i, w in enumerate(weights):
wmat[i, np.arange(len(w))] = w
amat[i, np.arange(len(w))] = adj[i]
# defining the tensor variables
x = tt.vector("x")
x.tag.test_value = value
w = tt.matrix("w")
# provide Theano with a default test-value
w.tag.test_value = wmat
a = tt.matrix("a", dtype="int32")
a.tag.test_value = amat
def get_mu(w, a):
a1 = tt.cast(a, "int32")
return tt.sum(w * x[a1]) / tt.sum(w)
results, _ = theano.scan(fn=get_mu, sequences=[w, a])
compute_elementwise = theano.function(inputs=[x, w, a], outputs=results)
print(compute_elementwise(value, wmat, amat))
def mu_phi(value):
N = len(weights)
# Calculate mu based on average of neighbours
mu = np.array([np.sum(weights[i] * value[adj[i]]) / Wplus[i] for i in range(N)])
return mu
print(mu_phi(value))
```

```
[-0.03256903 1.15369772 -0.88367923 0.19739633 0.2086316 -0.60637122
0.0821804 -1.37764807 -0.29132894 0.05172231 -0.14216446 0.08377788
0.39272752 -0.44500841 0.09997776 0.26546058 0.66625458 0.02811006
0.28666674 0.78806007 -0.13624863 0.03447581 0.07280037 -0.57371348
-0.14919659 -0.03653583 -0.39487311 1.04195921 0.14203152 -0.01594515
-0.01352197 0.00947096 0.37859961 -0.38082245 0.55939024 -0.10175728
-0.02190934 0.34201749 -0.41573475 -0.07256629 0.39001316 0.06973286
0.0839806 0.3099865 -0.27618815 -0.55965603 0.39102644 -0.4616728
-0.78256164 0.11348734 0.29724215 0.37958576 0.22718645 0.09669855
0.0420775 0.26230918]
[-0.03256903 1.15369772 -0.88367923 0.19739633 0.2086316 -0.60637122
0.0821804 -1.37764807 -0.29132894 0.05172231 -0.14216446 0.08377788
0.39272752 -0.44500841 0.09997776 0.26546058 0.66625458 0.02811006
0.28666674 0.78806007 -0.13624863 0.03447581 0.07280037 -0.57371348
-0.14919659 -0.03653583 -0.39487311 1.04195921 0.14203152 -0.01594515
-0.01352197 0.00947096 0.37859961 -0.38082245 0.55939024 -0.10175728
-0.02190934 0.34201749 -0.41573475 -0.07256629 0.39001316 0.06973286
0.0839806 0.3099865 -0.27618815 -0.55965603 0.39102644 -0.4616728
-0.78256164 0.11348734 0.29724215 0.37958576 0.22718645 0.09669855
0.0420775 0.26230918]
```

Since it produces the same result as the original for-loop, we will wrap it as a new distribution with a log-likelihood function in `PyMC3`

.

```
class CAR(distribution.Continuous):
"""
Conditional Autoregressive (CAR) distribution
Parameters
----------
a : list of adjacency information
w : list of weight information
tau : precision at each location
"""
def __init__(self, w, a, tau, *args, **kwargs):
super().__init__(*args, **kwargs)
self.a = a = tt.as_tensor_variable(a)
self.w = w = tt.as_tensor_variable(w)
self.tau = tau * tt.sum(w, axis=1)
self.mode = 0.0
def get_mu(self, x):
def weight_mu(w, a):
a1 = tt.cast(a, "int32")
return tt.sum(w * x[a1]) / tt.sum(w)
mu_w, _ = scan(fn=weight_mu, sequences=[self.w, self.a])
return mu_w
def logp(self, x):
mu_w = self.get_mu(x)
tau = self.tau
return tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x))
```

We then use it in our `PyMC3`

version of the CAR model:

```
with pm.Model() as model1:
# Vague prior on intercept
beta0 = pm.Normal("beta0", mu=0.0, tau=1.0e-5)
# Vague prior on covariate effect
beta1 = pm.Normal("beta1", mu=0.0, tau=1.0e-5)
# Random effects (hierarchial) prior
tau_h = pm.Gamma("tau_h", alpha=3.2761, beta=1.81)
# Spatial clustering prior
tau_c = pm.Gamma("tau_c", alpha=1.0, beta=1.0)
# Regional random effects
theta = pm.Normal("theta", mu=0.0, tau=tau_h, shape=N)
mu_phi = CAR("mu_phi", w=wmat, a=amat, tau=tau_c, shape=N)
# Zero-centre phi
phi = pm.Deterministic("phi", mu_phi - tt.mean(mu_phi))
# Mean model
mu = pm.Deterministic("mu", tt.exp(logE + beta0 + beta1 * aff + theta + phi))
# Likelihood
Yi = pm.Poisson("Yi", mu=mu, observed=O)
# Marginal SD of heterogeniety effects
sd_h = pm.Deterministic("sd_h", tt.std(theta))
# Marginal SD of clustering (spatial) effects
sd_c = pm.Deterministic("sd_c", tt.std(phi))
# Proportion sptial variance
alpha = pm.Deterministic("alpha", sd_c / (sd_h + sd_c))
infdata1 = pm.sample(
1000,
tune=500,
cores=4,
init="advi",
target_accept=0.9,
max_treedepth=15,
return_inferencedata=True,
)
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
```

```
Convergence achieved at 16500
Interrupted at 16,499 [8%]: Average Loss = 345.94
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu_phi, theta, tau_c, tau_h, beta1, beta0]
```

```
Sampling 4 chains for 500 tune and 1_000 draw iterations (2_000 + 4_000 draws total) took 301 seconds.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```

Note: there are some hidden problems with the model, some regions of the parameter space are quite difficult to sample from. Here I am using ADVI as initialization, which gives a smaller variance of the mass matrix. It keeps the sampler around the mode.

```
az.plot_trace(infdata1, var_names=["alpha", "sd_h", "sd_c"]);
```

We also got a lot of Rhat warning, that’s because the Zero-centre phi introduce unidentification to the model:

```
summary1 = az.summary(infdata1)
summary1[summary1["r_hat"] > 1.05]
```

mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|---|---|

mu_phi[0] | 175.365 | 368.522 | -217.156 | 799.593 | 178.151 | 137.750 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[1] | 175.264 | 368.514 | -217.062 | 799.332 | 178.147 | 137.744 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[2] | 175.289 | 368.512 | -217.037 | 799.785 | 178.148 | 137.745 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[3] | 174.516 | 368.515 | -217.975 | 798.649 | 178.148 | 137.719 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[4] | 175.343 | 368.515 | -217.005 | 799.412 | 178.148 | 137.746 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[5] | 175.188 | 368.503 | -217.167 | 799.809 | 178.143 | 137.738 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[6] | 175.139 | 368.513 | -217.232 | 799.189 | 178.147 | 137.739 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[7] | 175.207 | 368.504 | -217.303 | 799.763 | 178.145 | 137.741 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[8] | 175.007 | 368.516 | -217.048 | 799.499 | 178.149 | 137.736 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[9] | 175.007 | 368.508 | -217.321 | 799.023 | 178.145 | 137.733 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[10] | 175.326 | 368.514 | -217.090 | 799.384 | 178.148 | 137.746 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[11] | 175.310 | 368.511 | -217.104 | 799.357 | 178.147 | 137.744 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[12] | 175.198 | 368.518 | -217.064 | 799.598 | 178.149 | 137.742 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[13] | 174.180 | 368.509 | -218.106 | 798.798 | 178.146 | 137.705 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[14] | 174.533 | 368.503 | -217.864 | 798.439 | 178.143 | 137.716 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[15] | 174.772 | 368.511 | -217.671 | 798.663 | 178.146 | 137.726 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[16] | 174.974 | 368.513 | -217.195 | 799.203 | 178.147 | 137.733 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[17] | 174.299 | 368.513 | -218.131 | 798.458 | 178.147 | 137.711 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[18] | 175.184 | 368.521 | -217.285 | 799.266 | 178.151 | 137.743 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[19] | 174.391 | 368.511 | -218.213 | 798.576 | 178.147 | 137.714 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[20] | 174.487 | 368.509 | -218.084 | 798.590 | 178.146 | 137.716 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[21] | 174.706 | 368.510 | -217.803 | 798.459 | 178.145 | 137.723 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[22] | 174.255 | 368.506 | -218.227 | 798.131 | 178.145 | 137.708 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[23] | 173.866 | 368.511 | -218.869 | 797.673 | 178.147 | 137.695 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[24] | 174.513 | 368.508 | -217.918 | 798.342 | 178.145 | 137.716 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[25] | 174.244 | 368.505 | -218.070 | 798.446 | 178.143 | 137.706 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[26] | 174.052 | 368.511 | -218.570 | 798.030 | 178.146 | 137.701 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[27] | 174.277 | 368.517 | -218.181 | 798.366 | 178.149 | 137.711 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[28] | 174.437 | 368.507 | -217.987 | 798.423 | 178.145 | 137.714 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[29] | 173.827 | 368.507 | -218.562 | 797.972 | 178.144 | 137.691 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[30] | 173.997 | 368.510 | -218.535 | 797.960 | 178.146 | 137.699 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[31] | 174.095 | 368.508 | -218.345 | 798.311 | 178.145 | 137.702 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[32] | 174.113 | 368.515 | -218.415 | 798.264 | 178.148 | 137.705 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[33] | 173.855 | 368.507 | -218.763 | 797.645 | 178.144 | 137.693 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[34] | 174.056 | 368.506 | -218.382 | 798.185 | 178.145 | 137.701 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[35] | 174.049 | 368.506 | -218.506 | 798.046 | 178.145 | 137.701 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[36] | 174.002 | 368.500 | -218.261 | 798.144 | 178.142 | 137.697 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[37] | 173.679 | 368.508 | -218.934 | 797.950 | 178.146 | 137.688 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[38] | 173.925 | 368.506 | -218.584 | 797.757 | 178.144 | 137.696 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[39] | 173.744 | 368.511 | -218.538 | 798.085 | 178.147 | 137.691 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[40] | 173.782 | 368.508 | -218.434 | 797.948 | 178.146 | 137.692 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[41] | 173.789 | 368.502 | -218.704 | 797.772 | 178.142 | 137.688 | 4.0 | 4.0 | 5.0 | 11.0 | 2.42 |

mu_phi[42] | 173.979 | 368.506 | -218.531 | 798.007 | 178.143 | 137.696 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[43] | 173.670 | 368.505 | -218.977 | 797.518 | 178.144 | 137.686 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[44] | 173.936 | 368.505 | -218.600 | 797.965 | 178.143 | 137.694 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[45] | 173.785 | 368.508 | -218.366 | 798.158 | 178.146 | 137.692 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[46] | 173.668 | 368.508 | -218.812 | 797.661 | 178.147 | 137.688 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[47] | 173.610 | 368.513 | -219.018 | 797.681 | 178.148 | 137.687 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[48] | 173.568 | 368.507 | -219.224 | 797.455 | 178.146 | 137.684 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[49] | 174.281 | 368.503 | -218.298 | 798.272 | 178.142 | 137.706 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[50] | 173.636 | 368.503 | -219.057 | 797.613 | 178.143 | 137.683 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[51] | 173.595 | 368.507 | -218.613 | 797.867 | 178.145 | 137.684 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[52] | 173.574 | 368.506 | -218.595 | 798.050 | 178.145 | 137.683 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[53] | 173.548 | 368.510 | -218.972 | 797.742 | 178.147 | 137.684 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[54] | 174.022 | 368.511 | -218.659 | 798.018 | 178.146 | 137.700 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

mu_phi[55] | 173.948 | 368.511 | -218.511 | 797.932 | 178.146 | 137.697 | 4.0 | 4.0 | 5.0 | 11.0 | 2.41 |

```
az.plot_forest(
infdata1,
kind="ridgeplot",
var_names=["phi"],
combined=False,
ridgeplot_overlap=3,
ridgeplot_alpha=0.25,
colors="white",
figsize=(9, 7),
);
```

```
az.plot_posterior(infdata1, var_names=["alpha"]);
```

`theano.scan`

is much faster than using a python for loop, but it is still quite slow. One approach for improving it is to use linear algebra. That is, we should try to find a way to use matrix multiplication instead of looping (if you have experience in using MATLAB, it is the same philosophy). In our case, we can totally do that.

For a similar problem, you can also have a look of my port of Lee and Wagenmakers’ book. For example, in Chapter 19, the Stan code use a for loop to generate the likelihood function, and I generate the matrix outside and use matrix multiplication etc to archive the same purpose.

## PyMC3 implementation using matrix “trick”#

Again, we try on some simulated data to make sure the implementation is correct.

```
maxwz = max([sum(w) for w in weights])
N = len(weights)
wmat2 = np.zeros((N, N))
amat2 = np.zeros((N, N), dtype="int32")
for i, a in enumerate(adj):
amat2[i, a] = 1
wmat2[i, a] = weights[i]
value = np.asarray(
np.random.randn(
N,
),
dtype=theano.config.floatX,
)
print(np.sum(value * amat2, axis=1) / np.sum(wmat2, axis=1))
def mu_phi(value):
N = len(weights)
# Calculate mu based on average of neighbours
mu = np.array([np.sum(weights[i] * value[adj[i]]) / Wplus[i] for i in range(N)])
return mu
print(mu_phi(value))
```

```
[ 2.25588227e-01 -8.20210477e-01 7.14165670e-01 3.07251143e-01
2.92335128e-01 -5.55562972e-01 -7.35797893e-01 5.94110931e-01
-2.72989587e-01 -6.21479051e-01 2.82642230e-01 -7.27562577e-01
9.72894293e-03 2.15441546e-01 3.16608110e-01 -5.38879039e-01
1.94526179e-01 1.19573877e-01 -2.02771767e-01 3.22618936e-01
-1.65824491e-01 -1.21021601e+00 3.18736883e-01 -4.98013820e-01
6.09599805e-02 2.40591863e-01 -6.29610020e-04 4.11071335e-02
-2.92231409e-01 -4.30276690e-01 2.06124487e-01 -6.86948082e-02
-1.26157102e-01 -9.18024137e-02 6.81225357e-01 -3.23761738e-01
5.23813476e-01 -1.10626670e-01 -6.48965436e-01 -9.01164776e-01
2.04888768e-01 -2.66684442e-01 -2.17412828e-01 -1.33620266e-01
-1.46843946e-01 -2.11092061e-01 1.34030315e-01 -3.39439739e-01
-7.96140962e-01 2.34936057e-01 -1.86926585e-01 -4.09354076e-01
-1.03251174e-01 -5.12496749e-01 -4.13624363e-01 1.80577442e-02]
[ 2.25588227e-01 -8.20210477e-01 7.14165670e-01 3.07251143e-01
2.92335128e-01 -5.55562972e-01 -7.35797893e-01 5.94110931e-01
-2.72989587e-01 -6.21479051e-01 2.82642230e-01 -7.27562577e-01
9.72894293e-03 2.15441546e-01 3.16608110e-01 -5.38879039e-01
1.94526179e-01 1.19573877e-01 -2.02771767e-01 3.22618936e-01
-1.65824491e-01 -1.21021601e+00 3.18736883e-01 -4.98013820e-01
6.09599805e-02 2.40591863e-01 -6.29610020e-04 4.11071335e-02
-2.92231409e-01 -4.30276690e-01 2.06124487e-01 -6.86948082e-02
-1.26157102e-01 -9.18024137e-02 6.81225357e-01 -3.23761738e-01
5.23813476e-01 -1.10626670e-01 -6.48965436e-01 -9.01164776e-01
2.04888768e-01 -2.66684442e-01 -2.17412828e-01 -1.33620266e-01
-1.46843946e-01 -2.11092061e-01 1.34030315e-01 -3.39439739e-01
-7.96140962e-01 2.34936057e-01 -1.86926585e-01 -4.09354076e-01
-1.03251174e-01 -5.12496749e-01 -4.13624363e-01 1.80577442e-02]
```

Now create a new CAR distribution with the matrix multiplication instead of `theano.scan`

to get the `mu`

```
class CAR2(distribution.Continuous):
"""
Conditional Autoregressive (CAR) distribution
Parameters
----------
a : adjacency matrix
w : weight matrix
tau : precision at each location
"""
def __init__(self, w, a, tau, *args, **kwargs):
super().__init__(*args, **kwargs)
self.a = a = tt.as_tensor_variable(a)
self.w = w = tt.as_tensor_variable(w)
self.tau = tau * tt.sum(w, axis=1)
self.mode = 0.0
def logp(self, x):
tau = self.tau
w = self.w
a = self.a
mu_w = tt.sum(x * a, axis=1) / tt.sum(w, axis=1)
return tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x))
```

```
with pm.Model() as model2:
# Vague prior on intercept
beta0 = pm.Normal("beta0", mu=0.0, tau=1.0e-5)
# Vague prior on covariate effect
beta1 = pm.Normal("beta1", mu=0.0, tau=1.0e-5)
# Random effects (hierarchial) prior
tau_h = pm.Gamma("tau_h", alpha=3.2761, beta=1.81)
# Spatial clustering prior
tau_c = pm.Gamma("tau_c", alpha=1.0, beta=1.0)
# Regional random effects
theta = pm.Normal("theta", mu=0.0, tau=tau_h, shape=N)
mu_phi = CAR2("mu_phi", w=wmat2, a=amat2, tau=tau_c, shape=N)
# Zero-centre phi
phi = pm.Deterministic("phi", mu_phi - tt.mean(mu_phi))
# Mean model
mu = pm.Deterministic("mu", tt.exp(logE + beta0 + beta1 * aff + theta + phi))
# Likelihood
Yi = pm.Poisson("Yi", mu=mu, observed=O)
# Marginal SD of heterogeniety effects
sd_h = pm.Deterministic("sd_h", tt.std(theta))
# Marginal SD of clustering (spatial) effects
sd_c = pm.Deterministic("sd_c", tt.std(phi))
# Proportion sptial variance
alpha = pm.Deterministic("alpha", sd_c / (sd_h + sd_c))
infdata2 = pm.sample(
1000,
tune=500,
cores=4,
init="advi",
target_accept=0.9,
max_treedepth=15,
return_inferencedata=True,
)
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
```

```
Convergence achieved at 16900
Interrupted at 16,899 [8%]: Average Loss = 338.61
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu_phi, theta, tau_c, tau_h, beta1, beta0]
```

```
Sampling 4 chains for 500 tune and 1_000 draw iterations (2_000 + 4_000 draws total) took 57 seconds.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```

**As you can see, it is appreciably faster using matrix multiplication.**

```
summary2 = az.summary(infdata2)
summary2[summary2["r_hat"] > 1.05]
```

mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|---|---|

mu_phi[0] | 29.076 | 245.989 | -425.818 | 388.013 | 111.959 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[1] | 29.000 | 245.980 | -425.952 | 387.926 | 111.954 | 84.510 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[2] | 28.994 | 246.000 | -425.650 | 388.224 | 111.966 | 84.519 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[3] | 28.252 | 245.971 | -426.657 | 387.110 | 111.949 | 84.506 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[4] | 29.056 | 245.992 | -425.765 | 387.904 | 111.960 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[5] | 28.908 | 245.997 | -425.880 | 388.138 | 111.966 | 84.519 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[6] | 28.859 | 245.983 | -425.961 | 387.830 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[7] | 28.940 | 245.993 | -425.777 | 387.958 | 111.964 | 84.517 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[8] | 28.720 | 245.987 | -426.006 | 387.682 | 111.958 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[9] | 28.735 | 245.982 | -426.243 | 387.557 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[10] | 29.044 | 245.990 | -425.548 | 388.004 | 111.960 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[11] | 29.028 | 245.995 | -425.704 | 388.058 | 111.963 | 84.516 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[12] | 28.917 | 245.987 | -425.954 | 387.818 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[13] | 27.902 | 245.984 | -426.922 | 386.918 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[14] | 28.230 | 245.988 | -426.431 | 387.230 | 111.960 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[15] | 28.487 | 245.988 | -426.110 | 387.548 | 111.960 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[16] | 28.688 | 245.981 | -426.016 | 387.680 | 111.955 | 84.510 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[17] | 28.020 | 245.977 | -426.864 | 386.932 | 111.953 | 84.509 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[18] | 28.892 | 245.986 | -425.857 | 387.955 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[19] | 28.115 | 245.985 | -426.847 | 386.944 | 111.955 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[20] | 28.191 | 245.989 | -426.539 | 387.026 | 111.963 | 84.517 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[21] | 28.426 | 245.984 | -426.382 | 387.549 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[22] | 27.969 | 245.993 | -426.554 | 387.104 | 111.961 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[23] | 27.590 | 245.987 | -427.080 | 386.605 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[24] | 28.220 | 245.986 | -426.470 | 387.350 | 111.961 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[25] | 27.973 | 245.988 | -426.703 | 387.040 | 111.960 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[26] | 27.774 | 245.988 | -427.010 | 386.806 | 111.955 | 84.510 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[27] | 28.003 | 245.978 | -426.589 | 387.169 | 111.952 | 84.508 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[28] | 28.155 | 245.985 | -426.398 | 387.266 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[29] | 27.551 | 245.989 | -426.999 | 386.797 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[30] | 27.717 | 245.985 | -427.012 | 386.761 | 111.955 | 84.510 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[31] | 27.819 | 245.986 | -426.992 | 386.784 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[32] | 27.835 | 245.984 | -427.027 | 386.634 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[33] | 27.570 | 245.986 | -427.049 | 386.650 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[34] | 27.776 | 245.983 | -427.043 | 386.654 | 111.955 | 84.510 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[35] | 27.759 | 245.994 | -426.826 | 386.938 | 111.960 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[36] | 27.717 | 245.990 | -426.968 | 386.638 | 111.960 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[37] | 27.395 | 245.990 | -427.283 | 386.474 | 111.959 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[38] | 27.640 | 245.994 | -427.011 | 386.533 | 111.961 | 84.515 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[39] | 27.450 | 245.985 | -427.123 | 386.452 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[40] | 27.489 | 245.992 | -427.026 | 386.447 | 111.959 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[41] | 27.510 | 245.989 | -427.096 | 386.690 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[42] | 27.707 | 245.993 | -427.099 | 386.600 | 111.962 | 84.516 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[43] | 27.388 | 245.986 | -427.276 | 386.562 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[44] | 27.655 | 245.985 | -426.991 | 386.699 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[45] | 27.500 | 245.987 | -427.099 | 386.421 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[46] | 27.378 | 245.983 | -427.165 | 386.493 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[47] | 27.323 | 245.983 | -427.393 | 386.392 | 111.955 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[48] | 27.281 | 245.989 | -427.254 | 386.360 | 111.958 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[49] | 27.981 | 245.992 | -426.561 | 386.968 | 111.964 | 84.518 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[50] | 27.347 | 245.990 | -427.411 | 386.545 | 111.959 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[51] | 27.307 | 245.984 | -427.552 | 386.183 | 111.957 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[52] | 27.282 | 245.991 | -427.218 | 386.431 | 111.960 | 84.514 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[53] | 27.261 | 245.989 | -427.296 | 386.452 | 111.958 | 84.513 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[54] | 27.743 | 245.989 | -427.031 | 386.763 | 111.958 | 84.512 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

mu_phi[55] | 27.669 | 245.984 | -426.962 | 386.760 | 111.956 | 84.511 | 5.0 | 5.0 | 6.0 | 14.0 | 1.86 |

```
az.plot_forest(
infdata2,
kind="ridgeplot",
var_names=["phi"],
combined=False,
ridgeplot_overlap=3,
ridgeplot_alpha=0.25,
colors="white",
figsize=(9, 7),
);
```

```
az.plot_trace(infdata2, var_names=["alpha", "sd_h", "sd_c"]);
```

```
az.plot_posterior(infdata2, var_names=["alpha"]);
```

## PyMC3 implementation using Matrix multiplication#

There are almost always multiple ways to formulate a particular model. Some approaches work better than the others under different contexts (size of your dataset, properties of the sampler, etc).

In this case, we can express the CAR prior as:

You can find more details in the original Stan case study. You might come across similar constructs in Gaussian Process, which result in a zero-mean Gaussian distribution conditioned on a covariance function.

In the `Stan`

Code, matrix D is generated in the model using a `transformed data{}`

block:

```
transformed data{
vector[n] zeros;
matrix<lower = 0>[n, n] D;
{
vector[n] W_rowsums;
for (i in 1:n) {
W_rowsums[i] = sum(W[i, ]);
}
D = diag_matrix(W_rowsums);
}
zeros = rep_vector(0, n);
}
```

We can generate the same matrix quite easily:

```
X = np.hstack((np.ones((N, 1)), stats.zscore(aff, ddof=1)[:, None]))
W = wmat2
D = np.diag(W.sum(axis=1))
log_offset = logE[:, None]
```

Then in the `Stan`

model:

```
model {
phi ~ multi_normal_prec(zeros, tau * (D - alpha * W));
...
}
```

since the precision matrix just generated by some matrix multiplication, we can do just that in `PyMC3`

:

```
with pm.Model() as model3:
# Vague prior on intercept and effect
beta = pm.Normal("beta", mu=0.0, tau=1.0, shape=(2, 1))
# Priors for spatial random effects
tau = pm.Gamma("tau", alpha=2.0, beta=2.0)
alpha = pm.Uniform("alpha", lower=0, upper=1)
phi = pm.MvNormal("phi", mu=0, tau=tau * (D - alpha * W), shape=(1, N))
# Mean model
mu = pm.Deterministic("mu", tt.exp(tt.dot(X, beta) + phi.T + log_offset))
# Likelihood
Yi = pm.Poisson("Yi", mu=mu.ravel(), observed=O)
infdata3 = pm.sample(1000, tune=2000, cores=4, target_accept=0.85, return_inferencedata=True)
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [phi, alpha, tau, beta]
```

```
Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 131 seconds.
There were 21 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.9207219941260574, but should be close to 0.85. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.
```

```
az.plot_trace(infdata3, var_names=["alpha", "beta", "tau"]);
```

```
az.plot_posterior(infdata3, var_names=["alpha"]);
```

Notice that since the model parameterization is different than in the `WinBUGS`

model, the `alpha`

can’t be interpreted in the same way.

## PyMC3 implementation using Sparse Matrix#

Note that in the node \(\phi \sim \mathcal{N}(0, [D_\tau (I - \alpha B)]^{-1})\), we are computing the log-likelihood for a multivariate Gaussian distribution, which might not scale well in high-dimensions. We can take advantage of the fact that the covariance matrix here \([D_\tau (I - \alpha B)]^{-1}\) is **sparse**, and there are faster ways to compute its log-likelihood.

For example, a more efficient sparse representation of the CAR in `Stan`

:

```
functions {
/**
* Return the log probability of a proper conditional autoregressive (CAR) prior
* with a sparse representation for the adjacency matrix
*
* @param phi Vector containing the parameters with a CAR prior
* @param tau Precision parameter for the CAR prior (real)
* @param alpha Dependence (usually spatial) parameter for the CAR prior (real)
* @param W_sparse Sparse representation of adjacency matrix (int array)
* @param n Length of phi (int)
* @param W_n Number of adjacent pairs (int)
* @param D_sparse Number of neighbors for each location (vector)
* @param lambda Eigenvalues of D^{-1/2}*W*D^{-1/2} (vector)
*
* @return Log probability density of CAR prior up to additive constant
*/
real sparse_car_lpdf(vector phi, real tau, real alpha,
int[,] W_sparse, vector D_sparse, vector lambda, int n, int W_n) {
row_vector[n] phit_D; // phi' * D
row_vector[n] phit_W; // phi' * W
vector[n] ldet_terms;
phit_D = (phi .* D_sparse)';
phit_W = rep_row_vector(0, n);
for (i in 1:W_n) {
phit_W[W_sparse[i, 1]] = phit_W[W_sparse[i, 1]] + phi[W_sparse[i, 2]];
phit_W[W_sparse[i, 2]] = phit_W[W_sparse[i, 2]] + phi[W_sparse[i, 1]];
}
for (i in 1:n) ldet_terms[i] = log1m(alpha * lambda[i]);
return 0.5 * (n * log(tau)
+ sum(ldet_terms)
- tau * (phit_D * phi - alpha * (phit_W * phi)));
}
}
```

with the data transformed in the model:

```
transformed data {
int W_sparse[W_n, 2]; // adjacency pairs
vector[n] D_sparse; // diagonal of D (number of neighbors for each site)
vector[n] lambda; // eigenvalues of invsqrtD * W * invsqrtD
{ // generate sparse representation for W
int counter;
counter = 1;
// loop over upper triangular part of W to identify neighbor pairs
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
if (W[i, j] == 1) {
W_sparse[counter, 1] = i;
W_sparse[counter, 2] = j;
counter = counter + 1;
}
}
}
}
for (i in 1:n) D_sparse[i] = sum(W[i]);
{
vector[n] invsqrtD;
for (i in 1:n) {
invsqrtD[i] = 1 / sqrt(D_sparse[i]);
}
lambda = eigenvalues_sym(quad_form(W, diag_matrix(invsqrtD)));
}
}
```

and the likelihood:

```
model {
phi ~ sparse_car(tau, alpha, W_sparse, D_sparse, lambda, n, W_n);
}
```

This is quite a lot of code to digest, so my general approach is to compare the intermediate steps (whenever possible) with `Stan`

. In this case, I will try to compute `tau, alpha, W_sparse, D_sparse, lambda, n, W_n`

outside of the `Stan`

model in `R`

and compare with my own implementation.

Below is a Sparse CAR implementation in `PyMC3`

(see also here). Again, we try to avoid using any looping, as in `Stan`

.

```
import scipy
class Sparse_CAR(distribution.Continuous):
"""
Sparse Conditional Autoregressive (CAR) distribution
Parameters
----------
alpha : spatial smoothing term
W : adjacency matrix
tau : precision at each location
"""
def __init__(self, alpha, W, tau, *args, **kwargs):
self.alpha = alpha = tt.as_tensor_variable(alpha)
self.tau = tau = tt.as_tensor_variable(tau)
D = W.sum(axis=0)
n, m = W.shape
self.n = n
self.median = self.mode = self.mean = 0
super().__init__(*args, **kwargs)
# eigenvalues of D^−1/2 * W * D^−1/2
Dinv_sqrt = np.diag(1 / np.sqrt(D))
DWD = np.matmul(np.matmul(Dinv_sqrt, W), Dinv_sqrt)
self.lam = scipy.linalg.eigvalsh(DWD)
# sparse representation of W
w_sparse = scipy.sparse.csr_matrix(W)
self.W = theano.sparse.as_sparse_variable(w_sparse)
self.D = tt.as_tensor_variable(D)
# Precision Matrix (inverse of Covariance matrix)
# d_sparse = scipy.sparse.csr_matrix(np.diag(D))
# self.D = theano.sparse.as_sparse_variable(d_sparse)
# self.Phi = self.tau * (self.D - self.alpha*self.W)
def logp(self, x):
logtau = self.n * tt.log(tau)
logdet = tt.log(1 - self.alpha * self.lam).sum()
# tau * ((phi .* D_sparse)' * phi - alpha * (phit_W * phi))
Wx = theano.sparse.dot(self.W, x)
tau_dot_x = self.D * x.T - self.alpha * Wx.ravel()
logquad = self.tau * tt.dot(x.ravel(), tau_dot_x.ravel())
# logquad = tt.dot(x.T, theano.sparse.dot(self.Phi, x)).sum()
return 0.5 * (logtau + logdet - logquad)
```

```
with pm.Model() as model4:
# Vague prior on intercept and effect
beta = pm.Normal("beta", mu=0.0, tau=1.0, shape=(2, 1))
# Priors for spatial random effects
tau = pm.Gamma("tau", alpha=2.0, beta=2.0)
alpha = pm.Uniform("alpha", lower=0, upper=1)
phi = Sparse_CAR("phi", alpha, W, tau, shape=(N, 1))
# Mean model
mu = pm.Deterministic("mu", tt.exp(tt.dot(X, beta) + phi + log_offset))
# Likelihood
Yi = pm.Poisson("Yi", mu=mu.ravel(), observed=O)
infdata4 = pm.sample(1000, tune=2000, cores=4, target_accept=0.85, return_inferencedata=True)
```

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [phi, alpha, tau, beta]
```

```
Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 27 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
```

```
az.plot_trace(infdata4, var_names=["alpha", "beta", "tau"]);
```

```
az.plot_posterior(infdata4, var_names=["alpha"])
```

```
array([<matplotlib.axes._subplots.AxesSubplot object at 0x7f7ef1e9ebb0>],
dtype=object)
```

As you can see above, the sparse representation returns the same estimates, while being much faster than any other implementation.

## A few other warnings#

In `Stan`

, there is an option to write a `generated quantities`

block for sample generation. Doing the similar in pymc3, however, is not recommended.

Consider the following simple sample:

```
# Data
x = np.array([1.1, 1.9, 2.3, 1.8])
n = len(x)
with pm.Model() as model1:
# prior
mu = pm.Normal('mu', mu=0, tau=.001)
sigma = pm.Uniform('sigma', lower=0, upper=10)
# observed
xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
# generation
p = pm.Deterministic('p', pm.math.sigmoid(mu))
count = pm.Binomial('count', n=10, p=p, shape=10)
```

where we intended to use

```
count = pm.Binomial('count', n=10, p=p, shape=10)
```

to generate posterior prediction. However, if the new RV added to the model is a discrete variable it can cause weird turbulence to the trace. You can see issue #1990 for related discussion.

## Final remarks#

In this notebook, most of the parameter conventions (e.g., using `tau`

when defining a Normal distribution) and choice of priors are strictly matched with the original code in `Winbugs`

or `Stan`

. However, it is important to note that merely porting the code from one probabilistic programming language to the another is not necessarily the best practice. The aim is not just to run the code in `PyMC3`

, but to make sure the model is appropriate so that it returns correct estimates, and runs efficiently (fast sampling).

For example, as @aseyboldt pointed out here and here, non-centered parametrizations are often a better choice than the centered parametrizations. In our case here, `phi`

is following a zero-mean Normal distribution, thus it can be left out in the beginning and used to scale the values afterwards. Often, doing this can avoid correlations in the posterior (it will be slower in some cases, however).

Another thing to keep in mind is that models can be sensitive to choices of prior distributions; for example, you can have a hard time using Normal variables with a large sd as prior. Gelman often recommends Cauchy or StudentT (*i.e.*, weakly-informative priors). More information on prior choice can be found on the Stan wiki.

There are always ways to improve code. Since our computational graph with `pm.Model()`

consist of `theano`

objects, we can always do `print(VAR_TO_CHECK.tag.test_value)`

right after the declaration or computation to check the shape.
For example, in our last example, as suggested by @aseyboldt there seem to be a lot of correlation in the posterior. That probably slows down NUTS quite a bit. As a debugging tool and guide for reparametrization you can look at the singular value decomposition of the standardized samples from the trace – basically the eigenvalues of the correlation matrix. If the problem is high dimensional you can use stuff from `scipy.sparse.linalg`

to only compute the largest singular value:

```
from scipy import linalg, sparse
vals = np.array([model.dict_to_array(v) for v in trace[1000:]]).T
vals[:] -= vals.mean(axis=1)[:, None]
vals[:] /= vals.std(axis=1)[:, None]
U, S, Vh = sparse.linalg.svds(vals, k=20)
```

Then look at `plt.plot(S)`

to see if any principal components are obvious, and check which variables are contributing by looking at the singular vectors: `plt.plot(U[:, -1] ** 2)`

. You can get the indices by looking at `model.bijection.ordering.vmap`

.

Another great way to check the correlations in the posterior is to do a pairplot of the posterior (if your model doesn’t contain too many parameters). You can see quite clearly if and where the the posterior parameters are correlated.

```
az.plot_pair(infdata1, var_names=["beta0", "beta1", "tau_h", "tau_c"], divergences=True);
```

```
az.plot_pair(infdata2, var_names=["beta0", "beta1", "tau_h", "tau_c"], divergences=True);
```

```
az.plot_pair(infdata3, var_names=["beta", "tau", "alpha"], divergences=True);
```

```
az.plot_pair(infdata4, var_names=["beta", "tau", "alpha"], divergences=True);
```

Notebook Written by Junpeng Lao, inspired by

`PyMC3`

issue#2022, issue#2066 and comments. I would like to thank @denadai2, @aseyboldt, and @twiecki for the helpful discussion.

```
%load_ext watermark
%watermark -n -u -v -iv -w
```

```
arviz 0.9.0
theano 1.0.5
pymc3 3.9.3
scipy 1.5.0
pandas 1.0.5
numpy 1.18.5
last updated: Fri Aug 14 2020
CPython 3.8.3
IPython 7.16.1
watermark 2.0.2
```