Latent Variable Implementation#

The gp.Latent class is a direct implementation of a GP. It is called “Latent” because the underlying function values are treated as latent variables. It has a prior method and a conditional method. Given a mean and covariance function, the function \(f(x)\) is modeled as,

\[ f(x) \sim \mathcal{GP}(m(x),\, k(x, x')) \,. \]

.prior#

Given a finite collection of inputs, the prior method places a multivariate normal prior distribution on the vector of function values, \(\mathbf{f}\),

\[ \mathbf{f} \sim \text{MvNormal}(\mathbf{m}_{x},\, \mathbf{K}_{xx}) \,, \]

where \(\mathbf{m}_{x}\) and \(\mathbf{K}_{xx}\) are the mean vector and covariance matrix evaluated at the inputs \(x\). Some sample code is,

import numpy as np
import pymc3 as pm

# A one dimensional column vector of inputs.
X = np.linspace(0, 1, 10)[:,None]

with pm.Model() as latent_gp_model:
    # Specify the covariance function.
    cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)
    
    # Specify the GP.  The default mean function is `Zero`.
    gp = pm.gp.Latent(cov_func=cov_func)
    
    # Place a GP prior over the function f.
    f = gp.prior("f", X=X)

By default, PyMC3 reparameterizes the prior on f under the hood by rotating it with the Cholesky factor of its covariance matrix. This helps to reduce correlations in the posterior of the transformed random variable, v. The reparameterized model is,

\[\begin{split} \begin{aligned} \mathbf{v} \sim \text{N}(0, 1)& \\ \mathbf{L} = \text{Cholesky}(\mathbf{K}_{xx})& \\ \mathbf{f} = \mathbf{m}_{x} + \mathbf{Lv} \\ \end{aligned} \end{split}\]

For more information about this reparameterization, see the section on drawing values from a multivariate distribution. The reparameterization can be disabled by setting the optional argument reparameterize=False in the prior method; the default is True.
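As a rough illustration of the idea (not gp.Latent's internal code; the names v and L and the 1e-6 jitter below are ours), the non-centered version of the prior above could be written out by hand as,

import numpy as np
import pymc3 as pm
import theano.tensor as tt

X = np.linspace(0, 1, 10)[:, None]

with pm.Model() as manual_noncentered:
    cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)

    # v has a standard normal prior ...
    v = pm.Normal("v", mu=0.0, sigma=1.0, shape=10)

    # ... and is rotated by the Cholesky factor of the covariance matrix.
    # The small jitter term keeps the Cholesky factorization numerically stable.
    K = cov_func(X) + 1e-6 * tt.eye(10)
    L = tt.slinalg.cholesky(K)

    # Zero mean function, so m_x drops out of f = m_x + L v
    f = pm.Deterministic("f", tt.dot(L, v))

Sampling v instead of f directly is what produces the f_rotated_ variable seen in the sampler output later in this notebook.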

.conditional#

The conditional method implements the predictive distribution for function values that were not part of the original data set. This distribution is,

\[ \mathbf{f}_* \mid \mathbf{f} \sim \text{MvNormal} \left( \mathbf{m}_* + \mathbf{K}_{*x}\mathbf{K}_{xx}^{-1} \mathbf{f} ,\, \mathbf{K}_{**} - \mathbf{K}_{*x}\mathbf{K}_{xx}^{-1}\mathbf{K}_{x*} \right) \]
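To make the formula concrete, here is a rough numpy sketch of the conditional mean and covariance for fixed hyperparameters and a zero mean function (purely illustrative; gp.conditional constructs the equivalent distribution for you, and the names K_xx, K_sx, K_ss below are ours),

import numpy as np
import pymc3 as pm

X = np.linspace(0, 1, 10)[:, None]        # original inputs
X_star = np.linspace(0, 2, 100)[:, None]  # new inputs

cov_func = pm.gp.cov.ExpQuad(1, ls=0.1)

# Evaluate the covariance blocks as numpy arrays
K_xx = cov_func(X).eval() + 1e-8 * np.eye(10)  # jitter for numerical stability
K_sx = cov_func(X_star, X).eval()
K_ss = cov_func(X_star).eval()

# A stand-in draw for the latent function values f at the original inputs
f = np.random.multivariate_normal(np.zeros(10), K_xx)

# Conditional mean and covariance of f_star given f (zero mean function)
mu_star = K_sx @ np.linalg.solve(K_xx, f)
cov_star = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)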

Using the same gp object we defined above, we can construct a random variable with this distribution by,

# vector of new X points we want to predict the function at
X_star = np.linspace(0, 2, 100)[:, None]

with latent_gp_model:
    f_star = gp.conditional("f_star", X_star)
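Given a trace obtained by sampling the rest of the model, draws of f_star can then be generated with sample_posterior_predictive. A minimal sketch, assuming a trace already exists, is

with latent_gp_model:
    # `trace` is assumed to come from a previous call to pm.sample()
    pred_samples = pm.sample_posterior_predictive(trace, vars=[f_star])

Example 1 below shows this workflow in full.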

Example 1: Regression with Student-t distributed noise#

The following example shows how to specify a simple model with a GP prior using the gp.Latent class. So that we can verify that the inference we perform is correct, the data set is generated from a draw from a GP.

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import theano.tensor as tt
%config InlineBackend.figure_format = 'retina'
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
# mute future warnings from theano
warnings.simplefilter(action="ignore", category=FutureWarning)
n = 200  # The number of data points
X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP must be arranged as a column vector

# Define the true covariance function and its parameters
ℓ_true = 1.0
η_true = 3.0
cov_func = η_true**2 * pm.gp.cov.Matern52(1, ℓ_true)

# A mean function that is zero everywhere
mean_func = pm.gp.mean.Zero()

# The latent function values are one sample from a multivariate normal
# Note that we have to call `eval()` because PyMC3 is built on top of Theano
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()

# The observed data is the latent function plus a small amount of Student-t distributed noise
# The standard deviation of the noise is `sigma`, and the degrees of freedom is `nu`
σ_true = 2.0
ν_true = 3.0
y = f_true + σ_true * np.random.standard_t(ν_true, size=n)

## Plot the data and the unobserved latent function
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()
ax.plot(X, f_true, "dodgerblue", lw=3, label="True generating function 'f'")
ax.plot(X, y, "ok", ms=3, label="Observed data")
ax.set_xlabel("X")
ax.set_ylabel("y")
plt.legend();
[Figure: observed data (black dots) and the true generating function f (blue)]

The plot above shows the observations of the unknown function \(f(x)\), marked with black dots, corrupted by noise. The true function is shown in blue.

Coding the model in PyMC3#

Here’s the model in PyMC3. We use a \(\text{Gamma}(2, 1)\) prior over the lengthscale parameter, weakly informative \(\text{HalfCauchy}\) priors over the covariance function scale and the noise scale, and a \(\text{Gamma}(2, 0.1)\) prior over the degrees of freedom parameter of the noise. Finally, a GP prior is placed on the unknown function. For more information on choosing priors in Gaussian process models, check out some of the recommendations by the Stan folks.

with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=1)

    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Latent(cov_func=cov)

    f = gp.prior("f", X=X)

    σ = pm.HalfCauchy("σ", beta=5)
    ν = pm.Gamma("ν", alpha=2, beta=0.1)
    y_ = pm.StudentT("y", mu=f, lam=1.0 / σ, nu=ν, observed=y)

    trace = pm.sample(1000, chains=2, cores=1, return_inferencedata=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [ν, σ, f_rotated_, η, ℓ]
100.00% [2000/2000 05:54<00:00 Sampling chain 0, 0 divergences]
100.00% [2000/2000 04:20<00:00 Sampling chain 1, 2 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 616 seconds.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
# check R-hat; values above about 1.03 may indicate convergence issues
n_nonconverged = int(np.sum(az.rhat(trace)[["η", "ℓ", "f_rotated_"]].to_array() > 1.03).values)
print("%i variables MCMC chains appear not to have converged." % n_nonconverged)
0 variables MCMC chains appear not to have converged.

Results#

Below is the joint posterior of the two covariance function hyperparameters. The red lines show the true values that were used to draw the function from the GP.

fig, ax = plt.subplots(1, 1, figsize=(6, 6))

az.plot_pair(
    trace,
    var_names=["η", "ℓ"],
    kind=["hexbin"],
    ax=ax,
    gridsize=25,
    divergences=True,
)

ax.axvline(x=η_true, color="r")
ax.axhline(y=ℓ_true, color="r");
[Figure: joint posterior of η and ℓ, with the true values marked in red]

Below is the joint posterior of the parameters of the Student-t noise distribution; again, the red lines mark the values used to generate the data.

fig, ax = plt.subplots(1, 1, figsize=(6, 6))

az.plot_pair(
    trace,
    var_names=["ν", "σ"],
    kind=["hexbin"],
    ax=ax,
    gridsize=25,
    divergences=True,
)

ax.axvline(x=ν_true, color="r")
ax.axhline(y=σ_true, color="r");
[Figure: joint posterior of ν and σ, with the true values marked in red]
# plot the results
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()

# plot the samples from the gp posterior with shading
from pymc3.gp.util import plot_gp_dist

plot_gp_dist(ax, trace.posterior["f"][0, :, :], X)

# plot the data and the true latent function
ax.plot(X, f_true, "dodgerblue", lw=3, label="True generating function 'f'")
ax.plot(X, y, "ok", ms=3, label="Observed data")

# axis labels and title
plt.xlabel("X")
plt.ylabel("True f(x)")
plt.title("Posterior distribution over $f(x)$ at the observed values")
plt.legend();
[Figure: posterior distribution over f(x) at the observed values]

As you can see from the red shading, the posterior of the GP prior over the function does a great job of representing both the fit and the uncertainty caused by the additive noise. Thanks to the heavy-tailed Student-t noise model, the result also does not overfit due to outliers.

Using .conditional#

Next, we extend the model with the conditional distribution of the GP so we can predict at new \(x\) locations. Let's see how the extrapolation looks out to higher \(x\). After extending the model, we can sample from the conditional distribution using the trace and the sample_posterior_predictive function. This is similar to how Stan uses its generated quantities {...} block. We could have included gp.conditional in the model before we did the NUTS sampling, but it is more efficient to separate these steps.

# 200 new values from x=0 to x=15
n_new = 200
X_new = np.linspace(0, 15, n_new)[:, None]

# add the GP conditional to the model, given the new X values
with model:
    f_pred = gp.conditional("f_pred", X_new)

# Sample from the GP conditional distribution
with model:
    pred_samples = pm.sample_posterior_predictive(trace.posterior, vars=[f_pred])
100.00% [2000/2000 06:16<00:00]
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()
plot_gp_dist(ax, pred_samples["f_pred"], X_new)

ax.plot(X, f_true, "dodgerblue", lw=3, label="True generating function 'f'")
ax.plot(X, y, "ok", ms=3, label="Observed data")

ax.set_xlabel("X")
ax.set_ylabel("True f(x)")
ax.set_title("Conditional distribution of f_*, given f")
plt.legend();
[Figure: conditional distribution of f_*, given f]

Example 2: Classification#

First we use a GP to generate some data that follows a Bernoulli distribution, where \(p\), the probability of a one instead of a zero, is a function of \(x\). I reset the seed and added more fake data points, because it can be difficult for the model to discern variations around 0.5 with few observations.

import sys

# reset the random seed for the new example
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

# number of data points
n = 400

# x locations
x = np.linspace(0, 1.5, n)

# true covariance
ℓ_true = 0.1
η_true = 1.0
cov_func = η_true**2 * pm.gp.cov.ExpQuad(1, ℓ_true)
K = cov_func(x[:, None]).eval()

# zero mean function
mean = np.zeros(n)

# sample from the gp prior
f_true = np.random.multivariate_normal(mean, K + 1e-6 * np.eye(n), 1).flatten()

# link function
def invlogit(x, eps=sys.float_info.epsilon):
    return (1.0 + 2.0 * eps) / (1.0 + np.exp(-x)) + eps


y = pm.Bernoulli.dist(p=invlogit(f_true)).random()
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()

ax.plot(x, invlogit(f_true), "dodgerblue", lw=3, label="True rate")
# add some noise to y to make the points in the plot more visible
ax.plot(x, y + np.random.randn(n) * 0.01, "ko", ms=3, label="Observed data")

ax.set_xlabel("X")
ax.set_ylabel("y")
plt.legend();
[Figure: true rate (blue) and observed binary data (black dots)]
with pm.Model() as model:
    # informative lengthscale prior
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    # informative, positive half-normal prior on the covariance scale
    η = pm.HalfNormal("η", sigma=5)
    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)

    gp = pm.gp.Latent(cov_func=cov)

    # make gp prior
    f = gp.prior("f", X=x[:, None])

    # logit link and Bernoulli likelihood
    p = pm.Deterministic("p", pm.math.invlogit(f))
    y_ = pm.Bernoulli("y", p=p, observed=y)

    trace = pm.sample(1000, chains=2, cores=1, return_inferencedata=True)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [f_rotated_, η, ℓ]
100.00% [2000/2000 16:44<00:00 Sampling chain 0, 1 divergences]
100.00% [2000/2000 16:16<00:00 Sampling chain 1, 1 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 1981 seconds.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 25% for some parameters.
# check Rhat
n_nonconverged = int(np.sum(az.rhat(trace)[["η", "ℓ", "f_rotated_"]].to_array() > 1.03).values)
print("%i variables MCMC chains appear not to have converged." % n_nonconverged)
0 variables MCMC chains appear not to have converged.
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

az.plot_pair(
    trace,
    var_names=["η", "ℓ"],
    kind=["hexbin"],
    ax=ax,
    gridsize=25,
    divergences=True,
)

ax.axvline(x=η_true, color="r")
ax.axhline(y=ℓ_true, color="r");
[Figure: joint posterior of η and ℓ, with the true values marked in red]
n_pred = 200
X_new = np.linspace(0, 2.0, n_pred)[:, None]

with model:
    f_pred2 = gp.conditional("f_pred2", X_new)

with model:
    pred_samples = pm.sample_posterior_predictive(trace.posterior, vars=[f_pred2])
100.00% [2000/2000 09:14<00:00]
# plot the results
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()

# plot the samples from the gp posterior with shading
plot_gp_dist(ax, invlogit(pred_samples["f_pred2"]), X_new)

# plot the data (with some jitter) and the true latent function
plt.plot(x, invlogit(f_true), "dodgerblue", lw=3, label="True f")
plt.plot(
    x,
    y + np.random.randn(y.shape[0]) * 0.01,
    "ok",
    ms=3,
    alpha=0.5,
    label="Observed data",
)

# axis labels and title
plt.xlabel("X")
plt.ylabel("True f(x)")
plt.title("Posterior distribution over $f(x)$ at the observed values")
plt.legend(loc="upper right");
[Figure: posterior distribution over f(x) at the observed values]
%load_ext watermark
%watermark -n -u -v -iv -w
arviz 0.8.3
numpy 1.19.0
pymc3 3.9.1
last updated: Wed Jun 24 2020 

CPython 3.8.3
IPython 7.15.0
watermark 2.0.2