Posts by Christopher Krapu

Modeling spatial point patterns with a marked log-Gaussian Cox process

The log-Gaussian Cox process (LGCP) is a probabilistic model of point patterns typically observed in space or time. It has two main components. First, an underlying intensity field \(\lambda(s)\) of positive real values is modeled over the entire domain \(X\) using an exponentially-transformed Gaussian process which constrains \(\lambda\) to be positive. Then, this intensity field is used to parameterize a Poisson point process which represents a stochastic mechanism for placing points in space. Some phenomena amenable to this representation include the incidence of cancer cases across a county, or the spatiotemporal locations of crime events in a city. Both spatial and temporal dimensions can be handled equivalently within this framework, though this tutorial only addresses data in two spatial dimensions.

Read more ...


Kronecker Structured Covariances

PyMC contains implementations for models that have Kronecker structured covariances. This patterned structure enables Gaussian process models to work on much larger datasets. Kronecker structure can be exploited when

Read more ...


Factor analysis

Factor analysis is a widely used probabilistic model for identifying low-rank structure in multivariate data as encoded in latent variables. It is very closely related to principal components analysis, and differs only in the prior distributions assumed for these latent variables. It is also a good example of a linear Gaussian model as it can be described entirely as a linear transformation of underlying Gaussian variates. For a high-level view of how factor analysis relates to other models, you can check out this diagram originally published by Ghahramani and Roweis.

Read more ...


Using a custom step method for sampling from locally conjugate posterior distributions

Markov chain Monte Carlo (MCMC) sampling methods are fundamental to modern Bayesian inference. PyMC leverages Hamiltonian Monte Carlo (HMC), a powerful sampling algorithm that efficiently explores high-dimensional posterior distributions. Unlike simpler MCMC methods, HMC harnesses the gradient of the log posterior density to make intelligent proposals, allowing it to effectively sample complex posteriors with hundreds or thousands of parameters. A key advantage of HMC is its generality - it works with arbitrary prior distributions and likelihood functions, without requiring conjugate pairs or closed-form solutions. This is crucial since most real-world models involve priors and likelihoods whose product cannot be analytically integrated to obtain the posterior distribution. HMC’s gradient-guided proposals make it dramatically more efficient than earlier MCMC approaches that rely on random walks or simple proposal distributions.

Read more ...