{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(log-gaussian-cox-process)=\n", "# Modeling spatial point patterns with a marked log-Gaussian Cox process\n", "\n", ":::{post} May 31, 2022\n", ":tags: cox process, latent gaussian process, nonparametric, spatial, count data\n", ":category: intermediate\n", ":author: Chrisopher Krapu, Chris Fonnesbeck\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The log-Gaussian Cox process (LGCP) is a probabilistic model of point patterns typically observed in space or time. It has two main components. First, an underlying *intensity* field $\\lambda(s)$ of positive real values is modeled over the entire domain $X$ using an exponentially-transformed Gaussian process which constrains $\\lambda$ to be positive. Then, this intensity field is used to parameterize a [Poisson point process](https://en.wikipedia.org/wiki/Poisson_point_process) which represents a stochastic mechanism for placing points in space. Some phenomena amenable to this representation include the incidence of cancer cases across a county, or the spatiotemporal locations of crime events in a city. Both spatial and temporal dimensions can be handled equivalently within this framework, though this tutorial only addresses data in two spatial dimensions.\n", "\n", "In more formal terms, if we have a space $X$ and $A\\subseteq X$, the distribution over the number of points $Y_A$ occurring within subset $A$ is given by\n", "$$Y_A \\sim Poisson\\left(\\int_A \\lambda(s) ds\\right)$$\n", "and the intensity field is defined as\n", "$$\\log \\lambda(s) \\sim GP(\\mu(s), K(s,s'))$$\n", "where $GP(\\mu(s), K(s,s'))$ denotes a Gaussian process with mean function $\\mu(s)$ and covariance kernel $K(s,s')$ for a location $s \\in X$. This is one of the simplest models of point patterns of $n$ events recorded as locations $s_1,...,s_n$ in an arbitrary metric space. In conjunction with a Bayesian analysis, this model can be used to answering questions of interest such as:\n", "* Does an observed point pattern imply a statistically significant shift in spatial intensities?\n", "* What would randomly sampled patterns with the same statistical properties look like?\n", "* Is there a statistical correlation between the *frequency* and *magnitude* of point events?\n", "\n", "In this notebook, we'll use a grid-based approximation to the full LGCP with PyMC to fit a model and analyze its posterior summaries. We will also explore the usage of a marked Poisson process, an extension of this model to account for the distribution of *marks* associated with each data point." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our observational data concerns 231 sea anemones whose sizes and locations on the French coast were recorded. This data was taken from the [spatstat spatial modeling package in R](https://github.com/spatstat/spatstat) which is designed to address models like the LGCP and its subsequent refinements. The original source of this data is the textbook *Spatial data analysis by example* by Upton and Fingleton (1985) and a longer description of the data can be found there." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "from itertools import product\n", "\n", "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import pymc as pm\n", "\n", "from matplotlib import MatplotlibDeprecationWarning\n", "from numpy.random import default_rng\n", "\n", "warnings.filterwarnings(action=\"ignore\", category=MatplotlibDeprecationWarning)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%config InlineBackend.figure_format = 'retina'\n", "az.style.use(\"arviz-darkgrid\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv(pm.get_data(\"anemones.csv\"))\n", "n = data.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset has coordinates and discrete mark values for each anemone. While these marks are integers, for the sake of simplicity we will model these values as continuous in a later step." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xymarks
02776
119754
274154
\n", "