{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dependent density regression\n", "In another [example](dp_mix.ipynb), we showed how to use Dirichlet processes to perform Bayesian nonparametric density estimation. This example expands on the previous one, illustrating dependent density regression.\n", "\n", "Just as Dirichlet process mixtures can be thought of as infinite mixture models that select the number of active components as part of inference, dependent density regression can be thought of as infinite [mixtures of experts](https://en.wikipedia.org/wiki/Committee_machine) that select the active experts as part of inference. The flexibility and modularity of these models make them powerful tools for nonparametric Bayesian data analysis." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on PyMC3 v3.11.2\n" ] } ], "source": [ "import arviz as az\n", "import numpy as np\n", "import pandas as pd\n", "import pymc3 as pm\n", "import seaborn as sns\n", "\n", "from IPython.display import HTML\n", "from matplotlib import animation as ani\n", "from matplotlib import pyplot as plt\n", "from theano import tensor as tt\n", "\n", "print(f\"Running on PyMC3 v{pm.__version__}\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%config InlineBackend.figure_format = 'retina'\n", "plt.rc(\"animation\", writer=\"ffmpeg\")\n", "blue, *_ = sns.color_palette()\n", "az.style.use(\"arviz-darkgrid\")\n", "SEED = 972915  # from random.org; for reproducibility\n", "np.random.seed(SEED)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the LIDAR data set from Larry Wasserman's excellent book, [_All of Nonparametric Statistics_](http://www.stat.cmu.edu/~larry/all-of-nonpar/). We standardize the data set to speed up convergence of the sampler."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "DATA_URI = \"http://www.stat.cmu.edu/~larry/all-of-nonpar/=data/lidar.dat\"\n", "\n", "\n", "def standardize(x):\n", "    return (x - x.mean()) / x.std()\n", "\n", "\n", "df = pd.read_csv(DATA_URI, sep=r\"\\s{1,3}\", engine=\"python\").assign(\n", "    std_range=lambda df: standardize(df.range), std_logratio=lambda df: standardize(df.logratio)\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>range</th><th>logratio</th><th>std_range</th><th>std_logratio</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>390</td><td>-0.050356</td><td>-1.717725</td><td>0.852467</td></tr>\n",
"<tr><th>1</th><td>391</td><td>-0.060097</td><td>-1.707299</td><td>0.817981</td></tr>\n",
"<tr><th>2</th><td>393</td><td>-0.041901</td><td>-1.686447</td><td>0.882398</td></tr>\n",
"<tr><th>3</th><td>394</td><td>-0.050985</td><td>-1.676020</td><td>0.850240</td></tr>\n",
"<tr><th>4</th><td>396</td><td>-0.059913</td><td>-1.655168</td><td>0.818631</td></tr>\n",
"</tbody>\n", "</table>\n", "