{ "cells": [ { "cell_type": "markdown", "id": "f2e8530c-5ba0-4041-a309-18919d5d0533", "metadata": {}, "source": [ "(regression_discontinuity)=\n", "# Regression discontinuity design analysis\n", "\n", ":::{post} April, 2022\n", ":tags: regression, causal inference, quasi experiments, counterfactuals \n", ":category: beginner, explanation\n", ":author: Benjamin T. Vincent\n", ":::\n", "\n", "[Quasi experiments](https://en.wikipedia.org/wiki/Quasi-experiment) involve experimental interventions and quantitative measures. However, quasi-experiments do _not_ involve random assignment of units (e.g. cells, people, companies, schools, states) to test or control groups. This inability to conduct random assignment poses problems when making causal claims as it makes it harder to argue that any difference between a control and test group are because of an intervention and not because of a confounding factor.\n", "\n", "The [regression discontinuity design](https://en.wikipedia.org/wiki/Regression_discontinuity_design) is a particular form of quasi experimental design. It consists of a control and test group, but assignment of units to conditions is chosen based upon a threshold criteria, not randomly. \n", "\n", ":::{figure-md} fig-target\n", "\n", "![regression discontinuity design schematic](regression_discontinuity.png)\n", "\n", "A schematic diagram of the regression discontinuity design. The dashed green line shows where we would have expected the post test scores of the test group to be if they had not received the treatment. Image taken from [https://conjointly.com/kb/regression-discontinuity-design/](https://conjointly.com/kb/regression-discontinuity-design/).\n", ":::\n", "\n", "Units with very low scores are likely to differ systematically along some dimensions than units with very high scores. For example, if we look at students who achieve the highest, and students who achieve the lowest, in all likelihood there are confounding variables. 
Students with high scores are likely to have come from more privileged backgrounds than those with the lowest scores. \n", "\n", "If we gave extra tuition (our experimental intervention) to students scoring in the lowest half, then we can easily imagine that we have large differences in some measure of privilege between test and control groups. At first glance, this would seem to make the regression discontinuity design useless: the whole point of random assignment is to reduce or eliminate systematic biases between control and test groups, but use of a threshold would seem to maximise the differences in confounding variables between groups. Isn't this an odd thing to do?\n", "\n", "The key point, however, is that it is much less likely that students scoring just below and just above the threshold systematically differ in their degree of privilege. And so _if_ we find evidence of a meaningful discontinuity in post-test scores between those just above and just below the threshold, then it is much more plausible that the intervention (applied according to the threshold criterion) was causally responsible.\n", "\n", "## Sharp vs. fuzzy regression discontinuity designs\n", "Note that regression discontinuity designs fall into two categories. This notebook focuses on _sharp_ regression discontinuity designs, but it is important to understand both sharp and fuzzy variants:\n", "\n", "- **Sharp:** Here, the assignment to control or treatment groups is purely dictated by the threshold. There is no uncertainty in which units are in which group.\n", "- **Fuzzy:** In some situations there may not be a sharp boundary between control and treatment based upon the threshold. This could happen for example if experimenters are not strict in assigning units to groups based on the threshold. Alternatively, there could be non-compliance on the side of the actual units being studied. 
For example, patients may not always be fully compliant in taking medication, so some unknown proportion of patients assigned to the test group may actually be in the control group because of non-compliance." ] }, { "cell_type": "code", "execution_count": 1, "id": "efb41c68-2dbc-4f70-b333-eef4c743994a", "metadata": {}, "outputs": [], "source": [ "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import pymc as pm" ] }, { "cell_type": "code", "execution_count": 2, "id": "5403941e-6a30-4f93-8533-e219805b2c3c", "metadata": {}, "outputs": [], "source": [ "RANDOM_SEED = 123\n", "rng = np.random.default_rng(RANDOM_SEED)\n", "az.style.use(\"arviz-darkgrid\")\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "id": "9fcbc99e-0bd0-4763-97ec-e4ac0114aefe", "metadata": {}, "source": [ "## Generate simulated data\n", "Note that here we assume that there is negligible/zero measurement noise, but that there is some variability in the true values from pre- to post-test. It is possible to take into account measurement noise on the pre- and post-test results, but we do not engage with that in this notebook." ] }, { "cell_type": "code", "execution_count": 3, "id": "4468db37-fe9e-43b6-9779-2dc55e7e20e1", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
|     | x         | treated | y         |
|-----|-----------|---------|-----------|
| 0   | -0.989121 | True    | 0.050794  |
| 1   | -0.367787 | True    | -0.181418 |
| 2   | 1.287925  | False   | 1.345912  |
| 3   | 0.193974  | False   | 0.430915  |
| 4   | 0.920231  | False   | 1.229825  |
| ... | ...       | ...     | ...       |
| 995 | -1.246726 | True    | -0.819665 |
| 996 | 0.090428  | False   | 0.006909  |
| 997 | 0.370658  | False   | -0.027703 |
| 998 | -1.063268 | True    | 0.008132  |
| 999 | 0.239116  | False   | 0.604780  |

1000 rows × 3 columns
\n", "
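The data-generating cell above is hidden (`hide-input`), so its exact contents are not shown here. Based on the displayed dataframe, a minimal sketch of what such a simulation might look like is given below. The threshold of 0, the treatment-effect size of 1, and the noise scale are assumptions made for illustration, not necessarily the notebook's actual values; the sharp assignment rule (treated if and only if the pre-test score falls below the threshold) is the defining feature.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Hypothetical sketch of the hidden data-generating cell.
# Assumed: units with pre-test score x below a threshold of 0 are treated,
# and treatment adds a fixed bump to the post-test score y.
N = 1000
threshold = 0.0
treatment_effect = 1.0  # assumed effect size, for illustration only

x = rng.normal(size=N)  # pre-test scores
treated = x < threshold  # sharp assignment: deterministic function of x

# Post-test score: pre-test score, plus pre/post variability in the true
# values (no measurement noise, per the text), plus any treatment effect.
y = x + rng.normal(scale=0.3, size=N) + treatment_effect * treated

df = pd.DataFrame({"x": x, "treated": treated, "y": y})
```

In a fuzzy design, by contrast, `treated` would no longer be a deterministic function of `x`; crossing the threshold would only change the *probability* of treatment.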