pymc.OrderedProbit#

class pymc.OrderedProbit(name, eta, cutpoints, sigma=1, compute_p=True, **kwargs)[source]#

Ordered Probit distributions.

Useful for regression on ordinal data values whose values range from 1 to K as a function of some predictor, \(\eta\). The cutpoints, \(c\), separate which ranges of \(\eta\) are mapped to which of the K observed dependent variables. The number of cutpoints is K - 1. It is recommended that the cutpoints are constrained to be ordered.

In order to stabilize the computation, log-likelihood is computed in log space using the scaled error function erfcx.

\[\begin{split}f(k \mid \eta, c) = \left\{ \begin{array}{l} 1 - \text{normal_cdf}(0, \sigma, \eta - c_1) \,, \text{if } k = 0 \\ \text{normal_cdf}(0, \sigma, \eta - c_{k - 1}) - \text{normal_cdf}(0, \sigma, \eta - c_{k}) \,, \text{if } 0 < k < K \\ \text{normal_cdf}(0, \sigma, \eta - c_{K - 1}) \,, \text{if } k = K \\ \end{array} \right.\end{split}\]
Parameters:
etatensor_like of float

The predictor.

cutpointstensor_like array of floats

The length K - 1 array of cutpoints which break \(\eta\) into ranges. Do not explicitly set the first and last elements of \(c\) to negative and positive infinity.

sigmatensor_like of float, default 1.0

Standard deviation of the probit function.

compute_pbool, default True

Whether to compute and store in the trace the inferred probabilities of each categories, based on the cutpoints’ values. Defaults to True. Might be useful to disable it if memory usage is of interest.

Examples

# Generate data for a simple 1 dimensional example problem
n1_c = 300; n2_c = 300; n3_c = 300
cluster1 = np.random.randn(n1_c) + -1
cluster2 = np.random.randn(n2_c) + 0
cluster3 = np.random.randn(n3_c) + 2

x = np.concatenate((cluster1, cluster2, cluster3))
y = np.concatenate((1*np.ones(n1_c),
                    2*np.ones(n2_c),
                    3*np.ones(n3_c))) - 1

# Ordered probit regression
with pm.Model() as model:
    cutpoints = pm.Normal("cutpoints", mu=[-1,1], sigma=10, shape=2,
                          transform=pm.distributions.transforms.ordered)
    y_ = pm.OrderedProbit("y", cutpoints=cutpoints, eta=x, observed=y)
    idata = pm.sample()

# Plot the results
plt.hist(cluster1, 30, alpha=0.5);
plt.hist(cluster2, 30, alpha=0.5);
plt.hist(cluster3, 30, alpha=0.5);
posterior = idata.posterior.stack(sample=("chain", "draw"))
plt.hist(posterior["cutpoints"][0], 80, alpha=0.2, color='k');
plt.hist(posterior["cutpoints"][1], 80, alpha=0.2, color='k');

Methods

OrderedProbit.dist(eta, cutpoints[, sigma])