Distribution Dimensionality

PyMC provides a number of ways to specify the dimensionality of its distributions. This document provides an overview, and offers some user tips.

Glossary

In this document we use the term dimensionality to refer to the idea of dimensions. Each of the terms below has a specific semantic and computational definition in PyMC. While we share them here, they will make much more sense when viewed in the examples below.

  • Support dimensions → The core dimensionality of a distribution

  • Batch dimensions → Extra dimensions beyond the support dimensionality of a distribution

  • Implicit dimensions → Dimensions that follow from the values or shapes of the distribution parameters

  • Explicit dimensions → Dimensions that are explicitly defined by one of the following arguments:

    • Shape → Number of draws from a distribution

    • Dims → An array of dimension names

  • Coords → A dictionary mapping dimension names to coordinate values

from functools import partial

import pymc as pm
import numpy as np
import aesara.tensor as at

Univariate distribution example

We can start with the simplest case, a single Normal distribution. We use .dist to specify one outside of a PyMC Model.

normal_dist = pm.Normal.dist()

We can then use the draw() function to take a random draw from that distribution.

# Just patching the draw function for reproducibility
rng = np.random.default_rng(seed=sum(map(ord, "dimensionality")))
draw = partial(pm.draw, random_seed=rng)
normal_draw = draw(normal_dist)
normal_draw, normal_draw.ndim
(array(0.80189558), 0)

In this case we end up with a single scalar value. This means that a Normal distribution has a scalar support dimensionality, as the smallest random draw you can take is a scalar which has a dimension of zero. The support dimensionality of every distribution is hard-coded as a property.

normal_dist.owner.op.ndim_supp
0

Explicit batch dimensions

If one needs more than a single draw, a natural tendency would be to create multiple copies of the same variable and stack them together.

normal_dists = pm.math.stack([pm.Normal.dist() for _ in range(3)])
draw(normal_dists)
array([ 0.9434115 , -0.33327414,  0.83636296])

More simply, one can create a batch of independent draws from the same distribution family by using the shape argument.

normal_dists = pm.Normal.dist(shape=(3,))
draw(normal_dists)
array([ 0.98810294, -0.07003785, -0.37962748])
normal_dists = pm.Normal.dist(shape=(4, 3))
draw(normal_dists)
array([[ 7.99932116e-04, -1.94407945e+00,  3.90451962e-01],
       [ 1.10657367e+00,  6.49042149e-01, -1.09450185e+00],
       [-2.96226305e-01,  1.41884595e+00, -1.31589441e+00],
       [ 1.53109449e+00, -7.73771737e-01,  2.37418367e+00]])

Not only is this more succinct, but it produces much more efficient vectorized code. We rarely use the stack approach in PyMC, unless we need to combine draws from distinct distribution families.
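The same contrast can be seen with NumPy alone. The sketch below is only an analogy: NumPy's `size` argument stands in for PyMC's `shape`, and the seed is arbitrary. It draws three values one at a time and stacks them, then requests the same number of draws in a single vectorized call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stack approach: three separate scalar draws, combined afterwards
stacked = np.stack([rng.normal() for _ in range(3)])

# Batch approach: one vectorized call produces all three draws
batched = rng.normal(size=(3,))

# Both results have the same shape; only the way they were produced differs
print(stacked.shape, batched.shape)
```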

Implicit batch dimensions

It is also possible to create a batch of draws by passing parameters with higher dimensions, without having to specify shape.

normal_dists = pm.Normal.dist(mu=np.array([0, 0, 0]), sigma=np.array([1, 1, 1]))
draw(normal_dists)
array([ 0.81833093, -0.2891973 ,  1.2399946 ])

This is equivalent to the previous example with explicit shape, and we could have passed it explicitly here. Because we did not, we refer to these batch dimensions as being implicit.

Where this becomes very useful is when we want the parameters to vary across batch dimensions.

draw(pm.Normal.dist(mu=[1, 10, 100], sigma=0.0001))
array([  0.99989975,  10.00009874, 100.00004215])

When the parameters don’t have the same shapes, they are broadcasted, in a similar way to how NumPy works. In this case sigma was broadcasted to match the shape of mu.

np.broadcast_arrays([1, 10, 100], 0.0001)
[array([  1,  10, 100]), array([0.0001, 0.0001, 0.0001])]
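To check ahead of time whether a set of parameter shapes is compatible, one option is to ask NumPy for the broadcast shape directly. This is a small sketch using `np.broadcast_shapes`; the helper name `batch_shape_or_none` is hypothetical and not part of PyMC or NumPy.

```python
import numpy as np


def batch_shape_or_none(*param_shapes):
    """Return the broadcast shape of the given shapes, or None if they clash."""
    try:
        return np.broadcast_shapes(*param_shapes)
    except ValueError:
        return None


print(batch_shape_or_none((3,), ()))      # vector mu, scalar sigma -> (3,)
print(batch_shape_or_none((3,), (2,)))    # lengths 3 and 2 clash -> None
print(batch_shape_or_none((4, 1), (3,)))  # length-1 axes stretch -> (4, 3)
```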

It’s important to understand how NumPy broadcasting works. When you do something that is not valid, you will easily encounter errors like this one:

try:
    # shapes of (3,) and (2,) can't be broadcasted together
    draw(pm.Normal.dist(mu=[1, 10, 100], sigma=[0.1, 0.1]))
except ValueError as error:
    print(error)
shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (3,) and arg 1 with shape (2,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F975A739FC0>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{[  1  10 100]}, TensorConstant{(2,) of 0.1})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, (2,))]
Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]
Inputs strides: ['No strides', (0,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F975A739FC0, array([], dtype=int64), array(11), array([  1,  10, 100]), array([0.1, 0.1])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Combining implicit and explicit batch dimensions

You can combine explicit shape dimensions with implicit batch dimensions. As mentioned above, they can provide the same information.

normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3,))
draw(normal_dists)
array([-0.49526775, -0.94608062,  1.66397913])

But shape can also be used to extend beyond any implicit batch dimensions.

normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3))
draw(normal_dists)
array([[ 2.22626513,  2.12938134,  0.49074886],
       [ 0.08312601,  1.05049093,  1.91718083],
       [-0.68191815,  1.43771096,  1.76780399],
       [-0.59883241,  0.26954893,  2.74319335]])

Note that, due to broadcasting rules, explicit batch dimensions must always “go on the left” of any implicit dimensions. So in the previous example shape=(4, 3) is valid, but shape=(3, 4) is not, because the mu parameter can be broadcasted to the first shape but not to the second.

try:
    draw(pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3, 4)))
except ValueError as error:
    print(error)
shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (3, 4) and arg 1 with shape (3,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F975A579A80>), TensorConstant{[3 4]}, TensorConstant{11}, TensorConstant{[0 1 2]}, TensorConstant{1.0})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, ())]
Inputs shapes: ['No shapes', (2,), (), (3,), ()]
Inputs strides: ['No strides', (8,), (), (8,), ()]
Inputs values: [Generator(PCG64) at 0x7F975A579A80, array([3, 4]), array(11), array([0, 1, 2]), array(1.)]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

If you needed the Normal variables to have shape=(3, 4), you can transpose the variable after defining it.

transposed_normals = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3)).T
draw(transposed_normals)
array([[-0.73397401, -0.18717845, -0.78548049,  1.64478883],
       [ 3.54543846,  1.22954216,  2.13674063,  1.94194106],
       [ 0.85294471,  3.52041332,  2.94428975,  3.25944187]])
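The broadcasting rule behind this can be checked with NumPy alone. The sketch below shows why a length-3 mu fits (4, 3) but not (3, 4), and that giving mu a trailing axis of length one makes it fit (3, 4). Whether PyMC accepts such a reshaped parameter as an alternative to transposing is an assumption here, not something shown in this document.

```python
import numpy as np

mu = np.array([0, 1, 2])

# Shape (3,) lines up with the last axis of (4, 3), so this works
print(np.broadcast_to(mu, (4, 3)).shape)

# Shape (3,) does not line up with the last axis of (3, 4)
try:
    np.broadcast_to(mu, (3, 4))
except ValueError as error:
    print("cannot broadcast:", error)

# A trailing length-1 axis turns mu into shape (3, 1), which does fit (3, 4)
mu_column = mu[:, None]
column_broadcast = np.broadcast_to(mu_column, (3, 4))
print(column_broadcast)
```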

Tip

It’s important not to confuse dimensions set in the definition of a distribution versus those set in downstream manipulations like transposition, indexing or broadcasting. When sampling with PyMC (be it via forward sampling or MCMC), the random draws will always emanate from the distribution shape. Notice how in the following example, a different number of “random” draws were actually taken, despite the two variables having the same final shape.

vector_normals = pm.Normal.dist(size=(3,))
broadcasted_normal = at.broadcast_to(pm.Normal.dist(), (3,))
draw(vector_normals), draw(broadcasted_normal)
(array([-0.45755879,  1.59975702,  0.20546749]),
 array([0.29866199, 0.29866199, 0.29866199]))
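A NumPy-only analogue makes the difference easy to count. This is a sketch with an arbitrary seed, and it uses NumPy's generator in place of PyMC's `draw`: both arrays have shape (3,), but only the first contains three separate random values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three independent draws
vector_draws = rng.normal(size=(3,))

# A single draw, repeated three times by broadcasting
broadcast_draw = np.broadcast_to(rng.normal(), (3,))

print(vector_draws.shape == broadcast_draw.shape)

# Count the distinct values in each array
print(np.unique(vector_draws).size, np.unique(broadcast_draw).size)
```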

Multivariate distribution example

Some distributions by definition return more than one value when evaluated. This may be a vector of values or a matrix or an arbitrary multidimensional tensor. An example is the Multivariate Normal, which always returns a vector (an array with one dimension).

mvnormal_dist = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3))
mvnormal_draw = draw(mvnormal_dist)
mvnormal_draw, mvnormal_draw.ndim
(array([0.55390975, 2.17440418, 1.83014764]), 1)

As with any distribution, the support dimensionality is specified as a fixed property:

mvnormal_dist.owner.op.ndim_supp
1

Even if you specify a MvNormal with a single dimension, you get back a vector!

smallest_mvnormal_dist = pm.MvNormal.dist(mu=[1], cov=[[1]])
smallest_mvnormal_draw = draw(smallest_mvnormal_dist)
smallest_mvnormal_draw, smallest_mvnormal_draw.ndim
(array([-0.68893796]), 1)
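NumPy's own multivariate normal behaves the same way, which may help build intuition. The sketch below uses `Generator.multivariate_normal` as a stand-in for `pm.MvNormal`; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# The support length of 3 is inferred from the mean and covariance
three_dim_draw = rng.multivariate_normal(np.ones(3), np.eye(3))
print(three_dim_draw.shape, three_dim_draw.ndim)

# A one-element mean still produces a vector, not a scalar
one_dim_draw = rng.multivariate_normal([1.0], [[1.0]])
print(one_dim_draw.shape, one_dim_draw.ndim)
```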

Implicit support dimensions

In the MvNormal examples we just saw, the support dimension was actually implicit. Nowhere did we specify we wanted a vector of 3 or 1 draws. This was inferred from the shape of mu and cov. As such, we refer to it as being an implicit support dimension. We could be a bit more explicit by using shape.

explicit_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(3,))
draw(explicit_mvnormal)
array([0.57262853, 0.34230354, 1.96818163])

Warning

However, note that at the time of writing shape is simply ignored for support dimensions. It serves merely as a “type-hint” for labeling the expected dimensions.

ignored_shape_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(4,))
draw(ignored_shape_mvnormal)
array([1.0623799 , 0.84622693, 0.34046237])

Explicit batch dimensions

As with univariate distributions, we can add explicit batched dimensions. We will use another vector distribution to illustrate this: the Multinomial. The following snippet defines a matrix of five independent Multinomial distributions, each of which is a vector of size 3.

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)))
array([[2, 0, 3],
       [1, 1, 3],
       [0, 2, 3],
       [0, 1, 4],
       [1, 0, 4]])
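For comparison, here is a NumPy-only sketch of the same batch. Note one difference in convention: NumPy's `size` counts only the batch draws, while the PyMC `shape` used above also includes the support dimension. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = [0.1, 0.3, 0.6]

# size=5 requests five batch draws; the support length 3 comes from p
counts = rng.multinomial(5, p, size=5)

print(counts.shape)

# Each row is one Multinomial vector, so every row sums to n=5
print(counts.sum(axis=-1))
```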

Warning

Again, note that shape has no effect on the support dimensionality.

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 4)))
array([[0, 1, 4],
       [0, 0, 5],
       [3, 1, 1],
       [0, 1, 4],
       [0, 2, 3]])

For the same reason, you must always define explicit batched dimensions “to the left” of the support dimension. The following will not behave as expected.

draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(3, 5)))
array([[2, 0, 3],
       [1, 3, 1],
       [1, 1, 3]])

If you needed the Multinomial variables to have shape=(3, 5) you can transpose it after defining it.

transposed_multinomials = pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)).T
draw(transposed_multinomials)
array([[0, 0, 0, 0, 0],
       [2, 2, 1, 0, 3],
       [3, 3, 4, 5, 2]])

Implicit batch dimensions

As with univariate distributions, we can use different parameters for each batched dimension:

multinomial_dist = pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6])
draw(multinomial_dist)
array([[1, 2, 2],
       [0, 3, 7]])

This is equivalent to the more verbose:

draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))
array([[2, 2, 1],
       [0, 3, 7]])

If you are familiar with NumPy broadcasting rules, you may be curious how PyMC makes this work. Naive broadcasting wouldn’t work here:

try:
    np.broadcast_arrays([5, 10], [0.1, 0.3, 0.6])
except ValueError as exc:
    print(exc)
shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (2,) and arg 1 with shape (3,).

To understand what is going on, we need to introduce the concept of parameter core dimensions. The core dimensions of a distribution’s parameter are the minimum number of dimensions the parameter needs to have in order to define a distribution. In the Multinomial distribution, n must be at least a scalar integer, but p must be at least a vector that represents the probability of an outcome in each category. So, for the Multinomial distribution, n has 0 core dimensions, and p has 1 core dimension.

So if we have a vector of two n, we should actually broadcast the vector of p into a matrix with two such vectors, and pair each n with each broadcasted row of p. This works exactly like np.vectorize.

def core_multinomial(n, p):
    print(">>", n, p)
    return draw(pm.Multinomial.dist(n, p))


vectorized_multinomial = np.vectorize(core_multinomial, signature="(),(p)->(p)")
vectorized_multinomial([5, 10], [0.1, 0.3, 0.6])
>> 5 [0.1 0.3 0.6]
>> 10 [0.1 0.3 0.6]
array([[1, 0, 4],
       [1, 2, 7]])

The core dimensionality of each distribution parameter is also hard-coded as a property of each distribution:

multinomial_dist.owner.op.ndims_params
(0, 1)
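Given these core dimension counts, the implied batch shape can be worked out by stripping the core dimensions from each parameter and broadcasting what remains. The function below is a minimal sketch of that reasoning, not PyMC's implementation; `implied_batch_shape` is a hypothetical helper name.

```python
import numpy as np


def implied_batch_shape(param_shapes, ndims_params):
    """Strip each parameter's core dimensions, then broadcast what is left."""
    batch_shapes = []
    for shape, ndim_core in zip(param_shapes, ndims_params):
        # Core dimensions are the trailing ones; everything to their left is batch
        batch_shapes.append(shape[: len(shape) - ndim_core])
    return np.broadcast_shapes(*batch_shapes)


# n=[5, 10] has shape (2,) and p=[0.1, 0.3, 0.6] has shape (3,)
print(implied_batch_shape([(2,), (3,)], ndims_params=(0, 1)))  # (2,)

# A p of shape (3, 3) has batch shape (3,), which clashes with n's (2,)
try:
    implied_batch_shape([(2,), (3, 3)], ndims_params=(0, 1))
except ValueError as error:
    print("invalid:", error)
```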

Implicit batch dimensions must still respect broadcasting rules. The following example is not valid because n has batched dimensions of shape=(2,) and p has batched dimensions of shape=(3,) which cannot be broadcasted together.

try:
    draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))
except ValueError as error:
    print(error)
operands could not be broadcast together with remapped shapes [original->remapped]: (2,)  and requested shape (3,)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F975995F760>), TensorConstant{[]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[[0.1 0.3 .. 0.3 0.6]]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3, 3))]
Inputs shapes: ['No shapes', (0,), (), (2,), (3, 3)]
Inputs strides: ['No strides', (0,), (), (8,), (24, 8)]
Inputs values: [Generator(PCG64) at 0x7F975995F760, array([], dtype=int64), array(4), array([ 5, 10]), 'not shown']
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Combining implicit and explicit batch dimensions

You can and should combine implicit dimensions from multidimensional parameters with explicit shape information, which is easier to reason about.

draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 3)))
array([[0, 1, 4],
       [4, 1, 5]])

Explicit batch dimensions can still extend beyond any implicit batch dimensions. Again, due to how broadcasting works, explicit batch dimensions must always “go on the left”. The following case is invalid, because n has batched dimensions of shape=(2,), which cannot be broadcasted to the explicit batch dimensions of shape=(2, 4).

try:
    draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 4, 3)))
except ValueError as error:
    print(error)
operands could not be broadcast together with remapped shapes [original->remapped]: (2,)  and requested shape (2,4)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F9759763D80>), TensorConstant{[2 4]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[0.1 0.3 0.6]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3,))]
Inputs shapes: ['No shapes', (2,), (), (2,), (3,)]
Inputs strides: ['No strides', (8,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F9759763D80, array([2, 4]), array(4), array([ 5, 10]), array([0.1, 0.3, 0.6])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
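The valid counterpart puts the new batch dimension on the left, giving a batch shape of (4, 2). The sketch below checks this with NumPy, where `size` holds only the batch shape. By the rule stated above, the PyMC equivalent would presumably be shape=(4, 2, 3), but that is an inference rather than something shown in this document.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.array([5, 10])
p = [0.1, 0.3, 0.6]

# n has batch shape (2,): it fits (4, 2) but not (2, 4)
print(np.broadcast_shapes(n.shape, (4, 2)))
try:
    np.broadcast_to(n, (2, 4))
except ValueError:
    print("n cannot be broadcast to (2, 4)")

# Batch shape (4, 2) plus support length 3 gives draws of shape (4, 2, 3)
counts = rng.multinomial(n, p, size=(4, 2))
print(counts.shape)

# Summing over the support axis recovers n in every batch row
print(counts.sum(axis=-1))
```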

Inspecting dimensionality with a model graph

More often than not, distributions are used inside a PyMC model, and as such there are tools that facilitate reasoning about distribution shapes in that context.

with pm.Model() as pmodel:
    mu = pm.Normal("x", mu=0, size=(3))
    sigma = pm.HalfNormal("sigma")
    y = pm.Normal("y", mu=mu, sigma=sigma)

for rv, shape in pmodel.eval_rv_shapes().items():
    print(f"{rv:>11}: shape={shape}")
          x: shape=(3,)
sigma_log__: shape=()
      sigma: shape=()
          y: shape=(3,)

An even more powerful tool to understand and debug dimensionality in PyMC is the model_to_graphviz() function. Rather than inspecting array outputs we can instead read the Graphviz output to understand the dimensionality of the variables.

pm.model_to_graphviz(pmodel)
[Figure: Graphviz diagram of the model produced by pm.model_to_graphviz]

In the example above the number on the bottom left of each box (or plate) indicates the dimensionality of the distributions within. If a distribution is outside of any box with a number, it has a scalar shape.

Let’s use this tool to review implicit and explicit dimensions:

with pm.Model() as pmodel:
    pm.Normal("scalar (support)")
    pm.Normal("vector (implicit)", mu=[1, 2, 3])
    pm.Normal("vector (explicit)", shape=(4,))

pm.model_to_graphviz(pmodel)
[Figure: Graphviz diagram of the model produced by pm.model_to_graphviz]

Dims

PyMC supports the concept of dims. With many random variables it can become confusing which dimensionality corresponds to which “real world” idea, e.g. number of observations, number of treated units etc. The dims argument is an additional human-readable label that can convey this meaning.

with pm.Model() as pmodel:
    pm.Normal("crayon", size=2, dims="colors")

    # "group" is not yet known to the model, so its length must come from an explicit shape
    hyperprior = pm.Normal("hyperprior", mu=[1, 2, 3, 4], shape=(4,), dims="group")
    # "group" is now registered with the model, so dims alone defines the shape here
    pm.Normal("prior", mu=hyperprior, dims="group")


pm.model_to_graphviz(pmodel)

Where dims can become increasingly powerful is with the use of coords specified in the model itself. This gives a unique label to each dim entry, rendering it much more meaningful.

with pm.Model(
    coords={
        "year": [2020, 2021, 2022],
    }
) as pmodel:

    pm.Normal("profit", dims="year")

pm.model_to_graphviz(pmodel)
[Figure: Graphviz diagram of the model produced by pm.model_to_graphviz]

Note that the dimensionality of the distribution was actually defined by the dims used. We did not pass shape or define implicit batched dimensions.
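Conceptually, each dimension name is looked up to find its length, and a name the model does not know cannot be resolved. The pure-Python sketch below illustrates that lookup with a plain dictionary of coords. It is an illustration only, not PyMC's code, and `sketch_shape_from_dims` is a hypothetical name.

```python
def sketch_shape_from_dims(dims, coords):
    """Look up one length per dimension name, failing on unknown names."""
    unknown = set(dims) - set(coords)
    if unknown:
        raise KeyError(f"Dimensions {unknown} have no coordinates")
    return tuple(len(coords[name]) for name in dims)


coords = {"year": [2020, 2021, 2022]}

# Three coordinate values for "year" imply a shape of (3,)
print(sketch_shape_from_dims(("year",), coords))  # (3,)

# A name with no coordinates cannot be turned into a shape
try:
    sketch_shape_from_dims(("group",), coords)
except KeyError as error:
    print(error)
```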

Let us review the different dimensionality flavours with a Multivariate Normal example.

with pm.Model(
    coords={
        "batch": [0, 1, 2, 3],
    }
) as pmodel:
    pm.MvNormal("vector", mu=[0, 0, 0], cov=np.eye(3), dims=("support",))
    pm.MvNormal("matrix (implicit)", mu=np.zeros((4, 3)), cov=np.eye(3), dims=("batch", "support"))
    pm.MvNormal(
        "matrix (explicit)", mu=[0, 0, 0], cov=np.eye(3), shape=(4, 3), dims=("batch", "support")
    )

pm.model_to_graphviz(pmodel)
[Figure: Graphviz diagram of the model produced by pm.model_to_graphviz]

Tip

For final model publication we suggest using dims and coords, as the labels will be passed to arviz.InferenceData. This is best practice for both transparency and readability for others. It is also useful in single-developer workflows: for example, with a distribution of three or more dimensions, it helps indicate which dimension corresponds to which model concept.

Tips for debugging shape issues

While we provide all these tools for convenience, and while PyMC does its best to understand user intent, mixing dimensionality tools may not always produce the final dimensionality intended. Sometimes the model may not indicate an error until sampling, or not indicate an issue at all. When working with dimensionality, particularly in more complex models, we suggest:

  • Using model_to_graphviz to visualize your model before sampling

  • Using draw or sample_prior_predictive to catch errors early

  • Inspecting the returned az.InferenceData object to ensure all array sizes are as intended

  • Defining shapes with prime numbers when tracking down errors.
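The prime-number tip works because distinct primes give every axis a unique length, so a mix-up cannot broadcast by accident. The NumPy sketch below illustrates this; the axis names and sizes are made up for the example.

```python
import numpy as np

# Distinct primes make every axis length unique, so each axis is identifiable
n_groups, n_years, n_features = 2, 3, 5

effects = np.zeros((n_groups, n_years, n_features))
weights = np.ones((n_features,))

# Correct usage: weights lines up with the last axis
product = effects * weights
print(product.shape)  # (2, 3, 5)

# An accidental transpose fails loudly, because 2 and 5 cannot broadcast
try:
    effects.transpose(2, 1, 0) * weights
except ValueError as error:
    print("caught:", error)

# With equal axis lengths the same mistake passes silently
square = np.zeros((3, 3, 3))
silent = square.transpose(2, 1, 0) * np.ones(3)
print(silent.shape)  # (3, 3, 3)
```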