# Distribution Dimensionality#

PyMC provides a number of ways to specify the dimensionality of its distributions. This document provides an overview, and offers some user tips.

## Glossary#

In this document we’ll be using the term dimensionality to refer to the idea of dimensions. Each of the terms below has a specific semantic and computational definition in PyMC. While we share them here they will make much more sense when viewed in the examples below.

*Support dimensions*→ The core dimensionality of a distribution*Batch dimensions*→ Extra dimensions beyond the support dimensionality of a distribution*Implicit dimensions*→ Dimensions that follow from the values or shapes of the distribution parameters*Explicit dimensions*→ Dimensions that are explicitly defined by one of the following arguments:*Shape*→ Number of draws from a distribution*Dims*→ An array of dimension names

*Coords*→ A dictionary mapping dimension names to coordinate values

```
from functools import partial
import pymc as pm
import numpy as np
import aesara.tensor as at
```

## Univariate distribution example#

We can start with the simplest case, a single Normal distribution. We use `.dist`

to specify one outside of a PyMC Model.

```
normal_dist = pm.Normal.dist()
```

We can then use the `draw()`

function to take a random draw from that distribution.

```
# Just patching the draw function for reproducibility
rng = np.random.default_rng(seed=sum(map(ord, "dimensionality")))
draw = partial(pm.draw, random_seed=rng)
```

```
normal_draw = draw(normal_dist)
normal_draw, normal_draw.ndim
```

```
(array(0.80189558), 0)
```

In this case we end up with a single scalar value. This means that a Normal distribution has a scalar support dimensionality, as the smallest random draw you can take is a scalar which has a dimension of zero. The support dimensionality of every distribution is hard-coded as a property.

```
normal_dist.owner.op.ndim_supp
```

```
0
```

### Explicit batch dimensions#

If one needs more than a single draw, a natural tendency would be to create multiple copies of the same variable and stack them together.

```
normal_dists = pm.math.stack([pm.Normal.dist() for _ in range(3)])
draw(normal_dists)
```

```
array([ 0.9434115 , -0.33327414, 0.83636296])
```

More simply, one can create a *batch* of independent draws from the same distribution family by using the shape argument.

```
normal_dists = pm.Normal.dist(shape=(3,))
draw(normal_dists)
```

```
array([ 0.98810294, -0.07003785, -0.37962748])
```

```
normal_dists = pm.Normal.dist(shape=(4, 3))
draw(normal_dists)
```

```
array([[ 7.99932116e-04, -1.94407945e+00, 3.90451962e-01],
[ 1.10657367e+00, 6.49042149e-01, -1.09450185e+00],
[-2.96226305e-01, 1.41884595e+00, -1.31589441e+00],
[ 1.53109449e+00, -7.73771737e-01, 2.37418367e+00]])
```

Not only is this more succint, but it produces much more efficient vectorized code. We rarely use the stack approach in PyMC, unless we need to combine draws from distinct distribution families.

### Implicit batch dimensions#

It is also possible to create a batch of draws by passing parameters with higher dimensions, without having to specify shape.

```
normal_dists = pm.Normal.dist(mu=np.array([0, 0, 0]), sigma=np.array([1, 1, 1]))
draw(normal_dists)
```

```
array([ 0.81833093, -0.2891973 , 1.2399946 ])
```

This is equivalent to the previous example with explicit shape, and we could have passed it explicitly here. Because we did not, we refer to these batch dimensions as being *implicit*.

Where this becomes very useful is when we want the parameters to vary across batch dimensions.

```
draw(pm.Normal.dist(mu=[1, 10, 100], sigma=0.0001))
```

```
array([ 0.99989975, 10.00009874, 100.00004215])
```

When the parameters don’t have the same shapes, they are broacasted, in a similar way to how NumPy works. In this case `sigma`

was broadcasted to match the shape of `mu`

.

```
np.broadcast_arrays([1, 10, 100], 0.0001)
```

```
[array([ 1, 10, 100]), array([0.0001, 0.0001, 0.0001])]
```

It’s important to understand how NumPy broadcasting works. When you do something that is not valid, you will easily encounter this sort of errors:

```
try:
# shapes of (3,) and (2,) can't be broadcasted together
draw(pm.Normal.dist(mu=[1, 10, 100], sigma=[0.1, 0.1]))
except ValueError as error:
print(error)
```

```
shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (3,) and arg 1 with shape (2,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F9CED77D540>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{[ 1 10 100]}, TensorConstant{(2,) of 0.1})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, (2,))]
Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]
Inputs strides: ['No strides', (0,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F9CED77D540, array([], dtype=int64), array(11), array([ 1, 10, 100]), array([0.1, 0.1])]
Outputs clients: [['output'], ['output']]
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
```

### Combining implicit and explicit batch dimensions#

You can combine explicit shape dimensions with implicit batch dimensions. As mentioned above, they can provide the same information.

```
normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3,))
draw(normal_dists)
```

```
array([-0.49526775, -0.94608062, 1.66397913])
```

But shape can also be used to extend beyond any implicit batch dimensions.

```
normal_dists = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3))
draw(normal_dists)
```

```
array([[ 2.22626513, 2.12938134, 0.49074886],
[ 0.08312601, 1.05049093, 1.91718083],
[-0.68191815, 1.43771096, 1.76780399],
[-0.59883241, 0.26954893, 2.74319335]])
```

Note that, due to broadcasting rules, explicit batch dimensions must always “go on the left” of any implicit dimensions. So in the previous example `shape=(4, 3)`

is valid, but `shape=(3, 4)`

is not, because the `mu`

parameter can be broadcasted to the first shape but not to the second.

```
try:
draw(pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(3, 4)))
except ValueError as error:
print(error)
```

```
shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (3, 4) and arg 1 with shape (3,).
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F9CECE417E0>), TensorConstant{[3 4]}, TensorConstant{11}, TensorConstant{[0 1 2]}, TensorConstant{1.0})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(float64, ())]
Inputs shapes: ['No shapes', (2,), (), (3,), ()]
Inputs strides: ['No strides', (8,), (), (8,), ()]
Inputs values: [Generator(PCG64) at 0x7F9CECE417E0, array([3, 4]), array(11), array([0, 1, 2]), array(1.)]
Outputs clients: [['output'], ['output']]
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
```

If you needed the Normal variables to have `shape=(4, 3)`

, you can transpose it after defining it.

```
transposed_normals = pm.Normal.dist(mu=np.array([0, 1, 2]), sigma=1, shape=(4, 3)).T
draw(transposed_normals)
```

```
array([[-0.73397401, -0.18717845, -0.78548049, 1.64478883],
[ 3.54543846, 1.22954216, 2.13674063, 1.94194106],
[ 0.85294471, 3.52041332, 2.94428975, 3.25944187]])
```

Tip

It’s important not to confuse dimensions set in the definition of a distribution versus those set in downstream manipulations like transposition, indexing or broadcasting. When sampling with PyMC (be it via forward sampling or MCMC), the random draws will always emanate from the distribution shape. Notice how in the following example, a different number of “random” draws were actually taken, despite the two variables having the same final shape.

```
vector_normals = pm.Normal.dist(size=(3,))
broadcasted_normal = at.broadcast_to(pm.Normal.dist(), (3,))
```

```
draw(vector_normals), draw(broadcasted_normal)
```

```
(array([-0.45755879, 1.59975702, 0.20546749]),
array([0.29866199, 0.29866199, 0.29866199]))
```

## Multivariate distribution example#

Some distributions by definition return more than one value when evaluated. This may be a vector of values or a matrix or an arbitrary multidimensional tensor. An example is the Multivariate Normal, which always returns a vector (an array with one dimension).

```
mvnormal_dist = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3))
mvnormal_draw = draw(mvnormal_dist)
mvnormal_draw, mvnormal_draw.ndim
```

```
(array([0.55390975, 2.17440418, 1.83014764]), 1)
```

As with any distribution, the support dimensionality is specified as a fixed property

```
mvnormal_dist.owner.op.ndim_supp
```

```
1
```

Even if you specify a MvNormal with a single dimension, you get back a vector!

```
smallest_mvnormal_dist = pm.MvNormal.dist(mu=[1], cov=[[1]])
smallest_mvnormal_draw = draw(smallest_mvnormal_dist)
smallest_mvnormal_draw, smallest_mvnormal_draw.ndim
```

```
(array([-0.68893796]), 1)
```

### Implicit support dimensions#

In the MvNormal examples we just saw, the support dimension was actually implicit. Nowhere did we specify we wanted a vector of 3 or 1 draws. This was inferred from the shape of `mu`

and `cov`

. As such, we refer to it as being an *implicit support dimension*. We could be a bit more explicit by using shape.

```
explicit_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(3,))
draw(explicit_mvnormal)
```

```
array([0.57262853, 0.34230354, 1.96818163])
```

Warning

However, note that at the time of writing shape is simply ignored for support dimensions. It serves merely as a “type-hint” for labeling the expected dimensions.

```
ignored_shape_mvnormal = pm.MvNormal.dist(mu=np.ones(3), cov=np.eye(3), shape=(4,))
draw(ignored_shape_mvnormal)
```

```
array([1.0623799 , 0.84622693, 0.34046237])
```

### Explicit batch dimensions#

As with univariate distributions, we can add explicit batched dimensions. We will use another vector distribution to illustrate this: the Multinomial. The following snippet defines a matrix of five independent Multinomial distributions, each of which is a vector of size 3.

```
draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)))
```

```
array([[2, 0, 3],
[1, 1, 3],
[0, 2, 3],
[0, 1, 4],
[1, 0, 4]])
```

Warning

Again, note that shape has no effect on the support dimensionality

```
draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 4)))
```

```
array([[0, 1, 4],
[0, 0, 5],
[3, 1, 1],
[0, 1, 4],
[0, 2, 3]])
```

For the same reason, you must always define explicit batched dimensions “to the left” of the support dimension. The following will not behave as expected.

```
draw(pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(3, 5)))
```

```
array([[2, 0, 3],
[1, 3, 1],
[1, 1, 3]])
```

If you needed the Multinomial variables to have `shape=(3, 5)`

you can transpose it after defining it.

```
transposed_multinomials = pm.Multinomial.dist(n=5, p=[0.1, 0.3, 0.6], shape=(5, 3)).T
draw(transposed_multinomials)
```

```
array([[0, 0, 0, 0, 0],
[2, 2, 1, 0, 3],
[3, 3, 4, 5, 2]])
```

### Implicit batch dimensions#

As with univariate distributions, we can use different parameters for each batched dimension

```
multinomial_dist = pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6])
draw(multinomial_dist)
```

```
array([[1, 2, 2],
[0, 3, 7]])
```

Which is equivalent to the more verbose

```
draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))
```

```
array([[2, 2, 1],
[0, 3, 7]])
```

If you are familiar with NumPy broadcasting rules you may be curious of how does PyMC make this work. Naive broadcasting wouldn’t work here

```
try:
np.broadcast_arrays([5, 10], [0.1, 0.3, 0.6])
except ValueError as exc:
print(exc)
```

```
shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (2,) and arg 1 with shape (3,).
```

To understand what is going on, we need to introduce the concept of parameter core dimensions. The core dimensions of a distribution’s parameter are the minimum number of dimensions the parameters need to have in order to define a distribution. In the Multinomial distribution, `n`

must at least be an scalar integer, but `p`

must be at least a vector that represents the probability of having an outcome on each category. So, for the Multinomial distribution, `n`

has 0 core dimensions, and `p`

has 1 core dimension.

So if we have a vector of two `n`

, we should actually broadcast the vector of `p`

into a matrix with two such vectors, and pair each `n`

with each broadcasted row of `p`

. This works exactly like `np.vectorize`

.

```
def core_multinomial(n, p):
print(">>", n, p)
return draw(pm.Multinomial.dist(n, p))
vectorized_multinomial = np.vectorize(core_multinomial, signature="(),(p)->(p)")
vectorized_multinomial([5, 10], [0.1, 0.3, 0.6])
```

```
>> 5 [0.1 0.3 0.6]
>> 10 [0.1 0.3 0.6]
```

```
array([[1, 0, 4],
[1, 2, 7]])
```

The core dimensionality of each distribution parameter is also hard-coded as a property of each distribution

```
multinomial_dist.owner.op.ndims_params
```

```
(0, 1)
```

Implicit batch dimensions must still respect broadcasting rules. The following example is not valid because `n`

has batched dimensions of `shape=(2,)`

and `p`

has batched dimensions of `shape=(3,)`

which cannot be broadcasted together.

```
try:
draw(pm.Multinomial.dist(n=[5, 10], p=[[0.1, 0.3, 0.6], [0.1, 0.3, 0.6], [0.1, 0.3, 0.6]]))
except ValueError as error:
print(error)
```

```
operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (3,)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F9CEC986EA0>), TensorConstant{[]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[[0.1 0.3 .. 0.3 0.6]]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3, 3))]
Inputs shapes: ['No shapes', (0,), (), (2,), (3, 3)]
Inputs strides: ['No strides', (0,), (), (8,), (24, 8)]
Inputs values: [Generator(PCG64) at 0x7F9CEC986EA0, array([], dtype=int64), array(4), array([ 5, 10]), 'not shown']
Outputs clients: [['output'], ['output']]
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
```

### Combining implicit and explicit batch dimensions#

You can and should combine implicit dimensions from multidimensional parameters with explicit shape information, which is easier to reason about.

```
draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 3)))
```

```
array([[0, 1, 4],
[4, 1, 5]])
```

Explicit batch dimensions can still extend beyond any implicit batch dimensions. Again, due to how broadcasting works, explicit batch dimensions must always “go on the left”. The following case is invalid, because `n`

has batched dimensions of `shape=(2,)`

, which cannot be broadcasted to the explicit batch dimensions of `shape=(2, 4)`

.

```
try:
draw(pm.Multinomial.dist(n=[5, 10], p=[0.1, 0.3, 0.6], shape=(2, 4, 3)))
except ValueError as error:
print(error)
```

```
operands could not be broadcast together with remapped shapes [original->remapped]: (2,) and requested shape (2,4)
Apply node that caused the error: multinomial_rv{1, (0, 1), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F9CEC985E00>), TensorConstant{[2 4]}, TensorConstant{4}, TensorConstant{[ 5 10]}, TensorConstant{[0.1 0.3 0.6]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (2,)), TensorType(int64, ()), TensorType(int64, (2,)), TensorType(float64, (3,))]
Inputs shapes: ['No shapes', (2,), (), (2,), (3,)]
Inputs strides: ['No strides', (8,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7F9CEC985E00, array([2, 4]), array(4), array([ 5, 10]), array([0.1, 0.3, 0.6])]
Outputs clients: [['output'], ['output']]
HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
```

## Inspecting dimensionality with a model graph#

More often than not distributions are used inside a PyMC model, and as such there are tools that facilitate reasoning about distributions shapes in that context.

```
with pm.Model() as pmodel:
mu = pm.Normal("x", mu=0, size=(3))
sigma = pm.HalfNormal("sigma")
y = pm.Normal("y", mu=mu, sigma=sigma)
for rv, shape in pmodel.eval_rv_shapes().items():
print(f"{rv:>11}: shape={shape}")
```

```
x: shape=(3,)
sigma_log__: shape=()
sigma: shape=()
y: shape=(3,)
```

An even more powerful tool to understand and debug dimensionality in PyMC is the `model_to_graphviz()`

function. Rather than inspecting array outputs we can instead read the Graphviz output to understand the dimensionality of the variables.

```
pm.model_to_graphviz(pmodel)
```

In the example above the number on the bottom left of each box (or plate) indicates the dimensionality of the distributions within. If a distribution is outside of any box with a number, it has a scalar shape.

Let’s use this tool to review implicit and explicit dimensions:

```
with pm.Model() as pmodel:
pm.Normal("scalar (support)")
pm.Normal("vector (implicit)", mu=[1, 2, 3])
pm.Normal("vector (explicit)", shape=(4,))
pm.model_to_graphviz(pmodel)
```

## Dims#

PyMC supports the concept of `dims`

. With many random variables it can become confusing which dimensionality corresponds to which “real world” idea, e.g. number of observations, number of treated units etc. The `dims`

argument is an additional human-readable label that can convey this meaning.

```
with pm.Model() as pmodel:
pm.Normal("crayon", size=2, dims="colors")
hyperprior = pm.Normal("hyperprior", [1, 2, 3, 4], dims="group")
pm.Normal("prior", mu=hyperprior, dims="group")
pm.model_to_graphviz(pmodel)
```

```
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In [39], line 4
1 with pm.Model() as pmodel:
2 pm.Normal("crayon", size=2, dims="colors")
----> 4 hyperprior = pm.Normal("hyperprior", [1, 2, 3, 4], dims="group")
5 pm.Normal("prior", mu=hyperprior, dims="group")
8 pm.model_to_graphviz(pmodel)
File ~/checkouts/readthedocs.org/user_builds/pymc/conda/latest/lib/python3.11/site-packages/pymc/distributions/distribution.py:299, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, *args, **kwargs)
297 if kwargs.get("size") is None and kwargs.get("shape") is None:
298 if dims is not None:
--> 299 kwargs["shape"] = shape_from_dims(dims, model)
300 elif observed is not None:
301 kwargs["shape"] = tuple(observed.shape)
File ~/checkouts/readthedocs.org/user_builds/pymc/conda/latest/lib/python3.11/site-packages/pymc/distributions/shape_utils.py:507, in shape_from_dims(dims, model)
505 unknowndim_dims = set(dims) - set(model.dim_lengths)
506 if unknowndim_dims:
--> 507 raise KeyError(
508 f"Dimensions {unknowndim_dims} are unknown to the model and cannot be used to specify a `shape`."
509 )
511 return tuple(model.dim_lengths[dname] for dname in dims)
KeyError: "Dimensions {'group'} are unknown to the model and cannot be used to specify a `shape`."
```

Where `dims`

can become increasingly powerful is with the use of `coords`

specified in the model itself. This gives a unique label to each `dim`

entry, rendering it much more meaningful.

```
with pm.Model(
coords={
"year": [2020, 2021, 2022],
}
) as pmodel:
pm.Normal("profit", dims="year")
pm.model_to_graphviz(pmodel)
```

Note that the dimensionality of the distribution was actually defined by the `dims`

used. We did not pass shape or define implicit batched dimensions.

Let us to review the different dimensionality flavours with a Multivariate Normal example.

```
with pm.Model(
coords={
"batch": [0, 1, 2, 3],
}
) as pmodel:
pm.MvNormal("vector", mu=[0, 0, 0], cov=np.eye(3), dims=("support",))
pm.MvNormal("matrix (implicit)", mu=np.zeros((4, 3)), cov=np.eye(3), dims=("batch", "support"))
pm.MvNormal(
"matrix (explicit)", mu=[0, 0, 0], cov=np.eye(3), shape=(4, 3), dims=("batch", "support")
)
pm.model_to_graphviz(pmodel)
```

Tip

For final model publication we suggest dims and coords as the labels will be passed to `arviz.InferenceData`

. This is both best practice transparency and readability for others. It also is useful in single developer workflows, for example, in cases where there is a 3 dimensional or higher distribution it’ll help indiciate which dimension corresponds to which model concept.

## Tips for debugging shape issues#

While we provide all these tools for convenience, and while PyMC does it best to understand user intent, the result of mixed dimensionality tools may not always result in the final dimensionality intended. Sometimes the model may not indicate an error until sampling, or not indicate an issue at all. When working with dimensionality, particular more complex ones we suggest:

Using

`model_to_graphviz`

to visualize your model before samplingUsing

`draw`

or`sample_prior predictive`

to catch errors earlyInspecting the returned

`az.InferenceData`

object to ensure all array sizes are as intendedDefining shapes with prime numbers when tracking down errors.