pymc.Data(name, value, *, dims=None, coords=None, infer_dims_and_coords=False, mutable=None, **kwargs)

Data container that registers a data variable with the model.

Depending on the mutable setting (default: True), the variable is registered as a SharedVariable, which allows its value and shape, but NOT its dimensionality (number of dimensions), to be changed using pymc.set_data().

To set the value of the data container variable, see pymc.Model.set_data().

When making predictions or doing posterior predictive sampling, the shape of the registered data variable will most likely need to be changed. If you encounter a PyTensor shape mismatch error, refer to the documentation for pymc.model.set_data().

For more information, read the notebook Using Data Containers.


Parameters:

name : str

The name for this variable.

value : array_like, pandas.Series, or pandas.DataFrame

A value to associate with this variable.

dims : str or tuple of str, optional

Dimension names of the random variables (as opposed to the shapes of these random variables). Use this when value is a pandas Series or DataFrame. The dims will then be the name of the Series / DataFrame’s columns. See the ArviZ documentation for more information about dimensions and coordinates: ArviZ Quickstart. If this parameter is not specified, the random variables will not have dimension names.

coords : dict, optional

Coordinate values to set for new dimensions introduced by this Data variable.


Deprecated, previous version of “infer_dims_and_coords”

infer_dims_and_coords : bool, default=False

If True, the Data container will try to infer what the coordinates and dimension names should be if there is an index in value.

**kwargs : dict, optional

Extra arguments passed to pytensor.shared().


Examples

>>> import pymc as pm
>>> import numpy as np
>>> # We generate 10 datasets
>>> true_mu = [np.random.randn() for _ in range(10)]
>>> observed_data = [mu + np.random.randn(20) for mu in true_mu]
>>> with pm.Model() as model:
...     data = pm.Data('data', observed_data[0])
...     mu = pm.Normal('mu', 0, 10)
...     pm.Normal('y', mu=mu, sigma=1, observed=data)
>>> # Generate one trace for each dataset
>>> idatas = []
>>> for data_vals in observed_data:
...     with model:
...         # Switch out the observed dataset
...         model.set_data('data', data_vals)
...         idatas.append(pm.sample())