pymc.Data(name, value, *, dims=None, coords=None, export_index_as_coords=False, mutable=None, **kwargs)

Data container that registers a data variable with the model.

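When mutable=True, the variable is registered as a SharedVariable, so its value and shape (but NOT its number of dimensions) can be changed afterwards with pymc.set_data(). When mutable=False, it is registered as a TensorConstant. The default depends on the package version; see the mutable parameter below.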

To set the value of an existing data container variable, use pymc.Model.set_data().

For more information, read the notebook Using shared variables (Data container adaptation).


Parameters

name : str

The name for this variable.

value : array_like or pandas.Series, pandas.DataFrame

A value to associate with this variable.

dims : str or tuple of str, optional

Dimension names of the random variables (as opposed to the shapes of these random variables). Use this when value is a pandas Series or DataFrame; the dims will then be the names of the Series / DataFrame's columns. For more information about dimensions and coordinates, see the ArviZ Quickstart in the ArviZ documentation. If this parameter is not specified, the random variables will not have dimension names.

coords : dict, optional

Coordinate values to set for new dimensions introduced by this Data variable.

export_index_as_coords : bool, default=False

If True, the Data container will try to infer what the coordinates and dimension names should be if there is an index in value.

mutable : bool, optional

Switches between creating a SharedVariable (mutable=True) vs. creating a TensorConstant (mutable=False). Consider using pymc.ConstantData or pymc.MutableData as less verbose alternatives to pm.Data(..., mutable=...). If this parameter is not specified, the value it takes will depend on the version of the package. Since v4.1.0 the default value is mutable=False, with previous versions having mutable=True.

**kwargs : dict, optional

Extra arguments passed to aesara.shared().


Examples

>>> import pymc as pm
>>> import numpy as np
>>> # We generate 10 datasets
>>> true_mu = [np.random.randn() for _ in range(10)]
>>> observed_data = [mu + np.random.randn(20) for mu in true_mu]
>>> with pm.Model() as model:
...     data = pm.MutableData('data', observed_data[0])
...     mu = pm.Normal('mu', 0, 10)
...     pm.Normal('y', mu=mu, sigma=1, observed=data)
>>> # Generate one trace for each dataset
>>> idatas = []
>>> for data_vals in observed_data:
...     with model:
...         # Switch out the observed dataset
...         model.set_data('data', data_vals)
...         idatas.append(pm.sample())