pymc.backends.zarr.ZarrTrace#
- class pymc.backends.zarr.ZarrTrace(store=None, synchronizer=None, compressor=UNSET, draws_per_chunk=1, include_transformed=False)[source]#
Object that stores and enables access to MCMC draws stored in zarr.hierarchy.Group objects.
This class creates a zarr hierarchy to represent the sampling information, which is intended to mimic arviz.InferenceData. The hierarchy looks like this:

root
|–> constant_data
|–> observed_data
|–> posterior
|–> unconstrained_posterior
|–> sample_stats
|–> warmup_posterior
|–> warmup_unconstrained_posterior
|–> warmup_sample_stats
|–> _sampling_state

The root group is created when the
ZarrTrace object is initialized. The rest of the groups are created once init_trace() is called, with a few exceptions: unconstrained_posterior is only created if include_transformed = True, and the groups prefixed with warmup_ are created only after calling split_warmup_groups().
Since ZarrTrace objects are intended to be as close to arviz.InferenceData objects as possible, the groups store the dimension and coordinate information following the xarray zarr standard.
- Parameters:
- store
zarr.storage.BaseStore | collections.abc.MutableMapping | None
The store object where the zarr groups and arrays will be stored and read from. Any zarr-compatible storage object works. Keep in mind that if None is provided, a zarr.storage.MemoryStore will be used, which means that information won’t be visible to other processes and won’t persist after the ZarrTrace life cycle ends. If you want persistent storage, use one of the disk-backed zarr storage options, e.g. DirectoryStore or ZipStore.
- synchronizer
zarr.sync.Synchronizer | None
The synchronizer to use for the underlying zarr arrays.
- compressor
numcodecs.abc.Codec | None | pymc.util.UNSET
The compressor to use for the underlying zarr arrays. If None, no compressor is used. If UNSET, zarr’s default compressor is used.
- draws_per_chunk
int
The number of draws that make up a chunk in the variable’s posterior array. Each variable’s array shape is set to (n_chains, n_draws, *rv_shape), but the chunks are set to (1, draws_per_chunk, *rv_shape). This means that each chain has its own chunk to read from or write to, so concurrent writes from different chains do not interfere with each other, and that multiple draws can belong to the same chunk. The variable’s core dimensions, however, will never be split across different chunks.
- include_transformed
bool
If True, the transformed, unconstrained value variables are included in the storage group.
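The chunking rule described for draws_per_chunk can be illustrated with a small plain-Python sketch. It does not call zarr or PyMC; the helper names chunk_layout and chunk_index are hypothetical and exist only for this illustration.

```python
from math import ceil


def chunk_layout(n_chains, n_draws, rv_shape, draws_per_chunk=1):
    """Hypothetical helper: array shape, chunk shape and chunk-grid size."""
    shape = (n_chains, n_draws, *rv_shape)
    chunks = (1, draws_per_chunk, *rv_shape)
    # One chunk per chain along axis 0, ceil(n_draws / draws_per_chunk) chunks
    # along the draw axis, and a single chunk along every core dimension.
    grid = (n_chains, ceil(n_draws / draws_per_chunk), *(1 for _ in rv_shape))
    return shape, chunks, grid


def chunk_index(chain, draw, rv_shape, draws_per_chunk=1):
    """Hypothetical helper: grid index of the chunk holding one (chain, draw)."""
    return (chain, draw // draws_per_chunk, *(0 for _ in rv_shape))


shape, chunks, grid = chunk_layout(4, 1000, (3, 2), draws_per_chunk=100)
print(shape)   # (4, 1000, 3, 2)
print(chunks)  # (1, 100, 3, 2)
print(grid)    # (4, 10, 1, 1)

# Draw 250 of chain 2 lands in the third draw-chunk of that chain's row.
print(chunk_index(2, 250, (3, 2), draws_per_chunk=100))  # (2, 2, 0, 0)
```

Because the first grid coordinate is always the chain index, two chains never touch the same chunk, which is why concurrent writes from different chains do not collide.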
Notes
ZarrTrace objects represent the storage information. If the underlying store persists on disk or over the network (e.g. with a zarr.storage.FSStore), multiple processes will be able to concurrently access the same storage and read from or write to it.
The intended division of labour is for ZarrTrace to handle the creation and management of the zarr group and storage objects and arrays, and for individual ZarrChain objects to handle recording MCMC samples to the trace. This division was chosen to stay close to how the existing pymc.backends.base.MultiTrace and pymc.backends.ndarray.NDArray work with the existing samplers.
One extra feature of ZarrTrace is that it enables direct access to any array’s metadata. ZarrTrace takes advantage of this to tag arrays as deterministic or freeRV depending on what kind of variable they were in the defining model.
Methods
ZarrTrace.__init__([store, synchronizer, ...])
ZarrTrace.create_group(name, data_dict)
ZarrTrace.init_group_with_empty(group, ...)
ZarrTrace.init_sampling_state_group(tune, chains)
ZarrTrace.init_trace(chains, draws, tune, step)
    Initialize the trace groups and arrays.
ZarrTrace.split_warmup(group_name[, ...])
    Split the arrays of a group into the warmup and regular groups.
ZarrTrace.split_warmup_groups()
    Split the warmup and standard groups.
ZarrTrace.to_inferencedata([save_warmup])
    Convert ZarrTrace to InferenceData.
Attributes
constant_data
observed_data
posterior
sample_stats
sampling_time
tuning_steps
unconstrained_posterior
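
The group-creation rules given in the class description (which groups exist before and after init_trace() and split_warmup_groups()) can be summarised in a short plain-Python sketch. This is not PyMC code: expected_groups is a hypothetical helper that only models the rules stated in the text, and it assumes that warmup_unconstrained_posterior appears only when include_transformed is True.

```python
def expected_groups(include_transformed=False, trace_initialized=True, warmup_split=False):
    """Hypothetical helper: child groups of the root under the documented rules."""
    if not trace_initialized:
        # Only the root group exists before init_trace() is called.
        return []
    groups = [
        "constant_data",
        "observed_data",
        "posterior",
        "sample_stats",
        "_sampling_state",
    ]
    if include_transformed:
        groups.append("unconstrained_posterior")
    if warmup_split:
        # The warmup_ groups appear only after split_warmup_groups().
        groups += ["warmup_posterior", "warmup_sample_stats"]
        # Assumption: the warmup copy of the unconstrained draws exists only
        # when the unconstrained group itself was requested.
        if include_transformed:
            groups.append("warmup_unconstrained_posterior")
    return groups


print(expected_groups())
print(expected_groups(include_transformed=True, warmup_split=True))
```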