pymc.adam
- pymc.adam(loss_or_grads=None, params=None, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)[source]
Adam updates
Adam updates implemented as in [1].
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params: list of shared variables
The variables to generate update expressions for
- learning_rate: float
Learning rate
- beta1: float
Exponential decay rate for the first moment estimates.
- beta2: float
Exponential decay rate for the second moment estimates.
- epsilon: float
Constant for numerical stability.
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
Notes
The paper [1] includes an additional hyperparameter, lambda. It is only needed to prove convergence of the algorithm and has no practical use (personal communication with the authors); it is therefore omitted here.
The optimizer can be called without both loss_or_grads and params; in that case, a partial function is returned that can later be applied to the loss and parameters.
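As an illustration of this partial form, here is a minimal sketch of passing a preconfigured optimizer to variational fitting. The toy data and model are hypothetical, and the sketch assumes pm.fit forwards the obj_optimizer keyword to the variational objective:

import numpy as np
import pymc as pm

# Hypothetical toy data and model, used only to illustrate the partial form.
data = np.random.normal(loc=1.0, scale=2.0, size=200)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", mu, sigma=sigma, observed=data)

    # adam called without loss_or_grads and params returns a partial
    # function; the loss and parameters are supplied later by the
    # variational objective (assumed obj_optimizer plumbing).
    approx = pm.fit(n=5000, obj_optimizer=pm.adam(learning_rate=0.01))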
References
[1] Kingma, Diederik P., and Jimmy Ba (2014). "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980.
Examples
>>> import pytensor
>>> from pymc import adam
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = adam(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = adam(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
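Beyond the type checks above, here is a minimal sketch (assuming only pytensor and pymc, with a toy quadratic loss chosen for illustration) of how the returned update dictionary is typically consumed: compile it into a pytensor function so that each call applies one Adam step.

import pytensor
import pymc as pm

# Toy quadratic loss with its minimum at a = 2.0 (illustrative only).
a = pytensor.shared(0.0, name="a")
loss = (a - 2.0) ** 2

# The returned OrderedDict typically also carries update rules for Adam's
# internal moment estimates, so it has more entries than just `a`.
updates = pm.adam(loss, [a], learning_rate=0.1)

# Compiling with `updates=` makes each call apply one Adam step.
step = pytensor.function([], loss, updates=updates)
for _ in range(200):
    step()

print(a.get_value())  # close to 2.0 after enough steps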