pymc.adam

pymc.adam(loss_or_grads=None, params=None, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)

Adam updates

Adam updates implemented as in [1].

Parameters:
loss_or_grads: symbolic expression or list of expressions

A scalar loss expression, or a list of gradient expressions

params: list of shared variables

The variables to generate update expressions for

learning_rate: float

Learning rate

beta1: float

Exponential decay rate for the first moment estimates.

beta2: float

Exponential decay rate for the second moment estimates.

epsilon: float

Constant for numerical stability.

Returns:
OrderedDict

A dictionary mapping each parameter to its update expression

Notes

The paper [1] includes an additional hyperparameter lambda. It is only needed to prove convergence of the algorithm and has no practical use (personal communication with the authors); it is therefore omitted here.

The optimizer can be called without both loss_or_grads and params; in that case a partial function is returned that produces the updates once they are supplied.
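
For orientation, here is a minimal NumPy sketch of the bias-corrected update rule from [1] that learning_rate, beta1, beta2, and epsilon parameterize. The helper adam_step is illustrative only and is not part of pymc; the actual function builds symbolic PyTensor update expressions rather than operating on NumPy values.

>>> import numpy as np
>>> def adam_step(param, grad, m, v, t, learning_rate=0.001,
...               beta1=0.9, beta2=0.999, epsilon=1e-8):
...     """One Adam step on plain NumPy values (illustrative sketch)."""
...     m = beta1 * m + (1 - beta1) * grad        # biased first moment estimate
...     v = beta2 * v + (1 - beta2) * grad ** 2   # biased second moment estimate
...     m_hat = m / (1 - beta1 ** t)              # bias correction (t is 1-based)
...     v_hat = v / (1 - beta2 ** t)
...     param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
...     return param, m, v
>>> p, m, v = adam_step(1.0, -4.0, 0.0, 0.0, t=1, learning_rate=0.1)
>>> round(float(p), 2)  # first step magnitude is about learning_rate
1.1
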

References

[1]

Kingma, Diederik, and Jimmy Ba (2014): Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

Examples

>>> import pytensor
>>> from pymc import adam
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = adam(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = adam(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True