pymc.adam
- pymc.adam(loss_or_grads=None, params=None, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)[source]
Adam updates
Adam updates implemented as in [1].
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params: list of shared variables
The variables to generate update expressions for
- learning_rate: float
Learning rate
- beta1: float
Exponential decay rate for the first moment estimates.
- beta2: float
Exponential decay rate for the second moment estimates.
- epsilon: float
Constant for numerical stability.
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
Notes
The paper [1] includes an additional hyperparameter, lambda. It is only needed to prove convergence of the algorithm and has no practical use (personal communication with the authors); it is therefore omitted here.
The optimizer can be called without both loss_or_grads and params; in that case, a partial function is returned that can later be applied to the loss and parameters.
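As an illustration of this partial form, here is a minimal sketch of passing a preconfigured optimizer to variational fitting. The toy data and model are hypothetical, and the sketch assumes pm.fit forwards the obj_optimizer keyword to the variational objective:

import numpy as np
import pymc as pm

# Hypothetical toy data and model, used only to illustrate the partial form.
data = np.random.normal(loc=1.0, scale=2.0, size=200)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("obs", mu, sigma=sigma, observed=data)

    # adam called without loss_or_grads and params returns a partial
    # function; the loss and parameters are supplied later by the
    # variational objective (assumed obj_optimizer plumbing).
    approx = pm.fit(n=5000, obj_optimizer=pm.adam(learning_rate=0.01))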
References
[1] Kingma, Diederik P., and Jimmy Ba (2014). "Adam: A Method for Stochastic Optimization." arXiv preprint arXiv:1412.6980.
Examples
>>> import pytensor
>>> from pymc import adam
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = adam(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = adam(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
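Beyond the type checks above, here is a minimal sketch (assuming only pytensor and pymc, with a toy quadratic loss chosen for illustration) of how the returned update dictionary is typically consumed: compile it into a pytensor function so that each call applies one Adam step.

import pytensor
import pymc as pm

# Toy quadratic loss with its minimum at a = 2.0 (illustrative only).
a = pytensor.shared(0.0, name="a")
loss = (a - 2.0) ** 2

# The returned OrderedDict typically also carries update rules for Adam's
# internal moment estimates, so it has more entries than just `a`.
updates = pm.adam(loss, [a], learning_rate=0.1)

# Compiling with `updates=` makes each call apply one Adam step.
step = pytensor.function([], loss, updates=updates)
for _ in range(200):
    step()

print(a.get_value())  # close to 2.0 after enough steps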