pymc.nesterov_momentum#

pymc.nesterov_momentum(loss_or_grads=None, params=None, learning_rate=0.001, momentum=0.9)[source]#

Stochastic Gradient Descent (SGD) updates with Nesterov momentum

Generates update expressions of the form:

  • velocity := momentum * velocity - learning_rate * gradient

  • param := param + momentum * velocity - learning_rate * gradient
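where velocity in the second expression refers to its freshly updated value from the first. As a rough illustration, a minimal NumPy sketch of a single such step (all names and the toy gradient are hypothetical, for illustration only, not library code):

>>> import numpy as np
>>> param, velocity = np.array([1.0]), np.zeros(1)
>>> learning_rate, momentum = 0.001, 0.9
>>> gradient = 2 * param  # toy gradient of loss = param**2
>>> velocity = momentum * velocity - learning_rate * gradient
>>> param = param + momentum * velocity - learning_rate * gradient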

Parameters
loss_or_grads: symbolic expression or list of expressions

A scalar loss expression, or a list of gradient expressions

params: list of shared variables

The variables to generate update expressions for

learning_rate: float or symbolic scalar

The learning rate controlling the size of update steps

momentum: float or symbolic scalar, optional

The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.

Returns
OrderedDict

A dictionary mapping each parameter to its update expression

See also

apply_nesterov_momentum

Function applying momentum to updates
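If apply_nesterov_momentum follows the Lasagne-style signature apply_nesterov_momentum(updates, params=None, momentum=0.9), and sgd and apply_nesterov_momentum are importable from pymc in the same way as nesterov_momentum, composing them might look roughly like this (an assumption about the API, not a documented recipe; a and b as in the Examples below):

>>> from pymc import sgd, apply_nesterov_momentum
>>> updates = sgd(b, [a], learning_rate=.01)
>>> updates = apply_nesterov_momentum(updates, momentum=.9)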

Notes

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
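For instance, a hedged illustration of that scaling (base_lr is a hypothetical name; a and b as in the Examples below):

>>> base_lr = 0.1
>>> updates = nesterov_momentum(b, [a], learning_rate=base_lr * (1 - 0.9), momentum=0.9)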

The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617, which allows the gradient to be evaluated at the current parameters.
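For contrast, a minimal sketch of the classic formulation (illustrative names only; grad_at stands in for a gradient evaluated at the given point), in which the gradient is taken at the predicted next position rather than at the current parameters:

>>> lookahead = param + momentum * velocity
>>> velocity = momentum * velocity - learning_rate * grad_at(lookahead)
>>> param = param + velocity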

The optimizer can be called without both loss_or_grads and params; in that case a partial function is returned that accepts them later (see Examples).

Examples

>>> import aesara
>>> from pymc import nesterov_momentum
>>> a = aesara.shared(1.)
>>> b = a*2
>>> updates = nesterov_momentum(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = nesterov_momentum(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True