pymc.nesterov_momentum
- pymc.nesterov_momentum(loss_or_grads=None, params=None, learning_rate=0.001, momentum=0.9)
Stochastic Gradient Descent (SGD) updates with Nesterov momentum.
Generates update expressions of the form:
velocity := momentum * velocity - learning_rate * gradient
param := param + momentum * velocity - learning_rate * gradient
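As a rough, illustrative sketch (not the library's implementation), the same rule can be written out by hand with PyTensor shared variables; the names param, velocity, and loss are chosen here only to mirror the formulas above, and the param update uses the freshly updated velocity:
>>> import numpy as np
>>> import pytensor
>>> import pytensor.tensor as pt
>>> param = pytensor.shared(np.array(1.0), name="param")
>>> velocity = pytensor.shared(np.array(0.0), name="velocity")
>>> loss = (param - 3.0) ** 2
>>> grad = pt.grad(loss, param)
>>> learning_rate, momentum = 0.01, 0.9
>>> new_velocity = momentum * velocity - learning_rate * grad
>>> manual_updates = {
...     velocity: new_velocity,
...     param: param + momentum * new_velocity - learning_rate * grad,
... }
>>> step = pytensor.function([], loss, updates=manual_updates)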
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params: list of shared variables
The variables to generate update expressions for
- learning_rate: float or symbolic scalar
The learning rate controlling the size of update steps
- momentum: float or symbolic scalar, optional
The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
See also
apply_nesterov_momentum
Function applying momentum to updates
Notes
Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
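For instance, a minimal sketch of that rescaling, assuming nesterov_momentum is in scope (e.g. imported from pymc) as in the Examples below:
>>> import pytensor
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> momentum = 0.9
>>> updates = nesterov_momentum(b, [a], learning_rate=0.01 * (1 - momentum), momentum=momentum)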
The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at lisa-lab/pylearn2#136, which allows the gradient to be evaluated at the current parameters.
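For reference, the classic formulation evaluates the gradient at the look-ahead point and would read:
velocity := momentum * velocity - learning_rate * gradient(param + momentum * velocity)
param := param + velocity
The formulation used here effectively stores the look-ahead point as param, so the gradient can be taken at the current parameter values.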
The optimizer can be called without loss_or_grads and params; in that case a partial function is returned that accepts them later (see the Examples below).
Examples
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = nesterov_momentum(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = nesterov_momentum(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True