pymc.apply_nesterov_momentum
pymc.apply_nesterov_momentum(updates, params=None, momentum=0.9)
Return a modified update dictionary including Nesterov momentum.
Generate update expressions of the form:
- velocity := momentum * velocity + updates[param] - param
- param := param + momentum * velocity + updates[param] - param
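Read literally, these rules add one velocity variable per parameter and rewrite that parameter's entry in the update dictionary, with velocity in the second rule referring to its freshly updated value. The following is a minimal illustrative sketch of that construction, not the library's actual implementation, assuming the PyTensor backend used by current PyMC; details such as params handling and broadcast patterns are omitted:

```python
from collections import OrderedDict

import numpy as np
import pytensor


def apply_nesterov_momentum_sketch(updates, momentum=0.9):
    """Illustrative construction of the rules above (sketch only)."""
    updates = OrderedDict(updates)
    for param in list(updates.keys()):
        # One zero-initialized velocity per parameter.
        value = param.get_value(borrow=True)
        velocity = pytensor.shared(np.zeros_like(value))
        # velocity := momentum * velocity + updates[param] - param
        new_velocity = momentum * velocity + updates[param] - param
        updates[velocity] = new_velocity
        # param := param + momentum * velocity + updates[param] - param,
        # where `velocity` is the freshly updated value; the RHS still
        # reads the original update expression for `param`.
        updates[param] = param + momentum * new_velocity + updates[param] - param
    return updates
```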
- Parameters:
- updates: OrderedDict
A dictionary mapping parameters to update expressions
- params: iterable of shared variables, optional
The variables to apply momentum to. If omitted, momentum is applied to all keys of updates; see the example after this parameter list for restricting it to a subset.
- momentum: float or symbolic scalar, optional
The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.
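For example, a short sketch of restricting momentum to a subset of the parameters; the shared variables w and b, the toy loss, and the use of pymc.sgd are illustrative assumptions:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
import pymc

# Hypothetical toy setup: two parameters and a quadratic loss.
w = pytensor.shared(np.zeros(3), name="w")
b = pytensor.shared(0.0, name="b")
loss = (pt.sum(w) + b - 1.0) ** 2

# Plain SGD updates for both parameters...
updates = pymc.sgd(loss, [w, b], learning_rate=0.1)
# ...then Nesterov momentum for `w` only; `b` keeps its plain SGD update.
updates = pymc.apply_nesterov_momentum(updates, params=[w], momentum=0.9)
```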
- Returns:
OrderedDict
A copy of updates with momentum updates for all params.
See also
nesterov_momentum
Shortcut applying Nesterov momentum to SGD updates
Notes
Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at lisa-lab/pylearn2#136, which allows the gradient to be evaluated at the current parameters.
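Putting it together, a usage sketch (the toy loss and the pymc.sgd call are illustrative assumptions; the learning rate is scaled by 1 - momentum as suggested in the first note):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
import pymc

w = pytensor.shared(np.zeros(3), name="w")   # parameter to optimize
loss = pt.sum((w - 1.0) ** 2)                # toy quadratic loss

momentum = 0.9
# Scale the base learning rate by (1 - momentum) to offset the larger steps.
updates = pymc.sgd(loss, [w], learning_rate=0.1 * (1 - momentum))
updates = pymc.apply_nesterov_momentum(updates, momentum=momentum)

step = pytensor.function([], loss, updates=updates)  # one optimization step
for _ in range(200):
    step()
```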