pymc.apply_nesterov_momentum

pymc.apply_nesterov_momentum(updates, params=None, momentum=0.9)

Returns a modified update dictionary including Nesterov momentum.

Generates update expressions of the form:

  • velocity := momentum * velocity + updates[param] - param

  • param := param + momentum * velocity + updates[param] - param
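To make the two expressions above concrete, here is a numeric sketch (not part of the API) that traces one step, assuming the inner update is plain SGD, i.e. `updates[param] = param - lr * grad`, so that `updates[param] - param` reduces to `-lr * grad`:

```python
import numpy as np

# Numeric sketch of the documented update expressions, assuming the
# wrapped update is plain SGD: updates[param] = param - lr * grad.
momentum, lr = 0.9, 0.1
param = np.array([1.0, -2.0])
velocity = np.zeros_like(param)

def step(param, velocity, grad):
    sgd_update = param - lr * grad               # updates[param]
    delta = sgd_update - param                   # updates[param] - param
    velocity = momentum * velocity + delta       # velocity := momentum*velocity + delta
    param = param + momentum * velocity + delta  # param := param + momentum*velocity + delta
    return param, velocity

# One step on the quadratic loss 0.5 * ||param||^2, whose gradient is param.
param, velocity = step(param, velocity, param)
```

With the values above, the velocity becomes `-lr * grad = [-0.1, 0.2]` and the parameter moves by `momentum * velocity + delta`, landing at `[0.81, -1.62]`.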

Parameters
updates: OrderedDict

A dictionary mapping parameters to update expressions

params: iterable of shared variables, optional

The variables to apply momentum to. If omitted, momentum is applied to all variables in updates.keys().

momentum: float or symbolic scalar, optional

The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.

Returns
OrderedDict

A copy of updates with momentum updates for all params.

See also

nesterov_momentum

Shortcut applying Nesterov momentum to SGD updates

Notes

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
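The factor 1 - momentum can be motivated with a small sketch (an illustration, not from the API): for a constant gradient g, the velocity recursion converges geometrically to -lr * g / (1 - momentum), so steady-state steps are 1 / (1 - momentum) times larger than a plain SGD step of -lr * g:

```python
import numpy as np

# Sketch: iterate the velocity recursion v := momentum*v - lr*g with a
# constant gradient g and observe its fixed point -lr*g / (1 - momentum).
momentum, lr, g = 0.9, 0.1, 1.0
v = 0.0
for _ in range(500):
    v = momentum * v - lr * g

steady_state = -lr * g / (1 - momentum)      # -1.0 here, vs. -0.1 for plain SGD
# Scaling the learning rate by (1 - momentum) restores the plain-SGD magnitude:
v_scaled = -(lr * (1 - momentum)) * g / (1 - momentum)  # == -lr * g
```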

The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617, which allows the gradient to be evaluated at the current parameters.
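The equivalence between the two formulations can be checked numerically (an illustrative sketch under stated assumptions: quadratic loss 0.5 * ||x||^2 and a plain SGD inner update). The parameters maintained by this formulation track the classic lookahead point, i.e. phi_t = theta_t + momentum * v_t, so both variants evaluate the same gradients and produce the same velocities:

```python
import numpy as np

# Assumptions: loss 0.5*||x||^2 (so grad(x) = x); inner update is SGD.
momentum, lr = 0.9, 0.1
grad = lambda x: x

# Classic Nesterov: evaluate the gradient at the lookahead point.
theta = np.array([1.0, -2.0]); v_c = np.zeros_like(theta)
# This module's formulation: evaluate the gradient at the current parameters.
phi = theta.copy(); v_r = np.zeros_like(theta)

for _ in range(10):
    # Classic: v := m*v - lr*grad(theta + m*v); theta := theta + v
    v_c = momentum * v_c - lr * grad(theta + momentum * v_c)
    theta = theta + v_c
    # Reformulated: delta = -lr*grad(phi); v := m*v + delta; phi := phi + m*v + delta
    delta = -lr * grad(phi)
    v_r = momentum * v_r + delta
    phi = phi + momentum * v_r + delta
    # Invariant: phi tracks the classic lookahead point theta + m*v.
    assert np.allclose(phi, theta + momentum * v_c)
```

Because both variants see the same gradient at every step, the velocity sequences coincide, and the reformulated parameters differ from the classic ones only by the `momentum * velocity` shift.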