pymc.apply_nesterov_momentum(updates, params=None, momentum=0.9)

Returns a modified update dictionary including Nesterov momentum

Generates update expressions of the form:

  • velocity := momentum * velocity + updates[param] - param

  • param := param + momentum * velocity + updates[param] - param
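As a concrete illustration of this recurrence (a minimal NumPy sketch, not the library's symbolic implementation), assume a plain SGD update `param - lr * grad` plays the role of `updates[param]`; the names `nesterov_step`, `lr`, and `grad` are illustrative only:

```python
import numpy as np

def nesterov_step(param, velocity, grad, lr=0.1, momentum=0.9):
    # updates[param] for vanilla SGD would be: param - lr * grad
    update = param - lr * grad
    # velocity := momentum * velocity + updates[param] - param
    velocity = momentum * velocity + update - param
    # param := param + momentum * velocity + updates[param] - param
    param = param + momentum * velocity + update - param
    return param, velocity

p, v = nesterov_step(np.array([1.0]), np.zeros(1), np.array([0.5]))
# p is now [0.905], v is [-0.05]
```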

Parameters

updates: OrderedDict

A dictionary mapping parameters to update expressions

params: iterable of shared variables, optional

The variables to apply momentum to. If omitted, momentum will be applied to all of updates.keys().

momentum: float or symbolic scalar, optional

The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.


Returns

OrderedDict

A copy of updates with momentum updates for all params.
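The return contract can be sketched numerically with plain dicts (a hypothetical helper for illustration; the real function operates on symbolic pytensor expressions, and parameters are shared variables rather than names):

```python
from collections import OrderedDict

def apply_nesterov_momentum_sketch(updates, values, velocities,
                                   params=None, momentum=0.9):
    """Hypothetical numeric analogue: `updates` maps name -> proposed new
    value, `values` maps name -> current value, `velocities` holds state."""
    if params is None:
        params = list(updates)         # default: all of updates.keys()
    result = OrderedDict(updates)      # a copy; other entries pass through
    for p in params:
        v = momentum * velocities[p] + updates[p] - values[p]
        velocities[p] = v
        result[p] = values[p] + momentum * v + updates[p] - values[p]
    return result

vel = {"w": 0.0}
out = apply_nesterov_momentum_sketch(
    OrderedDict(w=0.95), values={"w": 1.0}, velocities=vel)
# out["w"] -> 0.905 (approximately), vel["w"] -> -0.05
```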

See also

nesterov_momentum

Shortcut applying Nesterov momentum to SGD updates


Notes

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
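A quick numeric check of this (a toy simulation under assumed values, not code from the library): with a constant gradient, the steady-state per-step displacement approaches lr * grad / (1 - momentum), so multiplying the learning rate by 1 - momentum restores the plain-SGD step size.

```python
def steady_step(lr, momentum, grad=1.0, steps=500):
    # Iterate the momentum recurrence with a constant gradient and
    # return the final per-step displacement.
    p, v, prev = 0.0, 0.0, 0.0
    for _ in range(steps):
        update = p - lr * grad          # plain SGD update expression
        v = momentum * v + update - p
        prev, p = p, p + momentum * v + update - p
    return abs(p - prev)

steady_step(0.1, momentum=0.9)              # ~1.0: 10x the plain SGD step
steady_step(0.1 * (1 - 0.9), momentum=0.9)  # ~0.1: matches lr=0.1 SGD
```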

The classic formulation of Nesterov momentum (or Nesterov accelerated gradient) requires the gradient to be evaluated at the predicted next position in parameter space. Here, we use the formulation described at lisa-lab/pylearn2#136, which allows the gradient to be evaluated at the current parameters.