pymc.momentum
- pymc.momentum(loss_or_grads=None, params=None, learning_rate=0.001, momentum=0.9)
Stochastic Gradient Descent (SGD) updates with momentum.
Generates update expressions of the form:
velocity := momentum * velocity - learning_rate * gradient
param := param + velocity
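For reference, a minimal hand-rolled sketch of these two expressions for a single parameter, assuming PyTensor (the actual function also handles lists of parameters and returns an OrderedDict):
import pytensor

a = pytensor.shared(1.0)                       # a single shared parameter
loss = (a - 3.0) ** 2                          # a scalar loss
grad = pytensor.grad(loss, a)                  # symbolic gradient of the loss

learning_rate, mom = 0.01, 0.9
velocity = pytensor.shared(0.0)                # one velocity term per parameter, initialised to zero

new_velocity = mom * velocity - learning_rate * grad      # velocity := momentum * velocity - learning_rate * gradient
updates = {velocity: new_velocity, a: a + new_velocity}   # param := param + velocity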
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params: list of shared variables
The variables to generate update expressions for
- learning_rate: float or symbolic scalar
The learning rate controlling the size of update steps
- momentum: float or symbolic scalar, optional
The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
See also
apply_momentum
Generic function applying momentum to updates
nesterov_momentum
Nesterov’s variant of SGD with momentum
Notes
Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
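For example, to keep the effective step size roughly constant when raising momentum, the learning rate can be scaled accordingly (a sketch; loss and params stand in for your own expressions and shared variables):
base_lr = 0.1
mom = 0.95
updates = momentum(loss, params, learning_rate=base_lr * (1 - mom), momentum=mom)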
The optimizer can be called without both loss_or_grads and params; in that case, a partial function is returned that can be called with them later.
Examples
>>> a = pytensor.shared(1.0)
>>> b = a * 2
>>> updates = momentum(b, [a], learning_rate=0.01)
>>> isinstance(updates, dict)
True
>>> optimizer = momentum(learning_rate=0.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
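To actually apply the updates, the returned dictionary can be passed to pytensor.function; a sketch continuing the example above:
f = pytensor.function([], b, updates=updates)   # compiling with updates applies them on every call
for _ in range(5):
    f()                                         # each call moves `a` and its velocity in place
print(a.get_value())                            # `a` has drifted away from its initial value 1.0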