pymc.rmsprop#
- pymc.rmsprop(loss_or_grads=None, params=None, learning_rate=1.0, rho=0.9, epsilon=1e-06)[source]#
RMSProp updates.
Scale learning rates by dividing them by a moving average of the root mean squared (RMS) gradients. See [1] for further description.
- Parameters:
- loss_or_grads: symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params: list of shared variables
The variables to generate update expressions for
- learning_rate: float or symbolic scalar
The learning rate controlling the size of update steps
- rho: float or symbolic scalar
Gradient moving average decay factor
- epsilon: float or symbolic scalar
Small value added for numerical stability
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
Notes
rho should be between 0 and 1. A value of rho close to 1 will decay the moving average slowly and a value close to 0 will decay the moving average fast.
Using the step size \(\eta\) and a decay factor \(\rho\), the learning rate \(\eta_t\) is calculated as:
\[\begin{split}r_t &= \rho r_{t-1} + (1-\rho)g^2\\
\eta_t &= \frac{\eta}{\sqrt{r_t + \epsilon}}\end{split}\]
The optimizer can be called without both loss_or_grads and params, in which case a partial function is returned.
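To make the update rule concrete, here is a minimal NumPy sketch of a single RMSProp step; rmsprop_step, acc, and the loop below are illustrative names invented for this sketch and are not part of the PyMC API:

import numpy as np

def rmsprop_step(param, grad, acc, learning_rate=1.0, rho=0.9, epsilon=1e-6):
    # r_t = rho * r_{t-1} + (1 - rho) * g**2
    acc = rho * acc + (1 - rho) * grad ** 2
    # eta_t = eta / sqrt(r_t + epsilon); step against the gradient
    new_param = param - (learning_rate / np.sqrt(acc + epsilon)) * grad
    return new_param, acc

param, acc = 1.0, 0.0
for _ in range(3):
    grad = 2.0  # gradient of the toy expression b = a*2 with respect to a
    param, acc = rmsprop_step(param, grad, acc, learning_rate=0.01)

Because the accumulator r_t grows with the squared gradients, parameters with consistently large gradients receive smaller effective steps, which is the scaling described by the formula above.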
References
[1] Tieleman, T. and Hinton, G. (2012): Neural Networks for Machine Learning, Lecture 6.5 - rmsprop. Coursera. http://www.youtube.com/watch?v=O3sxAc4hxZU (formula @5:20)
Examples
>>> a = pytensor.shared(1.)
>>> b = a*2
>>> updates = rmsprop(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = rmsprop(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
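In practice the partial-function form is usually handed to PyMC's variational inference machinery rather than called directly. A minimal sketch, assuming obj_optimizer is forwarded to pymc.fit as in the ADVI examples (the model and data here are invented for illustration):

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y_obs = rng.normal(loc=1.0, scale=0.5, size=100)  # toy data, made up for this sketch

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)

    # The partially applied optimizer is called internally with the
    # objective gradients and the approximation's shared parameters.
    approx = pm.fit(n=10_000, method="advi", obj_optimizer=pm.rmsprop(learning_rate=0.01))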