pymc.rmsprop(loss_or_grads=None, params=None, learning_rate=1.0, rho=0.9, epsilon=1e-06)

RMSProp updates

Scale learning rates by dividing by the root of a moving average of the squared gradients (RMS). See [1] for further description.

loss_or_grads: symbolic expression or list of expressions

A scalar loss expression, or a list of gradient expressions

params: list of shared variables

The variables to generate update expressions for

learning_rate: float or symbolic scalar

The learning rate controlling the size of update steps

rho: float or symbolic scalar

Gradient moving average decay factor

epsilon: float or symbolic scalar

Small value added for numerical stability


Returns a dictionary mapping each parameter to its update expression.


rho should be between 0 and 1. A value of rho close to 1 makes the moving average decay slowly, and a value close to 0 makes it decay quickly.
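The effect of rho can be illustrated with a small plain-Python sketch. The helper `moving_average` is hypothetical and not part of PyMC; it only applies the recurrence r = rho * r + (1 - rho) * g^2 to a made-up sequence of squared gradients that drops from 1 to 0 halfway through.

```python
def moving_average(squared_grads, rho):
    """Exponential moving average of squared gradients (illustrative helper, not a PyMC function)."""
    r = 0.0
    history = []
    for g2 in squared_grads:
        # Same recurrence as in the notes: old average weighted by rho, new value by (1 - rho)
        r = rho * r + (1 - rho) * g2
        history.append(r)
    return history


# Made-up input: five squared gradients of 1.0 followed by five of 0.0
squared_grads = [1.0] * 5 + [0.0] * 5

slow = moving_average(squared_grads, rho=0.9)  # rho close to 1: long memory
fast = moving_average(squared_grads, rho=0.1)  # rho close to 0: short memory

print("rho=0.9:", [round(v, 4) for v in slow])
print("rho=0.1:", [round(v, 4) for v in fast])
```

With rho=0.9 the average is still well above zero several steps after the gradients vanish, while with rho=0.1 it tracks the most recent value almost immediately.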

Using the step size \(\eta\), a decay factor \(\rho\), and the gradient \(g\), the learning rate \(\eta_t\) is calculated as:

\[\begin{split}r_t &= \rho r_{t-1} + (1-\rho)*g^2\\ \eta_t &= \frac{\eta}{\sqrt{r_t + \epsilon}}\end{split}\]
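As a rough numerical illustration of the two formulas, the NumPy sketch below computes r_t and eta_t for a single scalar parameter. The function `rmsprop_step` is hypothetical and is not the PyMC implementation, which builds symbolic update expressions. The final parameter update `param - eta_t * grad` is the usual gradient-descent step and is an assumption here, since the formulas only define the learning rate.

```python
import numpy as np


def rmsprop_step(param, grad, r_prev, learning_rate=1.0, rho=0.9, epsilon=1e-6):
    """One RMSProp-style step on plain numbers (illustrative sketch, not PyMC code)."""
    # r_t = rho * r_{t-1} + (1 - rho) * g^2
    r_t = rho * r_prev + (1 - rho) * grad**2
    # eta_t = eta / sqrt(r_t + epsilon)
    eta_t = learning_rate / np.sqrt(r_t + epsilon)
    # Assumed update rule: move against the gradient with the scaled learning rate
    return param - eta_t * grad, r_t


# Minimise f(x) = x**2, whose gradient is 2 * x, starting from x = 5
x, r = 5.0, 0.0
for _ in range(200):
    x, r = rmsprop_step(x, 2 * x, r, learning_rate=0.01)

print(x)
```

Because the gradient is divided by its own running magnitude, each step has a size close to learning_rate once the average has settled, regardless of how large the raw gradient is.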

The optimizer can be called with both loss_or_grads and params omitted; in that case a partial function is returned.



[1] Tieleman, T. and Hinton, G. (2012): Neural Networks for Machine Learning, Lecture 6.5 - rmsprop. Coursera. (formula @5:20)


>>> import pytensor
>>> from pymc import rmsprop
>>> a = pytensor.shared(1.)
>>> b = a*2
>>> updates = rmsprop(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = rmsprop(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True