pymc.adadelta#
- pymc.adadelta(loss_or_grads=None, params=None, learning_rate=1.0, rho=0.95, epsilon=1e-06)[source]#
Adadelta updates
Scale learning rates by the ratio of accumulated gradients to accumulated updates, see [1] and notes for further description.
- Parameters:
- loss_or_grads : symbolic expression or list of expressions
A scalar loss expression, or a list of gradient expressions
- params : list of shared variables
The variables to generate update expressions for
- learning_rate : float or symbolic scalar
The learning rate controlling the size of update steps
- rho : float or symbolic scalar
Squared gradient moving average decay factor
- epsilon : float or symbolic scalar
Small value added for numerical stability
- Returns:
OrderedDict
A dictionary mapping each parameter to its update expression
Notes
rho should be between 0 and 1. A value of rho close to 1 will decay the moving average slowly and a value close to 0 will decay the moving average fast.
rho = 0.95 and epsilon=1e-6 are suggested in the paper and reported to work for multiple datasets (MNIST, speech).
In the paper, no learning rate is considered (so learning_rate=1.0). Probably best to keep it at this value. epsilon is important for the very first update (so the numerator does not become 0).
Using the step size eta and a decay factor rho the learning rate is calculated as:
\[\begin{split}r_t &= \rho r_{t-1} + (1-\rho) g^2\\
\eta_t &= \eta \frac{\sqrt{s_{t-1} + \epsilon}}{\sqrt{r_t + \epsilon}}\\
s_t &= \rho s_{t-1} + (1-\rho) (\eta_t g)^2\end{split}\]
The optimizer can be called without both loss_or_grads and params; in that case a partial function is returned.
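As a plain-NumPy sketch of these equations (not the pymc implementation; the function name, the accumulator names, and the toy gradient below are invented for illustration), one Adadelta step for a single parameter can be written as:

import numpy as np

def adadelta_step(param, grad, r, s, learning_rate=1.0, rho=0.95, epsilon=1e-6):
    # r accumulates squared gradients, s accumulates squared update steps;
    # both start as zeros with the same shape as param.
    r = rho * r + (1 - rho) * grad ** 2
    step_size = learning_rate * np.sqrt(s + epsilon) / np.sqrt(r + epsilon)
    update = step_size * grad
    s = rho * s + (1 - rho) * update ** 2
    return param - update, r, s

# Minimise f(x) = x**2 for a few steps; the gradient of x**2 is 2*x.
x, r, s = 5.0, 0.0, 0.0
for _ in range(100):
    x, r, s = adadelta_step(x, 2 * x, r, s)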
References
[1] Zeiler, M. D. (2012): ADADELTA: An Adaptive Learning Rate Method. arXiv Preprint arXiv:1212.5701.
Examples
>>> a = pytensor.shared(1.)
>>> b = a*2
>>> updates = adadelta(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = adadelta(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True
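In PyMC this optimizer is typically handed to variational inference rather than called directly. A minimal sketch, assuming the obj_optimizer keyword of pm.fit (the toy model below is invented for illustration):

import pymc as pm

with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.0)
    pm.Normal("obs", mu, 1.0, observed=[0.1, -0.3, 0.2])
    # run ADVI, using Adadelta to optimize the variational objective
    approx = pm.fit(n=10_000, method="advi", obj_optimizer=pm.adadelta(learning_rate=1.0))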