pymc.total_norm_constraint
- pymc.total_norm_constraint(tensor_vars, max_norm, epsilon=1e-07, return_norm=False)
Rescales a list of tensors based on their combined norm.
If the combined norm of the input tensors exceeds the threshold, all tensors are rescaled so that the combined norm equals the threshold.
Scaling the norms of the gradients is often used when training recurrent neural networks [1].
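The documented behavior can be sketched in plain NumPy (a minimal illustration, not the library implementation; the clip-based formulation and the helper name are assumptions):
>>> import numpy as np
>>> def total_norm_constraint_sketch(tensors, max_norm, epsilon=1e-7):
...     # Combined L2 norm, as if all tensors were flattened into one vector.
...     norm = np.sqrt(sum(np.sum(t ** 2) for t in tensors))
...     # Leave tensors (almost) unchanged when norm <= max_norm; otherwise
...     # rescale so the combined norm equals max_norm. epsilon guards the
...     # division against very small or zero norms.
...     target_norm = np.clip(norm, 0.0, max_norm)
...     return [t * (target_norm / (epsilon + norm)) for t in tensors]
>>> grads = [np.full(10, 3.0), np.full(10, 4.0)]  # combined norm = sqrt(250) ~ 15.8
>>> scaled = total_norm_constraint_sketch(grads, max_norm=5.0)
>>> round(float(np.sqrt(sum(np.sum(g ** 2) for g in scaled))), 3)
5.0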
- Parameters:
- tensor_vars: list of TensorVariables
Tensors to be rescaled.
- max_norm: float
Threshold value for the total norm.
- epsilon: scalar, optional
Value used to prevent numerical instability when dividing by very small or zero norms.
- return_norm: bool, optional
If True, the total norm is also returned.
- Returns:
- tensor_vars_scaled: list of TensorVariables
The scaled tensor variables.
- norm: TensorVariable
The combined norm of the input variables (only returned if return_norm is True).
Notes
The total norm can be used to monitor training.
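For instance, with return_norm=True the norm can be compiled into a monitoring function alongside the update (a hedged sketch assuming the return shape documented above; the shared variable, loss, and learning rate are illustrative assumptions):
>>> import numpy as np
>>> import pytensor
>>> import pytensor.tensor as pt
>>> import pymc as pm
>>> w = pytensor.shared(np.array([3.0, 4.0]), name="w")  # illustrative parameter
>>> loss = pt.sum(w ** 2)
>>> grads = pt.grad(loss, [w])
>>> scaled_grads, norm = pm.total_norm_constraint(grads, max_norm=5.0, return_norm=True)
>>> # Each call applies one SGD step with the rescaled gradient and returns
>>> # the pre-scaling total norm for monitoring.
>>> step = pytensor.function([], norm, updates=[(w, w - 0.1 * scaled_grads[0])])
>>> round(float(step()), 3)  # grad is 2*w = [6, 8], so the total norm is 10
10.0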
References
[1] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112).
Examples
>>> import pytensor.tensor as pt
>>> from lasagne.layers import InputLayer, DenseLayer
>>> import lasagne
>>> from lasagne.updates import sgd, total_norm_constraint
>>> x = pt.matrix()
>>> y = pt.ivector()
>>> l_in = InputLayer((5, 10))
>>> l1 = DenseLayer(l_in, num_units=7, nonlinearity=pt.special.softmax)
>>> output = lasagne.layers.get_output(l1, x)
>>> cost = pt.mean(pt.nnet.categorical_crossentropy(output, y))
>>> all_params = lasagne.layers.get_all_params(l1)
>>> all_grads = pt.grad(cost, all_params)
>>> scaled_grads = total_norm_constraint(all_grads, 5)
>>> updates = sgd(scaled_grads, all_params, learning_rate=0.1)