pymc.total_norm_constraint(tensor_vars, max_norm, epsilon=1e-07, return_norm=False)

Rescales a list of tensors based on their combined norm.

If the combined norm of the input tensors exceeds the threshold, then all tensors are rescaled such that the combined norm equals the threshold.
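The rescaling rule can be illustrated with a minimal NumPy sketch. This is an illustration, not the library implementation; it assumes the Lasagne-style formula, in which the target norm is clipped at max_norm and epsilon guards the division:

```python
import numpy as np

def total_norm_constraint_np(tensors, max_norm, epsilon=1e-7):
    # Illustrative NumPy version of the rescaling rule (not pymc's code).
    # Combined (global) L2 norm across all tensors.
    norm = np.sqrt(sum(np.sum(t ** 2) for t in tensors))
    # Clip the target norm at max_norm; epsilon prevents division
    # by zero when the combined norm is very small.
    target_norm = np.clip(norm, 0, max_norm)
    multiplier = target_norm / (epsilon + norm)
    return [t * multiplier for t in tensors], norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # combined norm = 13
scaled, norm = total_norm_constraint_np(grads, max_norm=5.0)
```

After rescaling, the combined norm of `scaled` is (up to epsilon) equal to the threshold 5, while tensors whose combined norm is already below the threshold are left essentially unchanged.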

Scaling the norms of the gradients is often used when training recurrent neural networks [1].

Parameters:

tensor_vars: list of TensorVariables
    Tensors to be rescaled.
max_norm: float
    Threshold value for the total norm.
epsilon: scalar, optional
    Value used to prevent numerical instability when dividing by very small or zero norms.
return_norm: bool
    If True, the total norm is also returned.

Returns:

tensor_vars_scaled: list of TensorVariables
    The scaled tensor variables.
norm: Aesara scalar
    The combined norm of the input variables prior to rescaling; only returned if return_norm=True.


Notes:

The total norm can be used to monitor training.
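For instance, logging the combined norm returned by return_norm=True each step reveals exploding gradients before clipping masks them. A small self-contained sketch (the training loop and stand-in gradients are hypothetical):

```python
import numpy as np

def global_norm(tensors):
    # Combined L2 norm over all tensors, i.e. the value that
    # return_norm=True would report before rescaling.
    return np.sqrt(sum(np.sum(t ** 2) for t in tensors))

# Hypothetical training loop: record the pre-clipping norm per step.
history = []
for step in range(3):
    grads = [np.full(4, float(step + 1))]  # stand-in gradients
    history.append(global_norm(grads))
```

A steadily growing `history` is a common signal to lower the learning rate or tighten max_norm.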



References:

[1] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112).


Examples:

>>> import aesara.tensor as at
>>> import lasagne
>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.updates import sgd, total_norm_constraint
>>> x = at.matrix()
>>> y = at.ivector()
>>> l_in = InputLayer((5, 10))
>>> l1 = DenseLayer(l_in, num_units=7, nonlinearity=at.nnet.softmax)
>>> output = lasagne.layers.get_output(l1, x)
>>> cost = at.mean(at.nnet.categorical_crossentropy(output, y))
>>> all_params = lasagne.layers.get_all_params(l1)
>>> all_grads = at.grad(cost, all_params)
>>> scaled_grads = total_norm_constraint(all_grads, 5)
>>> updates = sgd(scaled_grads, all_params, learning_rate=0.1)