Construct a graph for the gradient with respect to each input variable.

Each returned Variable represents the gradient with respect to the corresponding input, computed from the symbolic gradients with respect to each output. If the output is not differentiable with respect to an input, this method should return an instance of type NullType for that input.

Using the reverse-mode AD characterization given in [1], let $$C = f(A, B)$$ represent the function implemented by the Op, with its two arguments $$A$$ and $$B$$ given by the Variables in inputs. The values returned by Op.grad then represent the quantities $$\bar{A} \equiv \frac{\partial S_O}{\partial A}$$ and $$\bar{B} \equiv \frac{\partial S_O}{\partial B}$$, for some scalar output term $$S_O$$ of $$C$$ in

$\operatorname{Tr}\left(\bar{C}^\top dC\right) = \operatorname{Tr}\left(\bar{A}^\top dA\right) + \operatorname{Tr}\left(\bar{B}^\top dB\right)$
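
As an illustration of the identity above, the following is a minimal pure-Python sketch, not part of the library. It assumes the Op is a matrix product, $$C = AB$$, whose standard reverse-mode adjoints are $$\bar{A} = \bar{C} B^\top$$ and $$\bar{B} = A^\top \bar{C}$$. The helper functions (`matmul`, `transpose`, `add`, `trace_inner`) and all numeric values are hypothetical and chosen only for the check.

```python
# Assumption: the Op computes C = A @ B. None of these helpers belong to the library.

def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def trace_inner(X, Y):
    """Tr(X^T Y), which equals the sum of elementwise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Arbitrary integer inputs, output gradient, and input perturbations.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C_bar = [[2, -1], [1, 3]]   # plays the role of the entry in grads
dA = [[1, 0], [2, -1]]
dB = [[3, 1], [0, 2]]

# Differential of the product: dC = dA B + A dB.
dC = add(matmul(dA, B), matmul(A, dB))

# What a grad method for this Op would return, in numeric form.
A_bar = matmul(C_bar, transpose(B))
B_bar = matmul(transpose(A), C_bar)

lhs = trace_inner(C_bar, dC)
rhs = trace_inner(A_bar, dA) + trace_inner(B_bar, dB)
print(lhs, rhs)  # both sides agree: 49 49
```

Because the inputs are integers and dC is the exact differential of the product, the two sides match exactly rather than only to first order.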
Parameters
inputs

The input variables.

grads