Bug 60: Degrading The Gradient
- This code backpropagates a neural network loss gradient across a matrix multiplication.
- The forward pass multiplies matrix mk against matrix kn to yield matrix mn.
- Matrices are row-major. mk has m rows, k columns. kn has k rows, n columns. mn has m rows, n columns.
- The read-only loss gradient for mn is input as dmn (which has m rows and n columns, same shape as mn).
- The output loss gradient for mk is added into dmk (which has m rows and k columns, same shape as mk).
- The output loss gradient for kn is added into dkn (which has k rows and n columns, same shape as kn).
- The addition to matrix dmk is computed by multiplying matrix dmn against the transpose of matrix kn.
- The addition to matrix dkn is computed by multiplying the transpose of matrix mk against matrix dmn.
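As a reference for what the spec above describes, here is a minimal sketch of a *correct* matmul backward pass over row-major float32 slices (this is an illustrative implementation written from the bullet points, not the quiz's actual code; the name matmulBackward and the float32 element type are assumptions):

```go
package main

import "fmt"

// matmulBackward accumulates loss gradients for the forward pass mn = mk · kn.
// All matrices are row-major slices: mk is m×k, kn is k×n, dmn/dmk/dkn match
// the shapes of mn/mk/kn respectively. dmn is read-only; dmk and dkn are
// accumulated into:
//   dmk += dmn · knᵀ
//   dkn += mkᵀ · dmn
func matmulBackward(dmn, mk, kn, dmk, dkn []float32, m, k, n int) {
	// dmk[i][j] += Σ_p dmn[i][p] * kn[j][p]   (dmn times kn transposed)
	for i := 0; i < m; i++ {
		for j := 0; j < k; j++ {
			var sum float32
			for p := 0; p < n; p++ {
				sum += dmn[i*n+p] * kn[j*n+p]
			}
			dmk[i*k+j] += sum
		}
	}
	// dkn[j][p] += Σ_i mk[i][j] * dmn[i][p]   (mk transposed times dmn)
	for j := 0; j < k; j++ {
		for p := 0; p < n; p++ {
			var sum float32
			for i := 0; i < m; i++ {
				sum += mk[i*k+j] * dmn[i*n+p]
			}
			dkn[j*n+p] += sum
		}
	}
}

func main() {
	// Tiny 1×2 · 2×1 check: with dmn = [1], dmk should equal knᵀ and dkn
	// should equal mkᵀ.
	mk := []float32{2, 3}  // 1×2
	kn := []float32{5, 7}  // 2×1
	dmn := []float32{1}    // 1×1
	dmk := make([]float32, 2)
	dkn := make([]float32, 2)
	matmulBackward(dmn, mk, kn, dmk, dkn, 1, 2, 1)
	fmt.Println(dmk, dkn) // [5 7] [2 3]
}
```

A useful property for spotting bugs in code like this: with dmn set to all ones, dmk's rows are row sums of knᵀ and dkn's columns are column sums of mk, which makes off-by-one indexing or swapped strides show up immediately.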
Fix The Tiny Bug In This Go Code: