- This code backpropagates a neural network loss gradient across a matrix multiplication.
- The forward pass is understood to multiply matrix mk against matrix kn, yielding matrix mn.
- Matrices are row-major. mk has m rows, k columns. kn has k rows, n columns. mn has m rows, n columns.
- The read-only loss gradient for mn is input as dmn (which has m rows and n columns, same shape as mn).
- The output loss gradient for mk is added into dmk (which has m rows and k columns, same shape as mk).
- The output loss gradient for kn is added into dkn (which has k rows and n columns, same shape as kn).
- The addition to matrix dmk is computed by multiplying matrix dmn against the transpose of matrix kn.
- The addition to matrix dkn is computed by multiplying the transpose of matrix mk against matrix dmn.
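
The rules above can be sketched in a few lines. This is only an illustrative reference under stated assumptions, not the code this exercise asks you to fix: the function name `matmul_backward`, its argument order, and the use of flat Python lists for row-major storage are all hypothetical choices made for the example.

```python
def matmul_backward(m, k, n, mk, kn, dmn, dmk, dkn):
    """Accumulate the gradients of mn = mk * kn into dmk and dkn.

    Hypothetical helper: every matrix is a flat, row-major Python list,
    so element (row, col) of a matrix with `cols` columns lives at
    index row * cols + col.
    """
    for i in range(m):              # row of mk, mn, dmn, dmk
        for j in range(n):          # column of kn, mn, dmn, dkn
            g = dmn[i * n + j]      # read-only gradient for mn[i][j]
            for p in range(k):      # shared inner dimension
                # dmk += dmn * transpose(kn)
                dmk[i * k + p] += g * kn[p * n + j]
                # dkn += transpose(mk) * dmn
                dkn[p * n + j] += mk[i * k + p] * g
```

Both outputs are updated with `+=`, so whatever dmk and dkn already hold is preserved and added to. dmn, mk, and kn are only read.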

To receive a hint, submit unfixed code.