Bug 62: Hard Times In Softmax City
- This code backpropagates a neural network loss gradient across a softmax.
- It is understood that the forward pass computes the softmax of vector x, yielding vector y.
- The read-only loss gradient for y is input as dy. The output loss gradient for x is written to dx.
- x is not necessary for the gradient computation (only y and dy are needed), so x is not provided.
- The change in y[i1] per unit change in x[i2] is negative y[i1] times y[i2] for i1 not equal to i2.
- The change in y[i1] per unit change in x[i1] is y[i1] minus the square of y[i1].
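Combining the two derivative rules above with the chain rule gives dx[i] = y[i] * (dy[i] - sum over j of dy[j] * y[j]). The following is a minimal sketch of that formula, written independently of the code to be fixed. The function name softmaxBackward, its signature, and the sample values are assumptions made for illustration only.

```go
package main

import "fmt"

// softmaxBackward is a hypothetical helper (name and signature are assumed).
// It writes the loss gradient for x into dx, given the softmax output y and
// the loss gradient dy for y. It uses:
//   dx[i] = sum_j dy[j] * (change in y[j] per unit change in x[i])
//         = y[i] * (dy[i] - sum_j dy[j]*y[j])
func softmaxBackward(y, dy, dx []float64) {
	// Dot product of dy and y, shared by every element of dx.
	dot := 0.0
	for i := range y {
		dot += dy[i] * y[i]
	}
	for i := range y {
		dx[i] = y[i] * (dy[i] - dot)
	}
}

func main() {
	// Illustrative values only: y is a valid softmax output (sums to 1).
	y := []float64{0.2, 0.3, 0.5}
	dy := []float64{1, 0, 0}
	dx := make([]float64, len(y))
	softmaxBackward(y, dy, dx)
	fmt.Println(dx)
}
```

Because the elements of y sum to 1, the elements of dx should sum to approximately zero, which makes a handy sanity check for any implementation.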
Fix The Tiny Bug In This Go Code: