BUGFIX-66

Bug 62: Hard Times In Softmax City

This code backpropagates a neural network loss gradient across a softmax.
It is understood that the forward pass computes the softmax of vector x, yielding vector y.
The read-only loss gradient for y is input as dy. The output loss gradient for x is written to dx.
x is not necessary for the gradient computation (only y and dy are needed), so x is not provided.
The change in y[i1] per unit change in x[i2] is negative y[i1] times y[i2] for i1 not equal to i2.
The change in y[i1] per unit change in x[i1] is y[i1] minus the square of y[i1].