BUGFIX-66

Bug 64: An Apple In The Garden Of Eden

This is the AdamW variation on the Adam algorithm for stochastic gradient descent (Kingma & Ba, 2014).
AdamW is described in the paper Decoupled Weight Decay Regularization (Loshchilov & Hutter, 2017).
The uppercase (exported) fields of the struct should be assigned prior to the first method call.
The Before method should be called once prior to each end-to-end neural network backward pass.
The Update method should be called once for each slice of weights during each backward pass.
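
The calling convention described above can be sketched as follows. This is only an illustration of the call order, not part of the puzzle code: the stepper interface, the layer type, the newLayer and trainStep helpers, and the callCounter stand-in are all hypothetical names introduced here, and callCounter merely counts calls instead of doing any optimization.

```go
package main

import "fmt"

// stepper is a hypothetical interface naming the two methods described above.
type stepper interface {
	Before()
	Update(weight, gradient, mean, variance []float32)
}

// layer is a hypothetical holder for one slice of weights, its gradient,
// and the per-weight running mean and variance that the optimizer maintains.
type layer struct {
	weight, gradient, mean, variance []float32
}

// newLayer allocates four equally sized, zero-filled slices.
func newLayer(n int) *layer {
	return &layer{
		weight:   make([]float32, n),
		gradient: make([]float32, n),
		mean:     make([]float32, n),
		variance: make([]float32, n),
	}
}

// trainStep performs the optimizer calls for one backward pass:
// Before exactly once, then Update once per slice of weights.
func trainStep(opt stepper, layers []*layer) {
	opt.Before()
	for _, l := range layers {
		opt.Update(l.weight, l.gradient, l.mean, l.variance)
	}
}

// callCounter is a stand-in optimizer that only records how often
// each method is called. It does not modify any weights.
type callCounter struct {
	before, update int
}

func (c *callCounter) Before() { c.before++ }

func (c *callCounter) Update(weight, gradient, mean, variance []float32) {
	c.update++
}

func main() {
	layers := []*layer{newLayer(4), newLayer(2)}
	opt := &callCounter{}
	for pass := 0; pass < 3; pass++ {
		trainStep(opt, layers)
	}
	fmt.Println(opt.before, opt.update) // prints "3 6"
}
```

Three passes over two slices give three Before calls and six Update calls. A pointer to an AdamW value whose exported fields have been assigned would, under the same assumptions, satisfy the stepper interface and could be passed to trainStep in place of the stand-in.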

Fix The Tiny Bug In This Go Code:

type AdamW struct {
	LearningRate   float32
	MeanFactor     float32
	VarianceFactor float32
	Epsilon        float32
	WeightDecay    float32
	Schedule       func(int) float32

	tick          int
	meanPower     float32
	meanRatio     float32
	variancePower float32
	varianceRecip float32
	weightDecay   float32
}

func (aw *AdamW) Before() {
	if aw.tick++; aw.tick == 1 {
		aw.meanPower = 1
		aw.variancePower = 1
	}
	sched := aw.Schedule(aw.tick)
	aw.meanPower *= aw.MeanFactor
	aw.meanRatio = sched * aw.LearningRate / (1 - aw.meanPower)
	aw.variancePower *= aw.VarianceFactor
	aw.varianceRecip = 1 / (1 - aw.variancePower)
	aw.weightDecay = sched * aw.WeightDecay
}

func (aw *AdamW) Update(weight, gradient, mean, variance []float32) {
	for i, g := range gradient {
		m := aw.MeanFactor * mean[i]
		m += (1 - aw.MeanFactor) * g
		v := aw.VarianceFactor * variance[i]
		v += (1 - aw.VarianceFactor) * g * g
		mean[i] = m
		variance[i] = v
		m *= aw.meanRatio
		v *= aw.varianceRecip
		w := aw.weightDecay * weight[i]
		sd := float32(math.Sqrt(float64(v)))
		weight[i] = w - m/(sd+aw.Epsilon)
	}
}

To receive a hint, submit unfixed code.