Bug 64: An Apple In The Garden Of Eden
- This is the AdamW variation on the Adam algorithm for stochastic gradient descent (Kingma & Ba, 2014).
- AdamW is described in the paper Decoupled Weight Decay Regularization (Loshchilov & Hutter, 2017).
- The uppercase (exported) fields of the struct should be assigned prior to the first method call.
- The Before method should be called once prior to each end-to-end neural network backward pass.
- The Update method should be called once for each slice of weights during each backward pass.
Fix The Tiny Bug In This Go Code: