Diagnostic tools

  • Update (how much weights are moving in a single step) to weight magnitude ratio i.e. update.std()/weight.std() is a good thing to look at. You should expect about 1e-3 in ratio. If too low or too high, this can help tweak the learning rate.