Resource allocation for SGD via adaptive batch sizes
Stochastic gradient descent (SGD) approximates the objective function's gradient using a constant, typically small number of examples, known as the batch size in mini-batch SGD. Small batch sizes introduce significant gradient noise near the optimum. This work presents a method to grow the batch size adaptively with model …
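Since the abstract is truncated before the adaptive rule is stated, the following is only a minimal sketch of mini-batch SGD with a growing batch size. The geometric growth schedule, the function names, and all parameters (`b0`, `growth`, `b_max`) are placeholder assumptions, not the paper's method; they serve only to illustrate that larger batches reduce gradient noise late in training.

```python
import numpy as np

def minibatch_sgd_growing_batch(grad_fn, w0, n_examples, lr=0.1,
                                b0=8, growth=1.5, b_max=256, steps=50):
    """Mini-batch SGD whose batch size grows over training.

    `grad_fn(w, idx)` returns the average gradient over the examples
    indexed by `idx`. The geometric growth schedule below is a
    stand-in assumption, not the adaptive rule from the paper.
    """
    rng = np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    b = float(b0)
    for _ in range(steps):
        # Sample a mini-batch; its size grows geometrically up to b_max.
        idx = rng.choice(n_examples, size=min(int(b), n_examples),
                         replace=False)
        w -= lr * grad_fn(w, idx)
        b = min(b * growth, b_max)  # larger batches -> less gradient noise
    return w

# Toy problem: minimize the mean of (w - x_i)^2, whose optimum is mean(x).
x = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=1000)
grad = lambda w, idx: 2.0 * np.mean(w - x[idx])
w_final = minibatch_sgd_growing_batch(grad, 0.0, len(x))
```

On this toy least-squares objective, `w_final` lands close to the data mean: early noisy small-batch steps make fast progress, and the growing batch damps the gradient noise near the optimum.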