Generalization and optimization of deep networks
From the confusion surrounding the optimization and generalization of deep networks has arisen an exciting possibility: gradient descent is implicitly regularized, meaning it outputs not only iterates of low error but also iterates of low complexity. This talk starts with a “spectrally-normalized” generalization bound, which is small if gradient descent …