Generalization and optimization of deep networks

From the confusion surrounding the optimization and generalization of deep networks has arisen an exciting possibility: gradient descent is implicitly regularized, meaning that it outputs iterates that have not only low error but also low complexity.

This talk starts with a “spectrally-normalized” generalization bound which is small if gradient descent happens to select iterates with certain favorable properties. These properties can be verified in practice, but the bulk of the talk will work towards theoretical guarantees, showing that even stronger properties hold first for logistic regression and then for linear networks of arbitrary depth.
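As a rough illustration of the logistic regression case, the sketch below runs plain gradient descent, with no explicit regularizer, on a small linearly separable dataset and records the loss, the weight norm, and the normalized margin min_i y_i <w, x_i> / ||w||, used here as a simple complexity proxy. It is not the construction from the talk: the helper names (make_separable_data, gd_logistic), the synthetic data, the step size, and the step count are all assumptions chosen for the example.

```python
import numpy as np


def make_separable_data(n=50, gap=0.5, seed=0):
    """Hypothetical toy data: 2-D points pushed away from a separating line."""
    rng = np.random.default_rng(seed)
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit normal of the separator
    X = rng.standard_normal((n, 2))
    y = np.where(X @ u >= 0, 1.0, -1.0)       # labels in {-1, +1}
    X = X + gap * y[:, None] * u              # enforce margin >= gap along u
    return X, y


def gd_logistic(X, y, steps=20000, lr=0.5, record_every=5000):
    """Unregularized gradient descent on the mean logistic loss, from w = 0."""
    w = np.zeros(X.shape[1])
    history = []                              # (step, loss, ||w||, normalized margin)
    for t in range(1, steps + 1):
        margins = y * (X @ w)
        # sigma(-m) = 1 / (1 + exp(m)), computed stably via logaddexp
        coeff = np.exp(-np.logaddexp(0.0, margins))
        # gradient of mean_i log(1 + exp(-m_i)) with respect to w
        grad = -(X * (coeff * y)[:, None]).mean(axis=0)
        w = w - lr * grad
        if t % record_every == 0:
            m = y * (X @ w)
            loss = np.logaddexp(0.0, -m).mean()
            norm = np.linalg.norm(w)
            history.append((t, loss, norm, m.min() / norm))
    return w, history


if __name__ == "__main__":
    X, y = make_separable_data()
    _, history = gd_logistic(X, y)
    for step, loss, norm, margin in history:
        print(f"step={step:6d}  loss={loss:.5f}  ||w||={norm:.3f}  "
              f"normalized margin={margin:.4f}")
```

On separable data the loss has no finite minimizer, so the weight norm keeps growing as the loss shrinks; the quantity worth watching is therefore the normalized margin rather than the raw weights.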

Joint work with Peter Bartlett, Dylan Foster, and Ziwei Ji.

November 7, 2018

12:30 pm (1h)

Discovery Building, Orchard View Room

Matus Telgarsky