In this talk I will discuss and give some historical context for the phenomenon of interpolation (zero training loss). I will show how it provides a new perspective on machine learning, forcing us to rethink some commonly held assumptions and pointing to significant gaps in our understanding of when classifiers generalize, even in the simplest settings. I will outline some first theoretical results in that direction, showing that such classifiers can indeed be statistically consistent and even optimal.
In the second part of the talk I will point to the computational power of interpolation by describing how it results in very efficient optimization of over-parametrized models using Stochastic Gradient Descent. Furthermore, I will show how the simplicity of the setting can be harnessed to construct very fast and theoretically sound methods for training large-scale kernel machines. I will also briefly describe some new accelerated SGD methods for over-parametrized settings.
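To make the interpolation phenomenon concrete, here is a minimal sketch, not the speaker's method, of plain SGD on an over-parametrized linear model (more parameters than samples) driving the training loss to numerical zero; all names, dimensions, and hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch: SGD on an over-parametrized linear model interpolates
# the training data (training loss -> 0), even for arbitrary labels.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                      # n samples, d parameters, with d >> n
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)          # arbitrary (even random) labels

w = np.zeros(d)
lr = 0.5
for epoch in range(2000):
    for i in rng.permutation(n):    # plain SGD, one sample at a time
        resid = X[i] @ w - y[i]
        w -= lr * resid * X[i]      # gradient of 0.5 * (x_i^T w - y_i)^2

train_mse = np.mean((X @ w - y) ** 2)
print(f"training MSE after SGD: {train_mse:.2e}")  # ~0: the model interpolates
```

In this toy setting SGD converges quickly to an exact fit because an interpolating solution always exists when the number of parameters exceeds the number of samples.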
Discovery Building, Orchard View Room
Misha Belkin