A major challenge in the theory of deep learning is to understand the computational complexity of learning basic families of neural networks (NNs). The challenge here arises from the non-convexity of the associated optimization problem. It is well known that the learning problem is computationally intractable in the worst case. Positive results have circumvented this hardness by making assumptions on the distribution as well as the label noise.
In this talk, I will focus on the problem of learning shallow NNs under the benign gaussian input distribution. I will first discuss a super-polynomial Statistical Query (SQ) lower bound in the simple noiseless setting. I will further show how to use this result to obtain a super-polynomial SQ lower bound for learning a single neuron in the agnostic noise model. Lastly, on the positive side, I will describe a gradient-based algorithm for approximately learning ReLUs which runs in polynomial time.
This talk is based on multiple works with Ilias Diakonikolas, Aravind Gollakota, Zhihan Jin, Sushrut Karmalkar, Adam Klivans and Mahdi Soltanolkotabi