Abstract: We consider the learning paradigm consisting of training general neural networks (NN) with (S)GD. What is the class of functions that can be learned with this paradigm? How does it compare to known classes such as PAC and SQ? Is depth/overapametrization needed to learn certain classes? We show that SGD on NN is equivalent to PAC while GD on NN is equivalent to SQ, obtaining a separation between the classes learned by SGD and GD. We further give a strong separation with kernel methods, exhibiting function classes that kernels cannot learn with non-trivial edge but that GD on a differentiable model can learn with perfect edge. Based on joint works with C. Sandon and E. Malach, P. Kamath, N. Srebro.