SILO: Benign Overfitting in Two-layer ReLU Networks

Title: Benign Overfitting in Two-layer ReLU Networks

Abstract: Modern neural networks often have great expressive power and can be trained to overfit the training data while still achieving good test performance. This phenomenon is referred to as “benign overfitting”. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, they are limited to neural networks with smooth activation functions, leaving open the question of how and when benign overfitting can occur in the widely used ReLU networks. In this talk, I will present a new result on benign overfitting in two-layer convolutional neural networks (CNNs) with ReLU activations. Specifically, we establish algorithm-dependent risk bounds for learning two-layer ReLU CNNs in the presence of label-flipping noise. Our analysis reveals that, under mild conditions, a neural network trained by gradient descent can achieve near-zero training loss and Bayes-optimal test risk. Furthermore, our result uncovers a sharp transition between benign and harmful overfitting under different conditions on the data distribution. If time allows, I will discuss some possible extensions of this work.

Short bio: Quanquan Gu is an Associate Professor of Computer Science at UCLA. His research is in the area of artificial intelligence and machine learning, with a focus on developing and analyzing nonconvex optimization algorithms for machine learning to understand large-scale, dynamic, complex, and heterogeneous data, and on building the theoretical foundations of deep learning and reinforcement learning. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, and the Simons Berkeley Research Fellowship, as well as industrial research awards.

March 29, 2023

12:30 pm (1h)

Orchard View Room, Virtual

Quanquan Gu
