We first consider grossly over-parametrized matrix estimation with a quadratic loss, in which all global optima provably overfit the noisy data. We show that, with early stopping, gradient descent from a small random initialization converges at a linear rate to the best rank-r approximation of a general matrix (low-rank or not). This result highlights the implicit regularization effect of early stopping, as well as the connection between random initialization and spectral learning.
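To make the setting concrete, here is a minimal numerical sketch of this phenomenon. It runs gradient descent on the over-parametrized factorization f(U, V) = ½‖UV^T − Y‖_F² with factor width equal to the full dimension, a small random initialization, and early stopping. All dimensions, the noise level, the stepsize eta, the initialization scale alpha, and the stopping time T are illustrative assumptions, not the talk's exact construction.

```python
import numpy as np

# Minimal sketch (illustrative parameters, not the talk's exact construction):
# gradient descent on the over-parametrized factorization
# f(U, V) = 0.5 * ||U V^T - Y||_F^2, with factor width k equal to the full
# dimension n, a small random initialization, and early stopping at time T.

rng = np.random.default_rng(0)
n, r, k = 50, 3, 50                      # k = n: grossly over-parametrized
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
M /= np.linalg.norm(M, 2)                # rank-r signal, unit spectral norm
Y = M + 0.002 * rng.standard_normal((n, n))   # noisy, full-rank observation

alpha = 1e-3                             # small random initialization scale
U = alpha * rng.standard_normal((n, k))
V = alpha * rng.standard_normal((n, k))

eta, T = 0.05, 1500                      # stepsize and early-stopping time (assumed)
for t in range(T):
    R = U @ V.T - Y                      # residual against the observed matrix
    U, V = U - eta * R @ V, V - eta * R.T @ U

# Compare the early-stopped iterate with the best rank-r approximation of Y
Uy, s, Vy = np.linalg.svd(Y)
Y_r = (Uy[:, :r] * s[:r]) @ Vy[:r, :]
print("relative distance to best rank-r approximation:",
      np.linalg.norm(U @ V.T - Y_r, "fro") / np.linalg.norm(Y_r, "fro"))
```

Running the loop far past the stopping time would drive UV^T toward the noisy matrix Y itself, which is the overfitting exhibited by every global optimum and avoided by early stopping.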
We next consider a non-smooth formulation for recovering a low-rank matrix from linear measurements with adversarial corruption. We show that the subgradient method equipped with an adaptive stepsize converges at a sublinear rate under over-parametrization and at a linear rate under exact parametrization, whereas standard stepsize rules fail to perform well in either setting. We further demonstrate that, with a small initialization, our method regains linear convergence even under over-parametrization.
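As an illustration of this second setting, the sketch below runs a subgradient method on the non-smooth loss f(U, V) = (1/m)‖A(UV^T) − y‖₁ with a fraction of the Gaussian measurements grossly corrupted. The classical Polyak stepsize, evaluated with the loss value at the ground truth (available in simulation), is used here only as a stand-in for the adaptive rule mentioned above; the measurement model, the initialization near the solution, and all constants are assumptions made for the demo, not the talk's exact algorithm.

```python
import numpy as np

# Illustrative sketch (assumed measurement model and constants, not the talk's
# exact algorithm): subgradient method on the non-smooth loss
# f(U, V) = (1/m) * ||A(U V^T) - y||_1, with a fraction of the measurements
# grossly corrupted. The classical Polyak stepsize, using the loss value at
# the ground truth, stands in for the adaptive rule mentioned in the abstract.

rng = np.random.default_rng(1)
n, r, m = 30, 2, 1500                    # exact parametrization: factor width = r
A = rng.standard_normal((m, n * n))      # Gaussian measurement matrices, flattened

Lstar = rng.standard_normal((n, r))      # ground-truth rank-r matrix X* = L* R*^T
Rstar = rng.standard_normal((n, r))
Xstar = Lstar @ Rstar.T
y = A @ Xstar.ravel()
outliers = rng.random(m) < 0.1           # 10% of measurements grossly corrupted
y[outliers] += 500.0 * rng.standard_normal(outliers.sum())
fstar = np.abs(y - A @ Xstar.ravel()).mean()   # loss value at the ground truth

def loss_and_subgrad(U, V):
    """l1 loss and one subgradient with respect to (U, V)."""
    res = A @ (U @ V.T).ravel() - y
    G = (A.T @ np.sign(res)).reshape(n, n) / m   # subgradient w.r.t. U V^T
    return np.abs(res).mean(), G @ V, G.T @ U

# For brevity, initialize near the solution (the talk analyzes specific
# initialization schemes); the focus here is the adaptive stepsize.
U = Lstar + 0.05 * rng.standard_normal((n, r))
V = Rstar + 0.05 * rng.standard_normal((n, r))

for t in range(200):
    f, gU, gV = loss_and_subgrad(U, V)
    eta = max(f - fstar, 0.0) / (np.linalg.norm(gU) ** 2
                                 + np.linalg.norm(gV) ** 2 + 1e-12)
    U, V = U - eta * gU, V - eta * gV
    if t % 25 == 0:
        err = np.linalg.norm(U @ V.T - Xstar) / np.linalg.norm(Xstar)
        print(f"iter {t:3d}   loss {f:.4e}   relative error {err:.4e}")
```

Replacing the factor width r with some k > r gives the over-parametrized variant of the same sketch, where, per the abstract, the rate degrades to sublinear unless the initialization is small.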
Bio: Yudong Chen is a faculty member in the Computer Sciences Department at the University of Wisconsin-Madison. Before joining UW-Madison, he was an associate professor in the School of Operations Research and Information Engineering at Cornell University and a postdoctoral scholar at the University of California, Berkeley. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin. His research interests include machine learning, high-dimensional statistics, convex and nonconvex optimization, and reinforcement learning.