Abstract
Adaptive gradient methods, such as Adagrad, Adam, and their variants, have found widespread use in machine learning, signal processing, and many other settings. However, many algorithms in this family are not rotationally equivariant. In this talk we examine how a simple change of basis in either parameter space or data space can drastically impact both the convergence rates and the generalization of these algorithms. We begin by studying reparameterizations in parameter space, and describe a data-driven method proposed in our recent work which produces a “favorable” basis for adaptive algorithms. Our method is an orthonormal transformation based on the expected gradient outer product (EGOP) matrix, which can be approximated using either full-batch or stochastic gradient oracles. We show that for a broad class of functions, the sensitivity of adaptive algorithms to the choice of basis is influenced by the decay of the EGOP matrix spectrum. We illustrate the potential impact of EGOP reparameterization by presenting empirical evidence and theoretical arguments that common machine learning tasks with real-world data exhibit EGOP spectral decay. In the second half of the talk, we study how rotations in data space impact the generalization of the solutions produced by adaptive algorithms. We analyze the setting in which the data distribution is a mixture of Gaussians. We characterize how the implicit biases of Adam and gradient descent change under rotations of this distribution, and demonstrate that the generalization of Adam can be extremely brittle to such rotations. We conclude by demonstrating the potential of EGOP reparameterization to reduce this sensitivity.
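To make the EGOP idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the speaker's implementation: it assumes the EGOP is the expectation of the gradient outer product over sampled parameter vectors, that the sampling distribution is a standard Gaussian, and that the "favorable" basis is the EGOP eigenbasis. The toy quadratic objective and the names `estimate_egop`, `egop_basis`, `grad_f`, and `grad_f_tilde` are hypothetical.

```python
import numpy as np

def estimate_egop(grad_fn, dim, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[grad f(theta) grad f(theta)^T].

    Assumption: theta is drawn from a standard Gaussian; the abstract
    does not specify the sampling distribution.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.standard_normal((num_samples, dim))
    grads = np.stack([grad_fn(t) for t in thetas])  # shape (num_samples, dim)
    return grads.T @ grads / num_samples

def egop_basis(egop):
    """Eigenvalues (descending) and orthonormal eigenvectors of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(egop)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical toy objective: f(theta) = 0.5 * theta^T A theta, where A has a
# fast-decaying spectrum expressed in a randomly rotated basis Q.
rng = np.random.default_rng(1)
dim = 5
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
A = Q @ np.diag([10.0, 1.0, 0.1, 0.01, 0.001]) @ Q.T

def grad_f(theta):
    return A @ theta

eigvals, V = egop_basis(estimate_egop(grad_f, dim))

def grad_f_tilde(z):
    """Gradient of the reparameterized objective f_tilde(z) = f(V z)."""
    return V.T @ grad_f(V @ z)
```

In this sketch an adaptive optimizer would be run on `z` using `grad_f_tilde`, and the result mapped back through `theta = V @ z`. Because `V` is orthonormal, the reparameterization changes only the coordinate system the optimizer sees, not the objective's values.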
Bio
Adela DePavia is a PhD candidate in Computational and Applied Mathematics at the University of Chicago, co-advised by Lorenzo Orecchia and Rebecca Willett. She researches optimization algorithms for machine learning, developing new methods to achieve faster convergence and enable scaling to larger datasets. She is supported by an NSF Graduate Research Fellowship, and by a GEM PhD Fellowship sponsored by Lawrence Livermore National Lab. After receiving her B.Sc. in Physics from Yale University, she spent a year at Yale conducting research as an Emerging Scholars Initiative Postbaccalaureate Fellow prior to the start of her PhD.
Orchard View Room
Adela DePavia, University of Chicago