SILO: Understanding and Leveraging Adaptive Algorithms’ Sensitivity to Change-of-Basis
Abstract: Adaptive gradient methods, such as Adagrad, Adam, and their variants, have found widespread use in machine learning, signal processing, and many other settings. However, many algorithms in this family are not rotationally equivariant: in this talk we examine how a simple change of basis in either parameter space or data space can drastically …
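The lack of rotational equivariance mentioned in the abstract can be seen in a few lines of NumPy. The sketch below (a hand-rolled textbook Adam and gradient descent, not code from the talk) minimizes the same ill-conditioned quadratic twice: once in the original coordinates and once after an orthogonal change of basis, starting from the same point. Gradient descent produces the same trajectory in both bases (up to rounding), while Adam's coordinate-wise normalization makes its trajectory depend on the basis. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def adam_traj(grad, x0, steps, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Textbook Adam with bias correction; returns the iterate trajectory."""
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mhat = m / (1 - b1 ** t)
        vhat = v / (1 - b2 ** t)
        x = x - lr * mhat / (np.sqrt(vhat) + eps)  # coordinate-wise step
        traj.append(x.copy())
    return np.array(traj)

def gd_traj(grad, x0, steps, lr=0.01):
    """Plain gradient descent; returns the iterate trajectory."""
    x = x0.astype(float).copy()
    traj = [x.copy()]
    for _ in range(steps):
        x = x - lr * grad(x)
        traj.append(x.copy())
    return np.array(traj)

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([100.0, 1.0])
grad = lambda x: A @ x

# Orthogonal change of basis: rotate coordinates by 45 degrees.
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R = np.array([[c, -s], [s, c]])
grad_rot = lambda y: R.T @ A @ R @ y  # gradient of f(R y)

x0 = np.array([1.0, 1.0])
steps = 20

# If an optimizer is rotationally equivariant, running it in the rotated
# basis and mapping back with R reproduces the original trajectory.
gd_gap = np.max(np.linalg.norm(
    gd_traj(grad, x0, steps) - gd_traj(grad_rot, R.T @ x0, steps) @ R.T,
    axis=1))
adam_gap = np.max(np.linalg.norm(
    adam_traj(grad, x0, steps) - adam_traj(grad_rot, R.T @ x0, steps) @ R.T,
    axis=1))

print(f"gradient descent trajectory gap under rotation: {gd_gap:.2e}")
print(f"Adam trajectory gap under rotation:             {adam_gap:.2e}")
```

Gradient descent's gap is at the level of floating-point rounding, while Adam's is macroscopic: the per-coordinate second-moment scaling is tied to the axes of whatever basis the problem is expressed in.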