With a flurry of recent research motivated by applications to machine learning, the convergence of gradient descent methods for smooth non-convex unconstrained optimization is now well understood in the centralized setting. In this talk I will discuss our progress towards understanding how the convergence of gradient descent methods (including SGD and accelerated variants) is affected by:
- Presence of linear inequality constraints,
- Compression of gradients in order to minimize communication during distributed computation (see the sketch below).
In both cases, convergence to a second-order stationary point currently requires a linear dependence on the dimension in methods that are otherwise (almost) dimension-free. It remains open whether such dependence is necessary.
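As a point of reference for the second item, here is a minimal sketch of top-k gradient compression applied to SGD on a toy quadratic. The choice of top-k, the function names, and the toy objective are illustrative assumptions; the talk does not specify a particular compression scheme, and this is not the speakers' method.

```python
# Minimal sketch (illustrative only): top-k gradient compression in SGD,
# one standard way to reduce communication in distributed training.
import numpy as np

def top_k_compress(grad, k):
    """Keep only the k largest-magnitude coordinates; zero out the rest."""
    if k >= grad.size:
        return grad.copy()
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    compressed = np.zeros_like(grad)
    compressed[idx] = grad[idx]
    return compressed

def compressed_sgd(grad_fn, x0, lr=0.1, k=2, steps=500):
    """Run gradient descent where each gradient is compressed before it is
    applied, mimicking a worker that communicates only k coordinates per step."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        g = grad_fn(x)
        x -= lr * top_k_compress(g, k)
    return x

if __name__ == "__main__":
    # Toy smooth objective f(x) = 0.5 * ||x||^2, whose gradient is x.
    x_final = compressed_sgd(lambda x: x, x0=np.ones(10), lr=0.1, k=2)
    print(np.linalg.norm(x_final))  # should be close to 0
```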
Joint work with Dmitrii Avdiukhin (Indiana University, Bloomington) and Chi Jin (Princeton).
May 13, 2020
12:30 pm (1h)
Zoom
Grigory Yaroslavtsev