Systems | Information | Learning | Optimization
 

SILO: First-Order Algorithms for Large-Scale Optimization

Abstract:
It is well known that for nonconvex unconstrained optimization with Lipschitz smoothness, gradient descent (GD) and stochastic gradient descent (SGD) are the optimal first-order algorithms in the deterministic and stochastic settings, respectively. This naturally raises two questions:
  • In the constrained setting, is it possible to design algorithms that achieve the same convergence rate as GD and SGD?
  • When Lipschitz smoothness or curvature varies significantly, how can we design adaptive algorithms?
In this talk, we explore these two directions. In the first part, we discuss optimal deterministic and stochastic algorithms for solving problems with deterministic and stochastic constraints, achieving the best possible convergence rates and sample complexity. In the second part, we turn to Group Relative Policy Optimization (GRPO), a recently popular algorithm for LLM reasoning. We show, both theoretically and empirically, that GRPO is in fact a gradient-based method with adaptive step sizes chosen according to local curvature, providing an explanation for its strong empirical performance.
Biography:
Jiawei Zhang is an assistant professor in the Department of Computer Sciences at the University of Wisconsin-Madison. Prior to joining UW-Madison, he was a postdoctoral researcher in the Laboratory for Information & Decision Systems (LIDS) at MIT, working with Prof. Asuman Ozdaglar and Prof. Saurabh Amin. He received his PhD in Computer and Information Engineering from the Chinese University of Hong Kong, Shenzhen, under the supervision of Prof. Zhi-Quan (Tom) Luo. He earned his BSc in Mathematics (Hua Loo-Keng Talent Program) from the University of Science and Technology of China. His research interests include:
  • Optimization theory and algorithms with applications in machine learning, energy, and signal processing
  • Optimization, generalization, and robustness of machine learning, reinforcement learning, and generative models (including diffusion models, large models, and foundation models)
October 1, 2025
12:30 pm (1h)

Orchard View Room

Jiawei Zhang, UW-Madison