Instructor: University of Pennsylvania
SILO: Do Large Language Models Need Statistical Foundations?
Abstract: In this talk, we advocate for the development of rigorous statistical foundations for large language models (LLMs). We begin by elaborating two key features that motivate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the complexity and black box nature of Transformer architectures. …
SILO: Acceleration by Stepsize Hedging
Abstract: Can we accelerate the convergence of gradient descent without changing the algorithm — just by optimizing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $k^{\log_p 2} = k^{0.7864}$ iterations, where $p=1+\sqrt{2}$ is the silver ratio and $k$ is …