Abstract:
This talk will cover a pair of recent results unified by the method of approximation with simple linear error feedback. First, we will look at training example ordering for stochastic gradient descent, which has long been known to affect the convergence rate. We will develop a theoretical characterization of what it is about the example order that affects convergence, and use this to motivate GraB (gradient balancing), an efficient linear-error-feedback-based example selection algorithm that achieves a theoretically optimal convergence rate, faster than the classic random-reshuffling scheme. Second, we will look at post-training quantization (PTQ), an especially important task in the practice of Large Language Model (LLM) inference, in which a trained model is compressed without any additional fine-tuning. A theoretical characterization of the accuracy of "adaptive" linear-feedback quantization schemes will motivate QuIP (quantization with incoherence processing), a new approach to quantization that enables 2-bit LLMs and comes with theoretical error guarantees. The talk will conclude with some thoughts about future work along these lines in machine learning systems.
Bio:
Chris De Sa is an Assistant Professor in the Computer Science department at Cornell University. He is a member of the Cornell Machine Learning Group and leads the Relax ML Lab. His research interests include algorithmic, software, and hardware techniques for high-performance machine learning systems, with a focus on relaxed-consistency variants of stochastic algorithms such as asynchronous and low-precision stochastic gradient descent (SGD) and Markov chain Monte Carlo. The Relax ML Lab applies these techniques to build data analytics and machine learning frameworks, including for deep learning, that are efficient, parallel, and distributed.
Location: Orchard View Room
Speaker: Christopher De Sa, Cornell University