Speaker: Jonathan Frankle
Title: Faster Neural Network Training, Algorithmically
Abstract: Training modern neural networks is time-consuming, expensive, and energy-intensive. As the cost of training neural networks doubles every few months, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself, that is, the training algorithm. Unlike most workloads in computer science, machine learning is approximate, so we need not worry about changing the underlying algorithm as long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (all freely available as open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the months since we began this work, we have reduced the training times of standard computer vision models by 5-7x and standard language models by 2-3x on publicly available cloud instances, and we’re just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of a dozen empirical deep learning researchers at MosaicML, and I’m simply the messenger.
Orchard View Room, Virtual