Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect convergence. In this work, we present the first theoretical characterization of the speed-up offered by asynchronous …