I will talk about three simple combinatorial ideas to speed up parallel and distributed learning algorithms. We will start off with serial equivalence in asynchronous parallel ML, its significance, and how we can guarantee it using a recent phase transition result on graphs. We continue on the issue of stragglers (i.e., slow nodes) in distributed systems, where we will use erasure codes to robustify gradient based algorithms against delays. In our third example, we will reduce the high communication cost of data-shuffling in distributed learning, using the seemingly unrelated notion of coded caching. For all the above, we will see real world experiments that indicate how these simple ideas can significantly improve the speed and accuracy of large-scale learning.
Discovery Building, Orchard View Room