Dimitris Papailiopoulos

SILO: Self-Improving Transformers: Overcoming Length Generalization Challenges

Abstract: Large language models can perform algorithmic tasks through test-time computation but struggle to generalize far beyond the task difficulty of the training distribution. These limitations manifest across even simple tasks like arithmetic, string manipulation, and maze solving, where transformers learn shortcuts rather than the underlying algorithms. While prior solutions …

Overcoming the Challenges of Learning in Parallel

Distributed implementations of popular machine learning algorithms exhibit poor scaling when deployed on more than a few tens of compute nodes. The key sources of this poor performance are communication bottlenecks and straggler nodes in the system. In this talk, I will explain why these bottlenecks are a real challenge …

Speeding up Machine Learning using Graphs and Codes

Video: https://vimeo.com/185052120

I will talk about three simple combinatorial ideas to speed up parallel and distributed learning algorithms. We will start off with serial equivalence in asynchronous parallel ML, its significance, and how we can guarantee it using a recent phase transition result on graphs. We continue on the issue …