Systems | Information | Learning | Optimization
 

SILO: High-dimensional Optimization with Applications to Compute-Optimal Neural Scaling Laws

Abstract: Given the massive scale of modern ML models, we now get only a single shot at training them effectively. This limits our ability to test multiple architectures and hyperparameter configurations. Instead, we need to understand how these models scale, so that we can experiment with smaller problems and then apply …