Systems | Information | Learning | Optimization

SILO: First-Order Algorithms for Large-Scale Optimization

Abstract: It is well known that for nonconvex unconstrained optimization with Lipschitz smoothness, gradient descent and stochastic gradient descent are the optimal first-order algorithms in the deterministic and stochastic settings, respectively. This naturally raises two questions: In the constrained setting, is it possible to design algorithms that achieve the same …
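The abstract's claim concerns plain gradient descent on a smooth nonconvex objective, where a fixed step size on the order of the inverse Lipschitz constant drives the gradient norm to zero at the optimal first-order rate. A minimal sketch (not code from the talk; the objective f(x) = x⁴ − 3x² and the step size are illustrative assumptions):

```python
# Illustrative sketch: gradient descent on the smooth nonconvex
# function f(x) = x^4 - 3x^2 (a hypothetical example, not from the talk).
# With a sufficiently small fixed step, the iterates approach a
# stationary point, here x = ±sqrt(1.5).

def grad(x):
    # gradient of f(x) = x^4 - 3x^2
    return 4 * x**3 - 6 * x

def gradient_descent(x0, step=0.05, iters=200):
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)  # standard first-order update
    return x

x_star = gradient_descent(x0=0.5)
# stationary points satisfy 4x^3 = 6x, i.e. x = 0 or x = ±sqrt(1.5)
```

Starting from x₀ = 0.5, the iterates settle at the nearby stationary point x = √1.5 ≈ 1.2247; which stationary point is reached depends on the initialization, a familiar feature of the nonconvex setting.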

SILO: Searching for architectures and BERT moments in specialized AI applications

Abstract: In 2018, advances in architecture design and self-supervised learning led to the “BERT moment” in natural language processing, in which supervised learning workflows were permanently supplanted by the pretraining and fine-tuning of massive Transformer models. This spurred scientists in more specialized areas—e.g., genomics, satellite imaging, and time series forecasting—to develop …