Systems | Information | Learning | Optimization
 

SILO: Searching for architectures and BERT moments in specialized AI applications

Abstract: 
In 2018, advances in architecture design and self-supervised learning led to the “BERT moment” in natural language processing, in which supervised learning workflows were permanently supplanted by the pretraining and fine-tuning of massive Transformer models. This spurred scientists in more specialized areas—e.g., genomics, satellite imaging, and time series forecasting—to develop “foundation models” (FMs) of their own. In a broad investigation of over thirty such models on over fifty tasks across three distinct modalities, we find that these specialized FMs still struggle to beat (often much cheaper) supervised learning pipelines. This indicates that the benefits of large-scale pretraining have yet to be fully realized in these domains and that better evaluations are needed to drive and measure progress. The broad scope of our study is enabled by new methods extending neural architecture search—a technique previously used mainly for vision tasks—to applications in the natural sciences and engineering.
Bio: 
Misha is an assistant professor of computer sciences at the University of Wisconsin-Madison, where he studies the foundations and applications of machine learning. Most recently, he has been working on specialized foundation models and on incorporating AI tools into algorithm design and scientific computing. Previously, he spent a year as a postdoctoral fellow at Princeton Language & Intelligence after completing a PhD in CS at CMU, where he was a Facebook PhD Fellow, a TCS Presidential Fellow, and a runner-up for the School of Computer Science Doctoral Dissertation Award.

September 24, 2025
12:30 pm (1h)

Orchard View Room

Misha Khodak, UW-Madison