Systems | Information | Learning | Optimization
 

SILO: Searching for architectures and BERT moments in specialized AI applications

Abstract: In 2018, advances in architecture design and self-supervised learning led to the “BERT moment” in natural language processing, in which supervised learning workflows were permanently supplanted by the pretraining and fine-tuning of massive Transformer models. This spurred scientists in more specialized areas—e.g. genomics, satellite imaging, and time series forecasting—to develop …

SILO: Do Large Language Models Need Statistical Foundations?

Abstract: In this talk, we advocate for the development of rigorous statistical foundations for large language models (LLMs). We begin by elaborating two key features that motivate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the complexity and black box nature of Transformer architectures. …

SILO: Variational inference – reconciling statistical and convergence guarantees

Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter …

SILO: On counterfactual inference with unobserved confounding via exponential family

Abstract: We are interested in the problem of unit-level counterfactual inference in the presence of unobserved confounders owing to the increasing importance of personalized decision-making in many domains: consider a recommender system interacting with a user over time where each user is provided recommendations based on observed demographics, prior engagement …