Systems | Information | Learning | Optimization
SILO: Worst-case generation via minimax optimization in Wasserstein space

Abstract: Generative models such as normalizing flows and diffusion processes have transformed how we represent complex, high-dimensional data. In this talk, I introduce a framework for worst-case generation formulated as a minimax optimization problem in Wasserstein space. This perspective reveals a unified view of risk-induced generation and distributional robustness, showing that worst-case distributions arise …

SILO: Learning from the Right Teacher in Knowledge Distillation

Abstract: Knowledge distillation has become a central technique for training small language models, yet a fundamental question remains unresolved: what characterizes an effective teacher for a given student? This talk presents two complementary results that shed light on this problem. First, I will examine progressive distillation, where a student learns not only from …

SILO: Bayesian Preference Exploration: Making Optimization Accessible to Non-Experts

Abstract: Optimization problems are everywhere: routing trucks, buying groceries, building a datacenter. Yet optimization methodology is hard to use, because it requires the user to write down their objective and constraints as mathematical functions. In practice, the objective and constraints are unknown and must be tuned iteratively. An expert presents …

SILO: Qualia Optimization: Exploring Mathematical Formulations of AI Experience

Abstract: This talk explores the speculative question: what if current or future AI systems have qualia, such as pain or pleasure? It does so by assuming that AI systems might someday possess qualia—and that the quality of these subjective experiences should be considered alongside performance metrics. Concrete mathematical problem settings, …

SILO: Revealing the Low Rank Structure of Language Models through Sequences of Logits

Abstract: A major problem in the study of large language models, and deep learning more broadly, is to understand their inherent low-dimensional structure.  We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide …

SILO: Decision-Aware Models for Adaptive Experimentation and Bayesian Optimization

Abstract: Machine learning and AI models hold great promise for accelerating scientific discovery by intelligently planning experiments and learning from their results. While existing models can be applied in this setting, I will argue in this talk for decision-aware models designed explicitly for adaptive experimentation tasks rather than traditional predictive …