Systems | Information | Learning | Optimization
SILO: Worst-case generation via minimax optimization in Wasserstein space

Abstract: Generative models such as normalizing flows and diffusion processes have transformed how we represent complex, high-dimensional data. In this talk, I introduce a framework for worst-case generation formulated as a minimax optimization problem in Wasserstein space. This perspective reveals a unified view of risk-induced generation and distributional robustness, showing that worst-case distributions arise …

SILO: Learning from the Right Teacher in Knowledge Distillation

Abstract: Knowledge distillation has become a central technique for training small language models, yet a fundamental question remains unresolved: what characterizes an effective teacher for a given student? This talk presents two complementary results that shed light on this problem. First, I will examine progressive distillation, where a student learns not only from …

SILO: Bayesian Preference Exploration: Making Optimization Accessible to Non-Experts

Abstract: Optimization problems are everywhere: routing trucks, buying groceries, building a datacenter. Yet optimization methodology is hard to use, because it requires the user to write down their objective and constraints as mathematical functions. In practice, the objective and constraints are unknown and must be tuned iteratively. An expert presents …

SILO: Qualia Optimization: Exploring Mathematical Formulations of AI Experience

Abstract: This talk explores the speculative question: what if current or future AI systems have qualia, such as pain or pleasure? It does so by assuming that AI systems might someday possess qualia—and that the quality of these subjective experiences should be considered alongside performance metrics. Concrete mathematical problem settings, …

SILO: Revealing the Low Rank Structure of Language Models through Sequences of Logits

Abstract: A major problem in the study of large language models, and deep learning more broadly, is to understand their inherent low-dimensional structure.  We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide …

SILO: Decision-Aware Models for Adaptive Experimentation and Bayesian Optimization

Abstract: Machine learning and AI models hold great promise for accelerating scientific discovery by intelligently planning experiments and learning from their results. While existing models can be applied in this setting, I will argue in this talk for decision-aware models designed explicitly for adaptive experimentation tasks rather than traditional predictive …