location: Orchard View Room
SILO: Learning from the Right Teacher in Knowledge Distillation
Abstract: Knowledge distillation has become a central technique for training small language models, yet a fundamental question remains unresolved: what characterizes an effective teacher for a given student? This talk presents two complementary results that shed light on this problem. First, I will examine progressive distillation, where a student learns not only from …
SILO: Towards discrete diffusion models for language and image generation
Abstract: We discuss discrete diffusion models, which offer a unified framework for jointly modeling categorical data such as text and images. We present the Anchored Diffusion Language Model (ADLM), a new model that we have developed for language generation. ADLM is grounded in a novel two-stage framework that first …
SILO: Bayesian Preference Exploration: Making Optimization Accessible to Non-Experts
Abstract: Optimization problems are everywhere: routing trucks, buying groceries, building a datacenter. Yet optimization methodology is hard to use, because it requires the user to write down their objective and constraints as mathematical functions. In practice, the objective and constraints are unknown and must be tuned iteratively. An expert presents …
SILO: Qualia Optimization: Exploring Mathematical Formulations of AI Experience
Abstract: This talk explores a speculative question: what if current or future AI systems have qualia, such as pain or pleasure? It proceeds from the assumption that AI systems might someday possess qualia, and that the quality of these subjective experiences should be considered alongside performance metrics. Concrete mathematical problem settings, …