SILO: Learning from the Right Teacher in Knowledge Distillation
Abstract: Knowledge distillation has become a central technique for training small language models, yet a fundamental question remains unresolved: what characterizes an effective teacher for a given student? This talk presents two complementary results that shed light on this question. First, I will examine progressive distillation, where a student learns not only from …