Systems | Information | Learning | Optimization
 

SILO: Learning from the Right Teacher in Knowledge Distillation

Abstract:
Knowledge distillation has become a central technique for training small language models, yet a fundamental question remains unresolved: what characterizes an effective teacher for a given student? This talk presents two complementary results that shed light on this problem.
First, I will examine progressive distillation, in which a student learns not only from a teacher's final checkpoint but also from its intermediate checkpoints. Using sparse parity as a controlled testbed, we identify an implicit curriculum that is present only in these intermediate checkpoints and that yields both empirical speedups and provable sample-complexity gains. I will then show how this curriculum structure extends to transformer pre-training on real-world corpora (Wikipedia and Books), where intermediate checkpoints progressively capture longer-range context dependencies.
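To make the setup concrete, here is a minimal pure-Python sketch of the sparse parity testbed and of one way a student could be supervised by a sequence of teacher checkpoints. The equal-phase schedule, the binary soft-label loss, and the names `sparse_parity_dataset`, `progressive_schedule`, and `soft_label_loss` are illustrative assumptions, not the schedule or loss used in the work described in this talk.

```python
import math
import random
from typing import List, Sequence, Tuple


def sparse_parity_dataset(
    n_bits: int, support: Sequence[int], n_samples: int, seed: int = 0
) -> List[Tuple[List[int], int]]:
    """Sample (x, y) pairs for a sparse parity task.

    x is uniform over {-1, +1}^n_bits, and y is the product of x[i] over the
    hidden support, i.e. the parity of the support bits in +/-1 encoding.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.choice((-1, 1)) for _ in range(n_bits)]
        y = 1
        for i in support:
            y *= x[i]
        data.append((x, y))
    return data


def progressive_schedule(checkpoint_steps: Sequence[int], student_steps: int) -> List[int]:
    """Hypothetical schedule mapping each student step to a teacher checkpoint.

    Student training is split into len(checkpoint_steps) equal phases; phase j
    distills from the j-th saved teacher checkpoint, ending on the final one.
    """
    n_ckpt = len(checkpoint_steps)
    return [
        checkpoint_steps[min(t * n_ckpt // student_steps, n_ckpt - 1)]
        for t in range(student_steps)
    ]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_label_loss(student_logit: float, teacher_logit: float) -> float:
    """Hypothetical distillation loss: binary cross-entropy against the
    teacher's soft label, minimized when the student's probability matches
    the teacher's."""
    p = _sigmoid(teacher_logit)  # teacher probability of y = +1
    q = _sigmoid(student_logit)  # student probability of y = +1
    eps = 1e-12  # guards against log(0)
    return -(p * math.log(q + eps) + (1.0 - p) * math.log(1.0 - q + eps))
```

In a training loop built on this sketch, the student at step `t` would be fit, via `soft_label_loss`, to the outputs of checkpoint `progressive_schedule(...)[t]` on freshly sampled inputs; using only the last entry of `checkpoint_steps` recovers ordinary distillation from the final teacher.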
Second, I will introduce GRACE, a lightweight metric for identifying an effective teacher within a pool of candidate teachers for math-reasoning distillation. GRACE analyzes the distribution of student gradients on teacher-generated responses and scores each teacher using two components: teacher–student alignment and teacher response diversity. I will also show how GRACE connects to an information-theoretic leave-one-out stability notion studied in modern generalization theory. Empirically, GRACE correlates strongly (up to 86%) with the downstream performance of distilled students on GSM8K and MATH, and using it to select a teacher yields more than a 7.4% improvement over choosing the highest-accuracy teacher.
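The abstract does not give GRACE's formula, so the sketch below only illustrates the general shape of a gradient-based teacher score. It assumes per-response student gradients have already been computed (for example, flattened or projected to a low dimension) and uses two hypothetical stand-ins: an alignment term that treats smaller gradient norms as better alignment, and a diversity term equal to one minus the mean pairwise cosine similarity. Their product, `teacher_score`, is an assumption and not GRACE itself. A Spearman rank-correlation helper is included as one common way to check a selection metric against downstream student accuracy; the abstract does not state which correlation statistic underlies the 86% figure.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def _cosine(u: Vector, v: Vector) -> float:
    nu, nv = _norm(u), _norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat zero gradients as uncorrelated with everything
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def alignment_proxy(grads: List[Vector]) -> float:
    """Hypothetical alignment term: 1 / (1 + mean gradient norm).

    Assumes small student gradients on a teacher's responses mean those
    responses are already close to what the student would generate.
    """
    if not grads:
        raise ValueError("need at least one gradient")
    mean_norm = sum(_norm(g) for g in grads) / len(grads)
    return 1.0 / (1.0 + mean_norm)


def diversity_proxy(grads: List[Vector]) -> float:
    """Hypothetical diversity term: 1 - mean pairwise cosine similarity."""
    n = len(grads)
    if n < 2:
        return 0.0
    sims = [_cosine(grads[i], grads[j]) for i in range(n) for j in range(i + 1, n)]
    return 1.0 - sum(sims) / len(sims)


def teacher_score(grads: List[Vector]) -> float:
    """Hypothetical combined score (NOT the GRACE formula)."""
    return alignment_proxy(grads) * diversity_proxy(grads)


def select_teacher(grads_by_teacher: Dict[str, List[Vector]]) -> str:
    """Return the teacher whose responses get the highest proxy score."""
    return max(grads_by_teacher, key=lambda t: teacher_score(grads_by_teacher[t]))


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation, assuming no tied values."""
    def ranks(vals: Sequence[float]) -> List[int]:
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Multiplying the two terms means a teacher must do reasonably well on both to rank highly; a weighted sum would be an equally plausible assumption.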
This talk is based on two works.
Bio: 
Abhishek Panigrahy is a final-year Ph.D. student in Computer Science at Princeton University, advised by Prof. Sanjeev Arora. His research centers on developing mathematical models to understand and improve the efficiency and robustness of deep learning training. He is an Apple AI/ML Ph.D. Scholar and a Siebel Scholar for 2025-26.

December 3, 2025
12:30 pm (1h)

Orchard View Room

Abhishek Panigrahy, Princeton University