Systems | Information | Learning | Optimization
 

SILO: Revealing the Low Rank Structure of Language Models through Sequences of Logits

Abstract:
A major problem in the study of large language models, and deep learning more broadly, is to understand their inherent low-dimensional structure. We introduce an approach to studying the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first demonstrate empirically that a wide range of modern language models exhibit low-rank structure: matrices built from a model’s logits across varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation: a response to a target prompt can be generated using a linear combination of the model’s outputs on unrelated, or even nonsensical, prompts.
On the theoretical front, we observe that studying the approximate rank of language models in this sense yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representational power of this abstraction and give provable learning guarantees.
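
To make the empirical recipe above concrete, here is a minimal sketch of the kind of probe the abstract describes. It is not the speaker's code: the model name, prompts, and thresholds are illustrative placeholders, and it assumes the Hugging Face transformers library.

    # Hedged sketch: (1) probe the approximate rank of a matrix of next-token
    # logits over many prompts, and (2) reconstruct a target prompt's logits
    # as a linear combination of logits from unrelated "basis" prompts.
    import numpy as np
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM exposing logits works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def next_token_logits(prompt):
        # Next-token logit vector at the final position of the prompt.
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids)
        return out.logits[0, -1, :].numpy()

    # Unrelated, even nonsensical, basis prompts plus one target prompt.
    basis_prompts = ["zxq %d lorem ipsum dolor" % i for i in range(32)]
    target_prompt = "The capital of France is"

    B = np.stack([next_token_logits(p) for p in basis_prompts])  # (32, vocab)
    t = next_token_logits(target_prompt)

    # (1) Approximate rank: a sharp drop in the singular values of B means a
    # few directions capture almost all logit variation across prompts.
    s = np.linalg.svd(B, compute_uv=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    print("rank for 99% of spectral energy:",
          int(np.searchsorted(energy, 0.99)) + 1)

    # (2) Generation from unrelated prompts: fit the target's logits as a
    # least-squares combination of the basis logits, then compare the greedy
    # next token under the true and reconstructed logits.
    coef, *_ = np.linalg.lstsq(B.T, t, rcond=None)
    approx = B.T @ coef
    print("true next token:  ", tok.decode([int(t.argmax())]))
    print("approx next token:", tok.decode([int(approx.argmax())]))

In the talk's setting the combination would presumably be applied at every generation step; this sketch reconstructs only a single next-token distribution to keep it short.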
Bio: 
Allen Liu is currently a Miller Postdoctoral Fellow at UC Berkeley and will be starting as an Assistant Professor of Computer Science at the Courant Institute at NYU next fall. Previously, he completed his Ph.D. at MIT, advised by Ankur Moitra. His research is in learning theory, broadly defined, encompassing classical learning theory and statistics as well as problems in modern machine learning and scientific applications such as quantum information. Allen is the recipient of a Hertz Fellowship and a Citadel GQS Fellowship. His work was awarded Best Student Paper at QIP 2024 and has been featured in popular science media, including Quanta Magazine’s Biggest Breakthroughs in Computer Science.

October 22, 2025
12:30 pm (1h)

Morgridge Hall, 7th floor

Allen Liu