Instructor: MIT
SILO: Theory and practice of LLM quantization
Abstract: Modern LLMs process information by repeatedly applying a basic primitive, matrix multiplication. Estimates show that about 60-84% of the energy consumed by LLMs goes into memory load/store operations. How can we reduce this power consumption? An LLM converts text into a sequence of tokens (which can be thought of as …
SILO: On counterfactual inference with unobserved confounding via exponential family
Abstract: We are interested in the problem of unit-level counterfactual inference in the presence of unobserved confounders owing to the increasing importance of personalized decision-making in many domains: consider a recommender system interacting with a user over time where each user is provided recommendations based on observed demographics, prior engagement …