Systems | Information | Learning | Optimization
 

SILO: How to Use Synthetic Data for Improved Statistical Inference?

Abstract: The rapid proliferation of high-quality synthetic data (generated by advanced AI models or collected as auxiliary data from related tasks) presents both opportunities and challenges for statistical inference. Here, we introduce the GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance …

SILO: Do Large Language Models Need Statistical Foundations?

Abstract: In this talk, we advocate for the development of rigorous statistical foundations for large language models (LLMs). We begin by elaborating on two key features that motivate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the complexity and black-box nature of Transformer architectures. …