Systems | Information | Learning | Optimization

SILO: First-Order Algorithms for Large-Scale Optimization

Abstract: It is well known that for nonconvex unconstrained optimization with Lipschitz smoothness, gradient descent and stochastic gradient descent are the optimal first-order algorithms in the deterministic and stochastic settings, respectively. This naturally raises two questions: In the constrained setting, is it possible to design algorithms that achieve the same …
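
To make the baseline concrete, below is a minimal sketch of the two algorithms the abstract treats as optimal in the unconstrained setting: gradient descent with exact gradients and SGD with mini-batch gradients. The objective, step sizes, batch size, and iteration counts are illustrative assumptions, not details from the talk.

    import numpy as np

    # Minimal sketch of gradient descent (GD) and stochastic gradient descent (SGD)
    # on a smooth nonconvex objective f(x) = (1/n) * sum_i log(1 + (x - a_i)^2).
    # Objective, step sizes, batch size, and iteration counts are illustrative
    # assumptions, not details from the talk.

    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)                     # data defining the objective

    def grad_full(x):
        """Exact gradient of f at the scalar point x."""
        return np.mean(2.0 * (x - a) / (1.0 + (x - a) ** 2))

    def grad_stoch(x, batch=32):
        """Unbiased stochastic gradient from a random mini-batch."""
        ai = a[rng.integers(0, len(a), size=batch)]
        return np.mean(2.0 * (x - ai) / (1.0 + (x - ai) ** 2))

    def gd(x0, eta=0.5, steps=200):
        """Deterministic setting: one exact gradient step per iteration."""
        x = x0
        for _ in range(steps):
            x -= eta * grad_full(x)
        return x

    def sgd(x0, eta=0.1, steps=2000):
        """Stochastic setting: one mini-batch gradient step per iteration."""
        x = x0
        for _ in range(steps):
            x -= eta * grad_stoch(x)
        return x

    print("GD solution: ", gd(5.0))
    print("SGD solution:", sgd(5.0))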

SILO: Searching for architectures and BERT moments in specialized AI applications

Abstract: In 2018, advances in architecture design and self-supervised learning led to the “BERT moment” in natural language processing, in which supervised learning workflows were permanently supplanted by the pretraining and fine-tuning of massive Transformer models. This spurred scientists in more specialized areas—e.g. genomics, satellite imaging, and time series forecasting—to develop …

SILO: Variational inference – reconciling statistical and convergence guarantees

Abstract: As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming increasingly popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter …
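
To illustrate how VI recasts posterior approximation as an optimization problem, here is a minimal sketch that fits a Gaussian variational family by gradient ascent on the ELBO. The conjugate Gaussian model, learning rate, and iteration count are illustrative assumptions (chosen so the exact posterior is available as a sanity check), not details from the talk.

    import numpy as np

    # Minimal sketch of variational inference as optimization.
    # Illustrative model (an assumption, not from the talk):
    #   theta ~ N(0, 1),  y_i | theta ~ N(theta, 1),  i = 1..n
    # Variational family: q(theta) = N(mu, s^2); maximize the ELBO over (mu, s).
    # For this conjugate model the exact posterior N(sum(y)/(n+1), 1/(n+1))
    # is available, so the VI solution can be checked against it.

    rng = np.random.default_rng(0)
    n, theta_true = 200, 1.5
    y = theta_true + rng.normal(size=n)

    def elbo_grad(mu, s):
        """Closed-form ELBO gradients for the Gaussian-Gaussian model."""
        d_mu = np.sum(y - mu) - mu           # expected log-likelihood + prior terms
        d_s = -(n + 1) * s + 1.0 / s         # quadratic terms + entropy of q
        return d_mu, d_s

    mu, s, lr = 0.0, 1.0, 1e-3
    for _ in range(5000):
        d_mu, d_s = elbo_grad(mu, s)
        mu, s = mu + lr * d_mu, max(s + lr * d_s, 1e-6)

    print("VI estimate:    mean %.4f  var %.5f" % (mu, s ** 2))
    print("exact posterior: mean %.4f  var %.5f" % (np.sum(y) / (n + 1), 1.0 / (n + 1)))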

SILO: Towards Secure Large Language Models: From Model to System

Abstract: We are witnessing a paradigm shift in AI, transitioning from deep learning models to the era of Large Language Models (LLMs). This shift signifies a transformative advancement in AI, enabling it to be applied to diverse real-world safety-critical applications. Despite these impressive achievements, a fundamental question remains: are …

SILO: Self-Improving Transformers: Overcoming Length Generalization Challenges

Abstract: Large language models can perform algorithmic tasks through test-time computation but struggle to generalize far beyond the task difficulty of the training distribution. These limitations manifest across even simple tasks like arithmetic, string manipulation, and maze solving, where transformers learn shortcuts rather than the underlying algorithms. While prior solutions …
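
As a rough illustration of the length-generalization failure described above, the sketch below measures accuracy on string reversal at lengths beyond a "training" range, using a stand-in shortcut model in place of a trained transformer. The task, lengths, and model are hypothetical placeholders, not the speakers' setup.

    import random

    # Minimal sketch of a length-generalization probe on string reversal:
    # training-distribution lengths are short, evaluation lengths extend well
    # beyond them. The "model" is a stand-in callable that only succeeds at
    # training lengths, mimicking a learned shortcut; in practice it would be
    # a trained transformer. All names and lengths are illustrative assumptions.

    random.seed(0)
    TRAIN_LENGTHS = range(1, 11)        # difficulty seen during training
    EVAL_LENGTHS = (5, 10, 20, 40)      # evaluation goes 4x beyond training

    def sample(length):
        s = "".join(random.choice("ab") for _ in range(length))
        return s, s[::-1]               # input and ground-truth reversal

    def shortcut_model(s):
        """Stand-in for a model that only works at training lengths."""
        return s[::-1] if len(s) in TRAIN_LENGTHS else s

    def accuracy(model, length, trials=200):
        hits = sum(model(x) == y for x, y in (sample(length) for _ in range(trials)))
        return hits / trials

    for L in EVAL_LENGTHS:
        print(f"length {L:>2}: accuracy {accuracy(shortcut_model, L):.2f}")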