Systems | Information | Learning | Optimization

SILO: Theory for Diffusion Models

Abstract: In this talk I will survey our recent efforts to develop a rigorous theory for understanding diffusion generative modeling. The first part will cover discretization analyses that prove that diffusion models can approximately sample from arbitrary probability distributions provided one has a sufficiently accurate estimate of the score …
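To make the setting concrete (a toy sketch, not the analysis of the talk): when the target is a one-dimensional Gaussian, the score of the noised distribution is available in closed form, so one can check directly that an Euler–Maruyama discretization of the reverse SDE recovers the target. All parameters (mu, sigma, T, step counts) below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: N(mu, sigma^2). Forward noising is the OU process dx = -x dt + sqrt(2) dW,
# so p_t = N(mu * e^{-t}, sigma^2 * e^{-2t} + 1 - e^{-2t}) and the score is closed-form.
mu, sigma = 2.0, 0.5

def score(x, t):
    m = mu * np.exp(-t)
    v = sigma**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - m) / v

# Euler-Maruyama discretization of the reverse SDE, run from t = T down to t = 0.
T, n_steps, n_samples = 5.0, 500, 20000
dt = T / n_steps
x = rng.standard_normal(n_samples)        # initialize from the stationary N(0, 1)
for k in range(n_steps):
    t = T - k * dt
    drift = x + 2.0 * score(x, t)         # reverse-time drift: -f + g^2 * score
    x = x + dt * drift + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)

print(x.mean(), x.std())                  # close to the target (2.0, 0.5)
```

With the exact score, the only error sources are the finite horizon T and the discretization step dt; the discretization analyses in the talk quantify what happens when the score itself is only approximate.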

SILO: Learning Dynamics for Nash and Coarse Correlated Equilibria in Bimatrix Games

Abstract: In this talk, we will focus on learning in two-player games. First, we will provide a brief introduction to the possible behaviors of learning algorithms and mention various techniques that have been extensively used to guarantee convergence to Nash equilibria in zero-sum games. Finally, we will demonstrate how these …
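As a standard textbook instance of such learning dynamics (illustrative background, not necessarily the algorithms of the talk): multiplicative-weights self-play in the zero-sum matching-pennies game, where the time-averaged strategies approach the unique Nash equilibrium (1/2, 1/2) even though the iterates themselves cycle. The step size and starting point below are arbitrary choices.

```python
import numpy as np

# Matching pennies: row player's payoff matrix; the column player gets the negative.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

eta, T = 0.01, 50000
x = np.array([0.9, 0.1])   # asymmetric start (the uniform profile is a fixed point)
y = np.array([0.5, 0.5])
x_sum = np.zeros(2)
y_sum = np.zeros(2)

for _ in range(T):
    x_sum += x
    y_sum += y
    # Multiplicative weights: exponentially reweight each action by its payoff
    # against the opponent's current mixed strategy, then renormalize.
    x = x * np.exp(eta * (A @ y))
    x /= x.sum()
    y = y * np.exp(eta * (-A.T @ x))
    y /= y.sum()

x_avg, y_avg = x_sum / T, y_sum / T
print(x_avg, y_avg)        # both time averages are near (0.5, 0.5)
```

The no-regret property of multiplicative weights implies the average play forms an approximate Nash equilibrium in zero-sum games; the day-to-day iterates, by contrast, need not converge, which is one of the behaviors the talk contrasts.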

SILO: Beyond Decoder-Only Next Token Prediction

Abstract: This talk presents two distinct approaches that expand the potential of Transformer architectures beyond the traditional decoder-only, causal-attention models for next-token prediction. In the first half, we will examine looped Transformers with an adaptive iteration mechanism, demonstrating that these models can learn highly length-generalizable solutions for algorithmic tasks. The …
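The control-flow idea behind adaptive iteration can be sketched with a deliberately simple stand-in (an analogy only, not the Transformer architecture of the talk): one fixed "step" is reapplied until a halting condition fires, so the effective depth scales with the input rather than being fixed at training time, which is what enables length generalization.

```python
# Toy illustration of adaptive looping: the "step" is a single bubble-sort pass
# (one application of the weight-tied block), and the halting test is sortedness.

def step(seq):
    # One pass of adjacent compare-and-swap.
    seq = list(seq)
    for i in range(len(seq) - 1):
        if seq[i] > seq[i + 1]:
            seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq

def looped_model(seq, max_iters=10_000):
    iters = 0
    while seq != sorted(seq) and iters < max_iters:  # adaptive halting condition
        seq = step(seq)
        iters += 1
    return seq, iters

short_out, it_short = looped_model([3, 1, 2])
long_out, it_long = looped_model(list(range(50, 0, -1)))  # much longer input
print(it_short, it_long)   # the longer input needs more iterations of the same block
```

A fixed-depth model would have to commit to a maximum input length at training time; the loop above instead spends more iterations on harder inputs, the property the looped-Transformer results formalize.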

SILO: Recent Advances in Min-max Optimization: Convergence Guarantees and Practical Performance

Abstract: Min-max optimization plays a prominent role in game theory, statistics, economics, finance, and engineering. It has recently received significant attention, especially in the machine learning community, where adversarial training of neural networks, multi-agent reinforcement learning, and distributionally robust learning are formulated as structured min-max optimization problems. Stochastic Gradient Descent Ascent …
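A classic one-dimensional example of the convergence issues at stake (illustrative, not a result from the talk): on the bilinear objective f(x, y) = x*y, simultaneous gradient descent-ascent spirals away from the saddle point at the origin, while the extragradient method converges to it. The step size below is arbitrary.

```python
import numpy as np

# Bilinear saddle problem: min_x max_y f(x, y) = x * y, with saddle point (0, 0).
lr, steps = 0.1, 1000

# Simultaneous gradient descent-ascent (GDA): rotates and spirals outward.
x, y = 1.0, 1.0
for _ in range(steps):
    x, y = x - lr * y, y + lr * x
gda_dist = np.hypot(x, y)

# Extragradient: extrapolate a half step, then update with the half-step gradients.
x, y = 1.0, 1.0
for _ in range(steps):
    xh, yh = x - lr * y, y + lr * x          # lookahead (extrapolation) step
    x, y = x - lr * yh, y + lr * xh          # update using lookahead gradients
eg_dist = np.hypot(x, y)

print(gda_dist, eg_dist)   # GDA diverges; extragradient contracts to the saddle
```

Per step, the GDA iterate map has spectral radius sqrt(1 + lr^2) > 1 on this problem, while the extragradient map has radius sqrt((1 - lr^2)^2 + lr^2) < 1, which is the kind of contrast the convergence guarantees in the talk make precise.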

SILO: Confidence Sequences via Online Learning

Abstract: Confidence sequences provide a way to characterize uncertainty in stochastic environments, and they are a widely used tool for interactive machine learning algorithms and statistical problems including A/B testing, Bayesian optimization, reinforcement learning, and offline learning.  In these problems, constructing confidence sequences that are tight without losing correctness is crucial since it …
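One elementary construction, given only as background (the online-learning approach of the talk is sharper): allocate failure probability delta_t = delta / (t(t+1)) to time t, so that summing over t spends at most delta in total, and apply Hoeffding's inequality at each time. The resulting intervals are valid simultaneously for all t.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
delta = 0.05              # total failure probability, across ALL times
true_mean = 0.5
n = 2000

xs = rng.binomial(1, true_mean, size=n).astype(float)
means = np.cumsum(xs) / np.arange(1, n + 1)

widths = []
for t in range(1, n + 1):
    # Union bound: spend delta_t = delta / (t (t+1)) at time t (these sum to
    # at most delta), then apply Hoeffding for [0, 1]-bounded observations.
    delta_t = delta / (t * (t + 1))
    widths.append(math.sqrt(math.log(2.0 / delta_t) / (2.0 * t)))
widths = np.array(widths)

lo, hi = means - widths, means + widths   # a valid confidence sequence
print(lo[-1], hi[-1])                     # interval at t = n covers the true mean
```

The widths shrink like sqrt(log(t) / t), but the union bound is wasteful; tighter confidence sequences of the kind discussed in the talk replace it with more careful (e.g., betting or online-learning based) arguments.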

SILO: Hidden Convexity of Deep Neural Networks: Exact and Transparent Lasso Formulations via Geometric Algebra

Abstract: In this talk, we introduce an analysis of deep neural networks through convex optimization and geometric (Clifford) algebra. We begin by introducing exact convex optimization formulations for ReLU neural networks. This approach demonstrates that deep networks can be globally trained through convex programs, offering a globally optimal solution. Our …
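A small numerical illustration of the combinatorial fact underlying such convex formulations (our toy numbers, not the talk's): on n fixed data points in R^2, a bias-free ReLU neuron realizes at most 2n distinct activation patterns, so the training problem can be rewritten as a convex program that enumerates these finitely many patterns.

```python
import numpy as np

rng = np.random.default_rng(2)

# n generic data points in R^2; a bias-free ReLU neuron u -> relu(<u, w>) is
# "active" on a data point exactly when it lies in the half-plane {x : <x, w> > 0}.
n = 6
X = rng.standard_normal((n, 2))

# Sample many random weight directions and record which data points activate.
patterns = set()
for _ in range(20000):
    w = rng.standard_normal(2)
    patterns.add(tuple((X @ w > 0).astype(int)))

print(len(patterns))   # at most 2 * n distinct activation patterns
```

Each data point contributes one hyperplane through the origin in weight space, and n such hyperplanes cut the circle of directions into at most 2n arcs; one convex (group-lasso-style) subproblem per pattern is what makes the exact reformulation finite.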