Systems | Information | Learning | Optimization

Dictionary Learning from dependent data samples

Abstract: Online dictionary learning is an unsupervised machine learning method that extracts key features from a stream of data samples, a task that involves solving constrained nonconvex optimization problems in a stochastic setting. While several such algorithms have been proposed and studied for i.i.d. data samples, in practice it is far more convenient to work with dependent data samples obtained from an MCMC sampler. In this talk, we discuss some new results on online dictionary learning algorithms that directly incorporate dependent data samples. We establish global convergence to stationary points for a wide range of algorithms and also obtain a worst-case rate of convergence of order $O(\log n / n^{1/4})$. This gives the first rate-of-convergence result for online matrix/tensor factorization algorithms, even in the i.i.d. setting. As an application, we present a new approach for learning “basis subgraphs” from network data, which can be used for network denoising and edge inference tasks. We illustrate our method on several synthetic network models as well as Facebook, arXiv, and protein-protein interaction networks, where it achieves state-of-the-art performance on such network tasks compared with several recent methods.
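
For readers new to the setting, the following is a minimal sketch of online dictionary learning run on a dependent data stream. It is not the speaker's algorithm: it assumes a generic surrogate-minimization scheme with nonnegative dictionary atoms constrained to the unit ball, and it replaces the MCMC sampler with a toy "sticky" Markov chain over ground-truth atoms. All function names and parameters (`markov_samples`, `sparse_code`, `update_dictionary`, `lam`, `stay`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def markov_samples(W_true, n, stay=0.9, noise=0.01, seed=0):
    """Toy dependent stream (an assumption, not a real MCMC sampler):
    a sticky Markov chain picks which ground-truth atom generates each sample."""
    rng = np.random.default_rng(seed)
    d, r = W_true.shape
    state = 0
    X = np.empty((n, d))
    for t in range(n):
        if rng.random() > stay:          # leave the current state with prob. 1 - stay
            state = rng.integers(r)
        X[t] = np.maximum(W_true[:, state] + noise * rng.standard_normal(d), 0.0)
    return X

def sparse_code(x, W, lam, n_iter=50):
    """Projected gradient for min_{h >= 0} 0.5*||x - W h||^2 + lam * sum(h)."""
    G, b = W.T @ W, W.T @ x
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)   # 1 / Lipschitz constant
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        h = np.maximum(h - step * (G @ h - b + lam), 0.0)
    return h

def update_dictionary(W, A, B, n_iter=5):
    """Block coordinate descent on the quadratic surrogate defined by A and B.
    Each column is projected onto the nonnegative part of the unit ball."""
    for _ in range(n_iter):
        for j in range(W.shape[1]):
            if A[j, j] < 1e-12:          # atom not used yet; leave it unchanged
                continue
            u = W[:, j] + (B[:, j] - W @ A[:, j]) / A[j, j]
            u = np.maximum(u, 0.0)
            W[:, j] = u / max(np.linalg.norm(u), 1.0)
    return W

def online_dictionary_learning(samples, r, lam=0.1, seed=0):
    """Process one sample at a time, keeping only running averages A and B."""
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    W = np.abs(rng.standard_normal((d, r)))
    W /= np.linalg.norm(W, axis=0)       # start from unit-norm nonnegative atoms
    A = np.zeros((r, r))                 # running average of h h^T
    B = np.zeros((d, r))                 # running average of x h^T
    for t, x in enumerate(samples, start=1):
        h = sparse_code(x, W, lam)
        A += (np.outer(h, h) - A) / t
        B += (np.outer(x, h) - B) / t
        W = update_dictionary(W, A, B)
    return W

if __name__ == "__main__":
    W_true = np.eye(6)[:, :3]            # three orthogonal ground-truth atoms
    X = markov_samples(W_true, n=200)
    W = online_dictionary_learning(X, r=3)
    print(W.round(2))
```

The sketch keeps the dependence entirely in the data stream: consecutive samples tend to come from the same atom, so the running averages `A` and `B` are built from correlated rather than i.i.d. draws. That is the regime the convergence results in the abstract address.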

Bio: Prof. Hanbaek Lyu,

October 13 @ 12:30 pm (1h)

Orchard View Room, Virtual