Urvashi Oswal – Block CUR Decomposition: Matrix Approximation using Groups of Columns

A common problem in large-scale data analysis is approximating a matrix using a combination of specifically sampled rows and columns. Unfortunately, in many real-world environments, the ability to sample specific individual rows or columns of the matrix is limited by either system constraints or cost. In this talk, we will consider matrix approximation when only predefined blocks of columns (or rows) can be sampled from the matrix. This has application in problems as diverse as hyper-spectral imaging, biometric data analysis, and distributed computing. I will propose an algorithm for sampling useful column blocks and provide worst-case bounds for the accuracy of the resulting matrix approximation. Our algorithm considers both when the matrix is fully available and when only the sampled rows and columns of the matrix have been observed. The practical application of these algorithms will be shown via experimental results using real-world user biometric data from a content testing environment.

This is joint work with Brian Eriksson, Kevin Xu and Swayambhoo Jain.

Eric Hall – Inferring Relationships in High-dimensional Generalized Linear Autoregressive Models

Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often times these models are used successfully to learn network structure in applications such as social, epidemiological, financial, and biological neural networks. However, little is known about statistical guarantees of estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity-regularization is well-studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modified covering techniques originally proposed for high-dimensional linear regression analysis. These bounds characterize the impact of various network parameters on estimator performance.

Discovery Building, Orchard View Room

Eric Hall, Urvashi Oswal