Learning with Aggregated Data: A Tale of Two Approaches

For many applications in healthcare, econometrics, financial forecasting and climate science, data can only be obtained as aggregates. This raises the question: can one construct accurate models using only aggregates? I will present two vignettes outlining recent work towards an answer.

First, consider a sparse linear model learned from IID data aggregated into groups, where only empirical moments of each group are observed. Despite this obfuscation of individual data values, we show that, subject to standard conditions, the parameter is recoverable with high probability using standard algorithms. Second, consider learning with aggregated correlated data such as time series or spatial data. Here, standard techniques fail. Instead, we propose a simple procedure that exploits Fourier transforms and achieves strong generalization error guarantees. In both settings, empirical evaluations on datasets from healthcare, agricultural studies, ecological surveys and climate science are presented to demonstrate efficacy.
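To make the first setting concrete, here is a minimal sketch on synthetic data. It is not the speakers' algorithm or experimental setup. It assumes each group has its own Gaussian feature mean, that only the per-group means of features and targets are observed, and that a plain Lasso solved by proximal gradient (ISTA) stands in for the "standard algorithms" mentioned above. All names (`lasso_ista`, `xbar`, `ybar`) and all sizes and noise levels are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=5000):
    """Minimize (1/(2m)) * ||A w - b||^2 + lam * ||w||_1 by proximal gradient."""
    m, d = A.shape
    step = m / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b) / m
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
d, k, n = 50, 40, 200            # features, groups, samples per group (assumed)
beta = np.zeros(d)
beta[[3, 17, 42]] = [2.0, -1.5, 1.0]   # hypothetical sparse parameter

group_means = rng.normal(size=(k, d))  # each group has its own feature mean
xbar = np.empty((k, d))
ybar = np.empty(k)
for g in range(k):
    X = group_means[g] + rng.normal(size=(n, d))   # individual-level data
    y = X @ beta + 0.1 * rng.normal(size=n)
    # Only these first-order empirical moments are kept; X and y are discarded.
    xbar[g] = X.mean(axis=0)
    ybar[g] = y.mean()

# By linearity, ybar[g] = xbar[g] @ beta + (averaged noise), so the group
# means themselves form a sparse linear regression problem with k < d rows.
beta_hat = lasso_ista(xbar, ybar, lam=0.01)
support_hat = np.sort(np.argsort(np.abs(beta_hat))[-3:])
print("estimated support:", support_hat)
print("max abs error:", np.max(np.abs(beta_hat - beta)))
```

The point of the sketch is that averaging preserves the linear relationship between features and targets, so the matrix of group means can act as a compressed-sensing design even though there are fewer groups than features.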

Joint work with Avradeep Bhowmik and Joydeep Ghosh.

Video: https://vimeo.com/239177265

October 18, 2017

12:30 pm (1h)

Discovery Building, Orchard View Room

Sanmi Koyejo