With the availability of huge amounts of unlabeled data, unsupervised learning methods are gaining increasing popularity and importance. We focus on “unsupervised ensemble learning”, where one obtains the predictions of multiple classifiers over a set of unlabeled instances. The classifiers may be human experts, as in crowdsourcing, or prediction algorithms developed by research groups worldwide. The challenge is to estimate the accuracies of the different classifiers and combine them into an accurate meta-learner. To tackle this problem, we show how it relates to latent variable models, and derive simple estimates for the classifiers’ accuracies based on a spectral analysis of the observed data. On the experimental side, we apply our methods to a problem in computational biology, where for various classification tasks one combines the results of multiple algorithms for improved accuracy. In the second part of the talk, I will focus on extending the techniques developed for unsupervised ensemble learning to a specific family of linear latent variable models. For cases where the latent layer is binary, we derive an interesting relation between the model parameters and the relatively recent notion of tensor eigenvectors of the data’s higher-order moments. We apply our methods to overlapping clustering, a problem that has gained popularity due to its applicability in various domains such as gene expression analysis and text categorization.
January 17 @ 12:30
12:30 pm (1h)
Discovery Building, Orchard View Room