Unsupervised Ranking and Ensemble Learning

In many decision-making problems, one is given the advice or predictions of several classifiers of unknown reliability, over multiple questions or queries. This scenario differs from the standard supervised setting, where classifier accuracy can be assessed using labeled training or validation data. It raises two questions: given only the predictions of several classifiers of unknown accuracy over a large set of unlabeled test data, is it possible to a) reliably rank them, and b) construct a meta-classifier more accurate than any individual classifier in the ensemble?

In this talk, we’ll show that under standard independence assumptions between classifier errors, this high-dimensional data hides a simple low-dimensional structure. In particular, we’ll present a spectral approach to the above questions and derive a novel unsupervised spectral meta-learner (SML). On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than classical majority voting for iterative computation of the maximum likelihood estimator. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to steer the ensemble prediction away from the (unknown) ground truth.
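
The low-dimensional structure mentioned above can be illustrated with a small simulation. The sketch below is not the speakers' implementation. It assumes balanced labels in {-1, +1}, conditionally independent classifier errors, and that most classifiers are better than random. Under those assumptions the off-diagonal entries of the covariance matrix of the predictions are approximately rank one, and the entries of its leading eigenvector are proportional to 2 * accuracy - 1. The function names, the simulated accuracies, and the simple iterative diagonal fill are illustrative choices, not taken from the talk.

```python
import numpy as np


def simulate_ensemble(n_samples=20000,
                      accuracies=(0.9, 0.8, 0.7, 0.65, 0.6, 0.55),
                      seed=0):
    """Simulate classifiers with independent errors (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n_samples)          # hidden true labels
    acc = np.asarray(accuracies)
    # Each classifier is correct on each sample independently with prob. acc[i].
    correct = rng.random((len(acc), n_samples)) < acc[:, None]
    preds = np.where(correct, y, -y)                 # shape (m, n_samples)
    return y, preds


def spectral_meta_learner(preds, n_iter=50):
    """Rank classifiers and combine them using only their predictions."""
    R = np.cov(preds)                                # m x m sample covariance
    # The diagonal does not follow the rank-one pattern, so treat it as
    # missing: start from zero and refill it from the current rank-one fit.
    np.fill_diagonal(R, 0.0)
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues ascending
        lam, v = eigvals[-1], eigvecs[:, -1]
        np.fill_diagonal(R, lam * v ** 2)
    # Resolve the sign ambiguity, assuming most classifiers beat chance.
    if v.sum() < 0:
        v = -v
    labels = np.where(v @ preds >= 0, 1, -1)         # weighted vote
    return v, labels


y, preds = simulate_ensemble()
weights, sml_labels = spectral_meta_learner(preds)
print("estimated ranking (best first):", np.argsort(-weights))
print("SML accuracy:", (sml_labels == y).mean())
print("majority-vote accuracy:", (np.where(preds.sum(axis=0) >= 0, 1, -1) == y).mean())
```

The true labels are used here only to score the result; the ranking and the weighted vote are computed from the predictions alone.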

Joint work with Fabio Parisi, Francesco Strino and Yuval Kluger (Yale).

May 7, 2014

12:30 pm (1h)

Discovery Building, Orchard View Room

Boaz Nadler