Optimizing the k-means objective with Lloyd's algorithm is a classic strategy for separating a set of data points into disjoint clusters. More recent data from biology and social networks, however, call for clusters that overlap. I'll present a generalization of the k-means objective function that allows overlapping partitions and incorporates outlier detection. A weighted, kernel-based variation on this objective is also equivalent to the normalized cut objective on graphs. We'll see several algorithms for optimizing this objective, including a generalization of Lloyd's algorithm, a semidefinite relaxation, and a variety of solvers for a non-convex low-rank SDP heuristic. These methods achieve state-of-the-art accuracy on clustering problems where ground-truth clusters are known.
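As background for the baseline the talk generalizes, here is a minimal sketch of Lloyd's algorithm for the standard (non-overlapping) k-means objective. It is only an illustration under the usual setup (Euclidean data, squared-distance objective); it does not show the overlapping or outlier-aware extensions from the talk, and the function name `lloyd_kmeans` is just a placeholder.

```python
import numpy as np

def lloyd_kmeans(X, k, iters=100, seed=0):
    """Sketch of Lloyd's algorithm for the standard k-means objective:
    minimize sum_i ||x_i - mu_{c(i)}||^2 over assignments c and centers mu."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: assignments and centers have stabilized
        centers = new_centers
    return labels, centers

# Example usage on two synthetic Gaussian blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = lloyd_kmeans(X, k=2)
```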
November 18, 2015
12:30 pm (1h)
Discovery Building, Orchard View Room
David Gleich