Non-exhaustive, overlapping k-means clustering

Optimizing the k-means objective with Lloyd’s algorithm is a classic strategy to separate a set of datapoint into individual clusters. More recent data in biology and social networks necessitates overlapping clustering. I’ll present a generalization of the k-means objective function that incorporates overlapping partitions as well as outlier detection. A weighted, kernel-based variation on this objective is also equivalent to the normalized cut objective on graphs. We’ll see a few algorithms to optimize this objective function including a generalization of Lloyd’s algorithm, a semi-definite relaxation, and a variety of solvers for a non-convex low-rank SDP heuristic. These methods produce state of the art accuracy results on clustering problems where ground-truth clusters are known.

November 18, 2015

12:30 pm (1h)

Discovery Building, Orchard View Room

David Gleich