Systems | Information | Learning | Optimization

Heavy hitters via cluster-preserving clustering


In the “heavy hitters” or “frequent items” problem, one must process a stream of items and report those items that occur frequently. For example, a telecommunications company may wish to find popular destination IP addresses in a packet stream across one of their links, or a search engine may wish to report popular query words. A more general problem is when there are two streams and we must report those items whose frequencies significantly deviate between them; the former problem is a special case since we can artificially pretend the first of the two streams the empty stream. Such a problem naturally arises in trend detection and anomaly detection. Several algorithms were known in the literature solving these problems, such as the CountMin sketch, CountSketch, Hierarchical CountSketch and others.

April 19 @ 12:30
12:30 pm (1h)

Discovery Building, Orchard View Room

Jelani Nelson