Internet Device Graphs

Digital advertising is arguably the largest and most ubiquitous application of machine learning. Learning algorithms pick the ads we see by inferring information about who we are and what we might buy. Graph datasets, due to their simplicity, play a central role in facilitating this inference.

Internet Device Graphs are datasets that organize the identifiers our devices (phones, PCs, tablets and smart TVs) produce as they access media on the internet. This talk will present an algorithm for building a device graph from limited information. When the algorithm is applied to a dataset from a large internet analytics company, the resulting graph is immense, with more than 17 billion edges (relationships) between more than 3 billion nodes (devices). It accounts for the vast majority of internet-connected devices in the US and in a number of other countries. Applying and tuning community detection algorithms partitions the graph into cohorts that align well with families and individuals. Further refinement using a variety of ML techniques improves the associations and inference. I’ll conclude by describing how to turn off the identifiers your devices create.
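As a rough illustration of the community detection step, the sketch below partitions a toy device graph with weighted label propagation, a standard community detection method. The abstract does not say which algorithm the talk uses, so this is an assumed stand-in, not the authors' method. The device names and the edge weights (imagined as counts of how often two devices were seen together, e.g. on the same home network) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (device_a, device_b, weight), where weight stands in
# for how often the two devices were observed together.
EDGES = [
    ("a_phone", "a_laptop", 5), ("a_phone", "a_tv", 5), ("a_laptop", "a_tv", 5),
    ("b_phone", "b_tablet", 5), ("b_phone", "b_tv", 5), ("b_tablet", "b_tv", 5),
    ("a_laptop", "b_phone", 1),  # weak link between the two households
]

def build_graph(edges):
    """Build an undirected weighted adjacency map, summing repeated edges."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        if a == b:
            continue  # ignore self-loops
        graph[a][b] = graph[a].get(b, 0) + w
        graph[b][a] = graph[b].get(a, 0) + w
    return graph

def label_propagation(graph, max_iters=20):
    """Each node repeatedly adopts the label with the largest total edge weight
    among its neighbors; ties go to the smallest label so runs are repeatable."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: no node changed its label this pass
    return labels

def communities(labels):
    """Group nodes by final label into sorted lists of devices."""
    groups = defaultdict(set)
    for node, lbl in labels.items():
        groups[lbl].add(node)
    return sorted(sorted(g) for g in groups.values())

result = communities(label_propagation(build_graph(EDGES)))
print(result)
```

On this toy input the weak cross-household link is outvoted, and the devices split into two cohorts, one per household. At the scale described above, a real system would need a distributed implementation and careful tuning; this sketch only shows the core update rule.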

This is joint work with Jon Koller, Paul Barford, Enis Alp, Aaron Cahn and Keith Funkhouser.

September 4, 12:30 pm (1 hour)

Discovery Building, Orchard View Room

Matt Malloy