Title: Inference Using Network Structure: A Comparison of Statistical and Machine Learning Perspectives

Abstract: Recent years have witnessed a surge in the amount of available structured data, typically modeled as a network capturing relationships (e.g. similarities, interactions) between entities. In a typical data analysis setting, this network structure arises either “horizontally” across features or “vertically” across observations. A large body of literature in statistics and machine learning has thus been devoted to finding ways of appropriately exploiting this type of structure for inference and/or prediction. Yet the tools and the desired results differ substantially between the two disciplines, a phenomenon we propose exploring in this talk as we ask: could some lessons be shared from statistics to machine learning, and vice versa? To illustrate this observation, in the first part of the talk we will consider a classical statistical take on the problem and use the network structure as a regularizer. We will introduce a novel ℓ1 + ℓ2 penalty, which we refer to as the Generalized Elastic Net, for regression problems in which the feature vectors are indexed by the vertices of a given graph and the true signal is believed to be smooth with respect to that graph. Under the assumption of correlated Gaussian design, we will derive graph-dependent upper bounds on the prediction and estimation errors, consisting of a parametric rate for the unpenalized portion of the regression vector plus a term that depends on our network-alignment assumption. In the second part of the talk, we will contrast this approach with more recent machine learning tools for prediction tasks on graphs. More specifically, we will focus on Graph Neural Networks (GNNs), a generalization of deep neural network machinery to the graph setting. Despite the rapid proliferation of GNNs across tasks and applications, the inner workings of this class of algorithms, and the impact of its different design choices, remain poorly understood.
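As a rough sketch of the kind of objective the first part concerns (the exact penalty in the talk may differ; the use of the combinatorial Laplacian L of the given graph here is our assumption), a graph-regularized ℓ1 + ℓ2 program of this flavor can be written as

```latex
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n}\,\|y - X\beta\|_2^2
  \;+\; \lambda_1 \|\beta\|_1
  \;+\; \lambda_2\, \beta^\top L \beta ,
```

where, for an unweighted graph with edge set E, the quadratic term expands as β⊤Lβ = Σ_{(i,j)∈E} (β_i − β_j)², so it penalizes signals that vary sharply across edges and thereby encodes the smoothness assumption, while the ℓ1 term promotes sparsity as in the classical elastic net.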
Exploiting similarities between GNNs and well-established statistical methods, we will describe recent work that attempts to understand how the convolution operator, which aggregates information over entire neighborhoods, shapes the geometry of the GNN embedding space.
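For concreteness, one common instance of such a convolution operator is the symmetric degree-normalized neighborhood average used in basic GCN layers; the talk may of course consider other aggregation operators. A minimal NumPy sketch of one (linear, activation-free) convolution step:

```python
import numpy as np

def graph_convolution(A, X):
    """One symmetric-normalized convolution step: each node's features are
    replaced by a degree-weighted average over its neighborhood,
    including itself via an added self-loop (GCN-style aggregation)."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                    # add self-loops
    d = A_hat.sum(axis=1)                    # degrees of the augmented graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^{-1/2}
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X

# Toy example: a path graph 0 - 1 - 2 with a one-hot feature on node 0.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1.], [0.], [0.]])
H = graph_convolution(A, X)
```

Repeated application of this operator averages features over progressively larger neighborhoods, which is precisely why its effect on the embedding geometry (e.g. the smoothing of node representations) is worth studying.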