Systems | Information | Learning | Optimization

Some Recent Insights on Transfer-Learning

A common situation in Machine Learning is one where training data is not fully representative of a target population, due either to bias in the sampling mechanism or to prohibitive target sampling costs. In such situations, we aim to ‘transfer’ relevant information from the training data (a.k.a. source data) to the target application. How much information does the source data carry about the target application? Would some amount of target data improve transfer? These are practical questions whose answers depend crucially on ‘how far’ the source domain is from the target. However, how to properly measure the ‘distance’ between source and target domains remains largely unclear.

In this talk we will argue that many of the traditional notions of ‘distance’ (e.g., KL-divergence, extensions of TV such as the D_A discrepancy, density ratios, Wasserstein distance) can yield an over-pessimistic picture of transferability. Instead, we show that some new notions of ‘relative dimension’ between source and target (which we simply term ‘transfer-exponents’) capture a continuum from easy to hard transfer. Transfer-exponents uncover a rich set of situations where transfer is possible even at fast rates; they encode the relative benefits of source and target samples, and have interesting implications for related problems such as multi-task or multi-source learning.

In particular, in the case of transfer from multiple sources, we will discuss (if time permits) a strong dichotomy between minimax and adaptive rates: no adaptive procedure can achieve the same rates as minimax (oracle) procedures.

The talk is based on earlier work with Guillaume Martinet, and ongoing work with Steve Hanneke.

I work in statistical machine learning, with an emphasis on common nonparametric methods (e.g., kNN, trees, kernel averaging). I’m particularly interested in adaptivity, i.e., how to automatically leverage beneficial aspects of data as opposed to designing specifically for each scenario. This involves characterizing statistical limits, under modern computational and data constraints, and identifying favorable aspects of data that help circumvent these limits.

Some specific interests: notions of intrinsic data dimension, benefits (or lack thereof) of sparse or manifold representations; performance limits and adaptivity in active learning, transfer and multi-task learning; hyperparameter-tuning and guarantees in density-based clustering.

October 14 @ 12:30 pm (1h)


Samory Kpotufe