Systems | Information | Learning | Optimization
 

SILO: Efficiency in Semi- and Self-Supervised Learning of Visual Representations

Speaker: Mike Rabbat (Meta, https://ai.facebook.com/people/michael-rabbat)

Abstract: Learning with less labeled data is a longstanding challenge in machine learning research. In this talk I’ll present two recent methods for efficiently training models when labeled examples are scarce but unlabeled examples are abundant. Both approaches aim to strike a balance between label efficiency and computational efficiency. The first method, called PAWS, leverages labeled examples via a non-parametric pseudo-labeling scheme during contrastive pre-training. This enables pre-training a feature extractor that matches or surpasses state-of-the-art semi- and self-supervised approaches while reducing the computational overhead by 4x–12x. The second method, Masked Siamese Networks (MSNs), is a self-supervised pre-training strategy in which the task is to learn a feature extractor such that the latent representation of an image is invariant to masking portions of that image. Computationally, this strategy is especially scalable with Vision Transformer models, since only non-masked image patches are processed. As a result, MSNs improve the scalability of joint-embedding architectures while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, a model with MSN features achieves 75.7% top-1 accuracy using only 1% of the training-set labels. This is joint work with Mido Assran, Nicolas Ballas, Piotr Bojanowski, Florian Bordes, Mathilde Caron, Armand Joulin, Ishan Misra, and Pascal Vincent.
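As a rough illustration of the non-parametric pseudo-labeling idea behind PAWS, the PyTorch sketch below (not the authors' code; the function name, dimensions, and temperature are illustrative assumptions) assigns a soft label to an unlabeled view by comparing its embedding against a small labeled support set:

    # Minimal sketch of similarity-based soft pseudo-labeling (illustrative only).
    import torch
    import torch.nn.functional as F

    def soft_pseudo_labels(unlabeled_emb, support_emb, support_labels, num_classes, tau=0.1):
        """unlabeled_emb: (B, D) embeddings of unlabeled views
        support_emb:    (S, D) embeddings of labeled support examples
        support_labels: (S,)   integer class labels for the support examples"""
        u = F.normalize(unlabeled_emb, dim=-1)
        s = F.normalize(support_emb, dim=-1)
        sims = (u @ s.t()) / tau                                  # (B, S) cosine similarities
        weights = sims.softmax(dim=-1)                            # soft attention over the support set
        one_hot = F.one_hot(support_labels, num_classes).float()  # (S, C)
        return weights @ one_hot                                  # (B, C) soft pseudo-labels

Similarly, a minimal sketch of why masking is computationally attractive with Vision Transformers, as the abstract notes: masked patch tokens are simply dropped before the encoder, so the transformer processes a shorter sequence. This shows only the compute aspect of MSN, not its full training objective; the module sizes and masking ratio are assumptions roughly matching a ViT-Small configuration:

    # Minimal sketch: drop masked patch tokens so the encoder sees only visible patches.
    import torch
    import torch.nn as nn

    patch_embed = nn.Conv2d(3, 384, kernel_size=16, stride=16)   # 16x16 patches -> 384-dim tokens
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True),
        num_layers=12,
    )

    images = torch.randn(8, 3, 224, 224)
    tokens = patch_embed(images).flatten(2).transpose(1, 2)       # (8, 196, 384) patch tokens

    keep = torch.rand(tokens.shape[1]) > 0.7                      # keep roughly 30% of patches
    visible = tokens[:, keep, :]                                  # only the non-masked patch tokens
    z = encoder(visible)                                          # cost scales with visible tokens only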

October 19 @ 1:00 pm (1 hour)

Orchard View Room, Virtual

