I will present our recent and ongoing work on fully automatic image colorization. Our approach exploits both low-level and semantic representations during colorization. Because many scene elements naturally follow multimodal color distributions, we train our model to predict per-pixel color histograms. Our system achieves state-of-the-art results under a variety of metrics. Moreover, it provides a vehicle for exploring the role the colorization task can play as a proxy for visual understanding, serving as a self-supervision mechanism for learning representations. I will describe the performance of our self-supervised network in several contexts, such as classification and semantic segmentation. On PASCAL VOC segmentation and classification tasks, we present results that are state-of-the-art among methods not using massive supervised pre-training.
Joint work with Gustav Larsson and Michael Maire.
Discovery Building, Orchard View Room