Human spatial attention conveys information about the regions of a scene that are important for performing visual tasks. Prior work has shown that the spatial distribution of human attention can be leveraged to benefit various supervised vision tasks. Might providing this weak form of supervision also be useful for self-supervised representation learning? One reason this question has not been addressed previously is that self-supervised models require large datasets, and no large dataset with ground-truth human attentional labels exists. We therefore construct an auxiliary teacher model that predicts human attention, trained on a relatively small labeled dataset. This human-attention model allows us to generate (pseudo) attention labels for every image in ImageNet. We then train a model with a primary contrastive objective; to this standard configuration, we add a simple output head trained to predict the attentional map for each image. We measure the quality of the learned representations by evaluating classification performance from the frozen learned embeddings. We find that our approach improves the accuracy of contrastive models on ImageNet, and that its attentional-map readout aligns better with human attention than that of vanilla contrastive learning models.
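The training setup described above combines a contrastive objective with an auxiliary loss that matches the output head's predicted attention map to the teacher's pseudo-label. A minimal NumPy sketch of such a combined objective is below; the InfoNCE form of the contrastive loss, the MSE form of the attention loss, and the weighting `lam` are illustrative assumptions, not details reported in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss between two views' embeddings.

    z1, z2: (batch, dim) embeddings; matched rows are positive pairs.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def attention_loss(pred_maps, teacher_maps):
    """MSE between the auxiliary head's maps and teacher pseudo-labels."""
    return np.mean((pred_maps - teacher_maps) ** 2)

# Toy batch: embeddings of two augmented views plus spatial attention maps.
batch, dim, h, w = 8, 32, 7, 7
z1 = rng.normal(size=(batch, dim))
z2 = rng.normal(size=(batch, dim))
pred_maps = rng.uniform(size=(batch, h, w))       # auxiliary head output
teacher_maps = rng.uniform(size=(batch, h, w))    # pseudo-labels from teacher

lam = 0.5  # hypothetical loss weighting; not specified in the abstract
total_loss = info_nce_loss(z1, z2) + lam * attention_loss(pred_maps, teacher_maps)
print(float(total_loss))
```

In a real implementation both losses would be backpropagated jointly through a shared encoder, with the attention head reading out from an intermediate feature map.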
Author Information
Yushi Yao
Chang Ye (Google)
Junfeng He (Google)
Gamaleldin Elsayed (Google Research, Brain Team)