Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
Wouter Van Gansbeke · Simon Vandenhende · Stamatios Georgoulis · Luc V Gool

Thu Dec 09 08:30 AM -- 10:00 AM (PST)

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. In this paper, we first study how biases in the dataset affect existing methods. Our results show that an approach like MoCo works surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets. Second, given the generality of the approach, we try to realize further gains with minor modifications. We show that learning additional invariances - through the use of multi-scale cropping, stronger augmentations and nearest neighbors - improves the representations. Finally, we observe that MoCo learns spatially structured representations when trained with a multi-crop strategy. The representations can be used for semantic segment retrieval and video instance segmentation without finetuning. Moreover, the results are on par with specialized models. We hope this work will serve as a useful study for other researchers.
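The contrastive objective underlying MoCo can be illustrated with a minimal sketch of the InfoNCE loss: a query embedding is pulled toward the positive key (another augmented view of the same image) and pushed away from negative keys stored in a memory queue. The function below is an illustrative NumPy implementation for a single query, not the authors' code; the variable names and the use of plain NumPy are assumptions for clarity.

```python
import numpy as np

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss for one query (MoCo-style contrastive learning, sketch).

    q:     (d,) L2-normalized query embedding
    k_pos: (d,) L2-normalized positive key (a second view of the same image)
    queue: (K, d) L2-normalized negative keys from the memory queue
    """
    l_pos = q @ k_pos            # similarity to the positive key (scalar)
    l_neg = queue @ q            # similarities to the K negatives, shape (K,)
    logits = np.concatenate(([l_pos], l_neg)) / temperature
    logits -= logits.max()       # subtract max for numerical stability
    # cross-entropy with the positive placed at index 0
    log_prob = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob
```

In this view, the multi-crop and nearest-neighbor modifications studied in the paper change which pairs are treated as positives, while the loss itself stays the same.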

Author Information

Wouter Van Gansbeke (KU Leuven)
Simon Vandenhende (KU Leuven)
Stamatios Georgoulis (ETH Zurich)

Stamatios Georgoulis is currently a post-doctoral researcher at the CVL group of ETH Zurich, working with Prof. Luc Van Gool and Dr. Dengxin Dai on the R&D project "TRACE: Toyota Research on Automated Cars in Europe". His research interests are in the area of autonomous driving, including multi-task learning, image synthesis/decomposition and semantic segmentation.

Luc V Gool (Computer Vision Lab, ETH Zurich)
