
Self-supervised Learning is More Robust to Dataset Imbalance
Hong Liu · Jeff Z. HaoChen · Adrien Gaidon · Tengyu Ma
Event URL: https://openreview.net/forum?id=vUz4JPRLpGx

Self-supervised learning (SSL) learns general visual representations without the need for labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, and little is known about how SSL behaves in this regime. We investigate SSL under dataset imbalance and find that existing self-supervised representations are more robust to class imbalance than supervised representations: the performance gap between balanced and imbalanced pre-training is much smaller for SSL than for supervised learning. To understand this robustness, we hypothesize that SSL learns richer features from frequent data: it may learn label-irrelevant but transferable features that help classify the rare classes. In contrast, supervised learning has no incentive to learn features irrelevant to the labels of frequent examples. We validate this hypothesis with semi-synthetic experiments and a theoretical analysis in a simplified setting.
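As an illustration of the setting, a long-tailed label distribution is commonly simulated by letting per-class sample counts decay exponentially from the most frequent class to the rarest. The sketch below is a minimal, illustrative version of that construction; the function and parameter names are assumptions for illustration, not the paper's exact experimental protocol.

```python
def long_tailed_counts(num_classes, max_count, imbalance_ratio):
    """Per-class sample counts decaying exponentially across classes.

    Class 0 keeps max_count examples; the last class keeps
    max_count * imbalance_ratio examples (illustrative construction,
    not the authors' exact setup).
    """
    counts = []
    for c in range(num_classes):
        # Interpolate the exponent from 0 (head class) to 1 (tail class).
        frac = c / (num_classes - 1)
        counts.append(int(round(max_count * imbalance_ratio ** frac)))
    return counts


counts = long_tailed_counts(num_classes=10, max_count=1000, imbalance_ratio=0.01)
print(counts[0], counts[-1])  # head class keeps 1000 examples, tail class 10
```

With this construction, one would pre-train on such an imbalanced subset and on a balanced subset of equal size, then compare downstream accuracy of the two resulting representations.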

Author Information

Hong Liu (Stanford University)
Jeff Z. HaoChen (Stanford University)
Adrien Gaidon (Toyota Research Institute)
Tengyu Ma (Stanford University)
