

Poster

The Benefits of Balance: From Information Projections to Variance Reduction

Lang Liu · Ronak Mehta · Soumik Pal · Zaid Harchaoui

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Data balancing across multiple modalities and sources appears in various forms in foundation models, such as CLIP and DINO, that achieve universal representation learning. We show that this iterative algorithm, usually employed to avoid representation collapse, enjoys an unsuspected benefit: it reduces the variance of estimators that are functionals of the empirical distribution over these sources. We provide non-asymptotic bounds quantifying this variance reduction effect and relate them to the eigendecay of appropriately defined Markov operators. We explain how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be interpreted as instances of this variance reduction scheme.
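
For intuition, below is a minimal, self-contained sketch (not taken from the paper) of one concrete instance of such a balancing step: Sinkhorn-style iterative proportional fitting of a discrete empirical joint distribution to known uniform marginals, followed by a small Monte Carlo comparison of a linear functional evaluated with and without balancing. The helper name balance, the uniform target marginals, and the test functional are illustrative assumptions rather than the paper's notation.

import numpy as np

def balance(P, row_target, col_target, n_iters=200, eps=1e-12):
    """Iterative proportional fitting: rescale rows and columns of a joint
    matrix P until both marginals match the prescribed targets."""
    Q = P.copy()
    for _ in range(n_iters):
        Q *= (row_target / (Q.sum(axis=1) + eps))[:, None]   # match row marginal
        Q *= (col_target / (Q.sum(axis=0) + eps))[None, :]   # match column marginal
    return Q

rng = np.random.default_rng(0)
k, n, n_rep = 8, 500, 2000

# True joint distribution over two discrete sources, constructed to have
# uniform marginals (so the targets used below are the correct ones).
A = rng.random((k, k))
pi = balance(A / A.sum(), np.full(k, 1 / k), np.full(k, 1 / k))

# Functional of the joint distribution: expectation of a fixed test function f.
f = rng.standard_normal((k, k))
functional = lambda Q: float((Q * f).sum())

raw_vals, bal_vals = [], []
for _ in range(n_rep):
    # Empirical joint distribution from n i.i.d. pairs.
    idx = rng.choice(k * k, size=n, p=pi.ravel())
    P_hat = np.bincount(idx, minlength=k * k).reshape(k, k) / n
    raw_vals.append(functional(P_hat))
    # Re-balance the empirical joint to the known uniform marginals.
    bal_vals.append(functional(balance(P_hat, np.full(k, 1 / k), np.full(k, 1 / k))))

print("variance without balancing:", np.var(raw_vals))
print("variance with balancing:   ", np.var(bal_vals))

In this toy setting the balanced estimator typically shows a noticeably smaller variance across resamples, mirroring the classical raking-ratio effect; the paper's results cover a more general setting and quantify the reduction non-asymptotically.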
