
How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
Qi Zhang · Yifei Wang · Yisen Wang

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #141

Masked Autoencoders (MAE), which are based on a reconstruction task, have emerged as a promising paradigm for self-supervised learning (SSL) and achieve state-of-the-art performance across different benchmark datasets. However, despite this impressive empirical success, theoretical understanding of MAE remains limited. In this paper, we propose a theoretical understanding of how masking matters for MAE to learn meaningful features. We establish a close connection between MAE and contrastive learning, which shows that MAE implicitly aligns the mask-induced positive pairs. Building on this connection, we develop the first downstream guarantees for MAE methods and analyze the effect of the mask ratio. In addition, as a consequence of the implicit alignment, we point out the dimensional collapse issue of MAE and propose a Uniformity-enhanced MAE (U-MAE) loss that effectively addresses this issue and brings significant improvements on real-world datasets, including CIFAR-10, ImageNet-100, and ImageNet-1K. Code is available at https://github.com/zhangq327/U-MAE.
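
The abstract does not spell out the form of the U-MAE loss, so the following is only a minimal sketch of the general idea: a masked reconstruction term plus a uniformity term that discourages features of different samples from collapsing onto the same direction. It assumes the uniformity term is the mean squared cosine similarity between distinct samples in a batch. All function names, array shapes, and the weight `lam` are hypothetical and are not taken from the authors' repository, which should be consulted for the actual implementation.

```python
import numpy as np


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred, target: arrays of shape (batch, patches, dim).
    mask: array of shape (batch, patches), 1 for masked patches, 0 otherwise.
    Assumes at least one patch is masked.
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())


def uniformity_penalty(features):
    """Mean squared cosine similarity between distinct samples.

    features: array of shape (batch, dim) with non-zero rows.
    The value is near 1 when all features point in the same direction
    and 0 when they are mutually orthogonal.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float((off_diag ** 2).mean())


def u_mae_style_loss(pred, target, mask, features, lam=0.01):
    """Reconstruction loss plus a weighted uniformity penalty.

    lam is a hypothetical weight chosen for illustration only.
    """
    recon = masked_reconstruction_loss(pred, target, mask)
    return recon + lam * uniformity_penalty(features)
```

Under this assumed form, a batch whose features are all identical receives the largest penalty, while a batch of mutually orthogonal features receives none, so the extra term pushes against the dimensional collapse described above.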

Author Information

Qi Zhang (Peking University)
Yifei Wang (Peking University)
Yisen Wang (Peking University)
