
Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice

HyperMAE: Modulating Implicit Neural Representations for MAE Training

Varun Belagali · Lei Zhou · Xiang Li · Dimitris Samaras


Implicit Neural Representations (INRs) have been applied successfully to reconstruction tasks in computer vision. However, to the best of our knowledge, using INRs for self-supervised visual recognition has not been explored. In this work, we propose HyperMAE, an INR version of the masked autoencoder (MAE). HyperMAE combines a transformer and a coordinate-MLP to form an efficient decoder architecture that maps each patch's coordinates to all pixels in that patch, conditioned on the encoder outputs. The conditioning is implemented as weight modulation of the coordinate-MLP, which serves as an INR of the image. Compared with the standard MAE, HyperMAE achieves comparable ImageNet-1k finetuning accuracy in two settings: with 72.9% of the pretraining time and 56.5% of the GPU memory, or with 46.5% of the pretraining time and 88.6% of the GPU memory. We hope this work inspires further investigation of INRs for self-supervised learning. The code is available at
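
To make the decoder idea in the abstract concrete, the following is a minimal NumPy sketch of a coordinate-MLP whose hidden-layer weights are modulated per patch. It is an illustration under stated assumptions, not the authors' implementation: the function names, layer sizes, sine first layer, and the multiplicative form of the modulation are all hypothetical, and random vectors stand in for the transformer outputs that would supply the conditioning.

```python
import numpy as np

def init_params(rng, D, H, P):
    """Random parameters for a toy coordinate-MLP (all sizes are assumptions)."""
    return {
        "W_in": rng.normal(size=(2, H)),             # (x, y) coordinate -> hidden features
        "b_in": np.zeros(H),
        "W_mod": 0.01 * rng.normal(size=(D, H)),     # token -> per-unit modulation
        "W_hid": rng.normal(size=(H, H)) / np.sqrt(H),
        "b_hid": np.zeros(H),
        "W_out": rng.normal(size=(H, P * P * 3)) / np.sqrt(H),
        "b_out": np.zeros(P * P * 3),
    }

def modulated_coordinate_mlp(coords, tokens, params):
    """Map each patch coordinate to all RGB pixels of that patch.

    coords: (N, 2) patch coordinates in [-1, 1]
    tokens: (N, D) per-patch conditioning vectors (stand-in for transformer outputs)
    returns: (N, P*P*3) predicted pixel values per patch
    """
    # Hypothetical modulation: one multiplicative scale per hidden unit and patch.
    scale = 1.0 + tokens @ params["W_mod"]                       # (N, H)
    h = np.sin(coords @ params["W_in"] + params["b_in"])         # (N, H)
    # Scaling the pre-activation is the same as scaling the columns of W_hid,
    # i.e. each patch sees its own modulated weight matrix.
    h = np.maximum(0.0, (h @ params["W_hid"]) * scale + params["b_hid"])
    return h @ params["W_out"] + params["b_out"]

rng = np.random.default_rng(0)
N, D, H, P = 196, 64, 32, 16            # 14x14 patches of 16x16 pixels (assumed)
params = init_params(rng, D, H, P)
grid = np.linspace(-1.0, 1.0, 14)
coords = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
tokens = rng.normal(size=(N, D))        # placeholder for conditioning vectors
pixels = modulated_coordinate_mlp(coords, tokens, params)
print(pixels.shape)  # (196, 768)
```

In this sketch a single forward pass per patch coordinate emits all 768 pixel values of the patch, which is the sense in which the coordinate-MLP maps patch coordinates to whole patches rather than to individual pixels.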
