Remote sensing (RS) data plays an important role in monitoring global-scale challenges. Automating its analysis depends on learning useful features from the vast amount of unlabeled data. Motivated by the distinctive characteristics of RS data (multiple spectral bands, high resolution, dense objects, and complex backgrounds), we propose a multispectral masked autoencoder framework that learns RS representations in a self-supervised way. We verify its performance by transfer learning to a scene classification task, where it achieves the best top-1 accuracy.