SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
Abstract
Recently, significant progress has been made in masked image modeling to catch up with masked language modeling. However, unlike words in NLP, images lack an explicit semantic decomposition, which still sets masked autoencoding (MAE) in vision apart from its counterpart in language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and integrate semantic information into the training process of MAE by proposing a semantic-guided masking strategy. Compared with the widely adopted random masking, our strategy gradually guides the network from learning intra-part patterns to learning inter-part relations. We achieve this in two steps. 1) Semantic part learning: we design a self-supervised part-learning method that obtains semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of patches in each part to masking a portion of (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE learns better image representations by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, outperforming vanilla MAE by 1.4%. On semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields state-of-the-art performance.
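To make the two-stage masking schedule concrete, below is a minimal, illustrative sketch (not the authors' released implementation) of how a semantic-guided mask could be built from per-patch part assignments. The function name, the whole_part_frac knob, the proportional budget split, and the idea of annealing whole_part_frac from 0 to 1 over training are all assumptions made for illustration; only the overall intra-part-to-whole-part progression follows the abstract.

```python
# Illustrative sketch of semantic-guided masking (assumed names and logic, not SemMAE's code).
import numpy as np

def semantic_guided_mask(part_ids, mask_ratio=0.75, whole_part_frac=0.0, rng=None):
    """Return a boolean mask over N patches.

    part_ids:        (N,) int array assigning each patch to a semantic part.
    mask_ratio:      overall fraction of patches to mask (0.75 as in vanilla MAE).
    whole_part_frac: in [0, 1]; 0 masks patches *inside* every part (intra-part),
                     1 masks *entire* parts (inter-part). Assumed to be annealed
                     from 0 to 1 during training.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = len(part_ids)
    budget = int(round(mask_ratio * n))          # total number of patches to hide
    mask = np.zeros(n, dtype=bool)
    parts = rng.permutation(np.unique(part_ids))

    # Stage 1: mask some parts in their entirety (forces inter-part relations).
    n_whole = int(round(whole_part_frac * len(parts)))
    for p in parts[:n_whole]:
        idx = np.flatnonzero(part_ids == p)
        mask[idx[: max(0, budget - int(mask.sum()))]] = True

    # Stage 2: spend the remaining budget proportionally inside the other parts
    # (forces intra-part patterns).
    rest = parts[n_whole:]
    remaining = budget - int(mask.sum())
    total_rest = int(np.isin(part_ids, rest).sum())
    for p in rest:
        idx = np.flatnonzero(part_ids == p)
        k = min(len(idx), int(round(remaining * len(idx) / max(1, total_rest))))
        mask[rng.choice(idx, size=k, replace=False)] = True
    return mask

if __name__ == "__main__":
    # Toy example: a 14x14 ViT patch grid (196 patches) split into 6 fake parts.
    rng = np.random.default_rng(0)
    part_ids = rng.integers(0, 6, size=196)
    early = semantic_guided_mask(part_ids, whole_part_frac=0.0, rng=rng)  # intra-part masking
    late = semantic_guided_mask(part_ids, whole_part_frac=1.0, rng=rng)   # whole-part masking
    print(early.sum(), late.sum())  # both mask roughly 0.75 * 196 patches
```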
Author Information
Gang Li (Institute of Software, Chinese Academy of Sciences)
Heliang Zheng (USTC)
Daqing Liu (JD.com Inc.)
Chaoyue Wang (JD Explore Academy)
Bing Su (Renmin University of China)
Changwen Zheng (Institute of Software, Chinese Academy of Sciences)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders »
  Thu, Dec 1 through Fri, Dec 2, Hall J #212
More from the Same Authors
- 2022 Poster: Log-Polar Space Convolution Layers »
  Bing Su · Ji-Rong Wen
- 2022 Poster: MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning »
  Jiangmeng Li · Wenwen Qiang · Yanan Zhang · Wenyi Mo · Changwen Zheng · Bing Su · Hui Xiong
- 2023 Poster: All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation »
  Liyao Tang · Zhe Chen · Shanshan Zhao · Chaoyue Wang · Dacheng Tao
- 2023 Poster: Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation »
  Minghui Hu · Jianbin Zheng · Daqing Liu · Chuanxia Zheng · Chaoyue Wang · Dacheng Tao · Tat-Jen Cham
- 2023 Poster: Domain Re-Modulation for Few-Shot Generative Domain Adaptation »
  Yi Wu · Ziqiang Li · Chaoyue Wang · Heliang Zheng · Shanshan Zhao · Bin Li · Dacheng Tao
- 2022 Spotlight: Lightning Talks 2B-3 »
  Jie-Jing Shao · Jiangmeng Li · Jiashuo Liu · Zongbo Han · Tianyang Hu · Jiayun Wu · Wenwen Qiang · Jun WANG · Zhipeng Liang · Lan-Zhe Guo · Wenjia Wang · Yanan Zhang · Xiao-wen Yang · Fan Yang · Bo Li · Wenyi Mo · Zhenguo Li · Liu Liu · Peng Cui · Yu-Feng Li · Changwen Zheng · Lanqing Li · Yatao Bian · Bing Su · Hui Xiong · Peilin Zhao · Bingzhe Wu · Changqing Zhang · Jianhua Yao
- 2022 Spotlight: Lightning Talks 2B-2 »
  Chenjian Gao · Rui Ding · Lingzhi LI · Fan Yang · Xingting Yao · Jianxin Li · Bing Su · Zhen Shen · Tongda Xu · Shuai Zhang · Ji-Rong Wen · Lin Guo · Fanrong Li · Kehua Guo · Zhongshu Wang · Zhi Chen · Xiangyuan Zhu · Zitao Mo · Dailan He · Hui Xiong · Yan Wang · Zheng Wu · Wenbing Tao · Jian Cheng · Haoyi Zhou · Li Shen · Ping Tan · Liwei Wang · Hongwei Qin
- 2022 Spotlight: Log-Polar Space Convolution Layers »
  Bing Su · Ji-Rong Wen
- 2022 Spotlight: MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning »
  Jiangmeng Li · Wenwen Qiang · Yanan Zhang · Wenyi Mo · Changwen Zheng · Bing Su · Hui Xiong