We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer, which produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet [45]), along with local-window self-attention, which performs self-attention over small non-overlapping image windows [21], to improve memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks; for example, HRFormer outperforms Swin Transformer [27] by 1.3 AP on COCO pose estimation with 50% fewer parameters and 30% fewer FLOPs. Code is available at: https://github.com/HRNet/HRFormer
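The efficiency argument behind local-window self-attention can be illustrated with a minimal sketch. This is not the HRFormer implementation: the helper names `window_partition` and `attention_pairs`, the 56 x 56 feature-map size, and the 7 x 7 window size are all assumptions chosen for illustration. The sketch splits a grid of tokens into non-overlapping windows and counts query-key pairs for global versus windowed attention.

```python
def window_partition(grid, k):
    """Split an H x W grid (a list of rows) into non-overlapping k x k windows.

    Returns the windows in row-major order; each window is a flat list of
    its k*k tokens. Assumes H and W are divisible by k (a simplification).
    """
    h, w = len(grid), len(grid[0])
    if h % k or w % k:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, k):
        for left in range(0, w, k):
            windows.append([grid[r][c]
                            for r in range(top, top + k)
                            for c in range(left, left + k)])
    return windows


def attention_pairs(h, w, k=None):
    """Count query-key pairs for an h x w token grid.

    With k=None every token attends to every token (global attention);
    otherwise tokens attend only within their own k x k window.
    """
    n = h * w
    if k is None:
        return n * n
    num_windows = (h // k) * (w // k)
    return num_windows * (k * k) ** 2


# Toy 4 x 4 grid whose tokens are just their flat indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(grid, 2)
print(windows[0])                  # [0, 1, 4, 5]

# Hypothetical 56 x 56 feature map with 7 x 7 windows.
print(attention_pairs(56, 56))     # 9834496
print(attention_pairs(56, 56, 7))  # 153664
```

Under these assumed sizes, windowing cuts the number of attended pairs by a factor of 64. The price is that tokens in different windows no longer interact, which is the gap the convolution in the FFN is described as addressing.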
Author Information
YUHUI YUAN (Microsoft Research)
Rao Fu (Brown University)
Lang Huang (Peking University)
I am currently a second-year Ph.D. student at the Department of Information & Communication Engineering, The University of Tokyo. Prior to that, I received a Master’s degree from the Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Peking University in 2021. My research interests include self-supervised representation learning, robust learning from noisy data, and vision transformers.
Weihong Lin (Microsoft)
Chao Zhang (Peking University)
Xilin Chen (Institute of Computing Technology, Chinese Academy of Sciences)
Jingdong Wang (Microsoft Research)
More from the Same Authors
-
2021 Spotlight: SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search »
Qi Chen · Bing Zhao · Haidong Wang · Mingqin Li · Chuanjie Liu · Zengzhong Li · Mao Yang · Jingdong Wang -
2022 Poster: Optimal Positive Generation via Latent Transformation for Contrastive Learning »
Hong Chang · Hong Chang · Bingpeng MA · Shiguang Shan · Xilin Chen -
2023 Poster: GlyphControl: Glyph Conditional Controllable Visual Text Generation »
Yukang Yang · Dongnan Gui · YUHUI YUAN -
2023 Poster: Rank-DETR for High Quality Object Detection »
Yifan Pu · Weicong Liang · Yiduo Hao · YUHUI YUAN · Yukang Yang · Chao Zhang · Han Hu · Gao Huang -
2023 Poster: Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes »
Minyang Hu · Hong Chang · Zong Guo · Bingpeng MA · Shiguang Shan · Xilin Chen -
2023 Poster: Glance and Focus: Memory Prompting for Multi-Event Video Question Answering »
Ziyi Bai · Ruiping Wang · Xilin Chen -
2023 Poster: Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation »
Jiachen Liang · RuiBing Hou · Hong Chang · Bingpeng MA · Shiguang Shan · Xilin Chen -
2022 Spotlight: Lightning Talks 6B-3 »
Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu -
2022 Spotlight: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning »
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu -
2022 Spotlight: Lightning Talks 3B-4 »
Guanghu Yuan · Yijing Liu · Li Yang · Yongri Piao · Zekang Zhang · Yaxin Xiao · Lin Chen · Hong Chang · Fajie Yuan · Guangyu Gao · Hong Chang · Qinxian Liu · Zhixiang Wei · Qingqing Ye · Chenyang Lu · Jian Meng · Haibo Hu · Xin Jin · Yudong Li · Miao Zhang · Zhiyuan Fang · Jae-sun Seo · Bingpeng MA · Jian-Wei Zhang · Shiguang Shan · Haozhe Feng · Huaian Chen · Deliang Fan · Huadi Zheng · Jianbo Jiao · Huchuan Lu · Beibei Kong · Miao Zheng · Chengfang Fang · Shujie Li · Zhongwei Wang · Yunchao Wei · Xilin Chen · Jie Shi · Kai Chen · Zihan Zhou · Lei Chen · Yi Jin · Wei Chen · Min Yang · Chenyun YU · Bo Hu · Zang Li · Yu Xu · Xiaohu Qie -
2022 Spotlight: Optimal Positive Generation via Latent Transformation for Contrastive Learning »
Hong Chang · Hong Chang · Bingpeng MA · Shiguang Shan · Xilin Chen -
2022 Poster: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning »
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu -
2022 Poster: Green Hierarchical Vision Transformer for Masked Image Modeling »
Lang Huang · Shan You · Mingkai Zheng · Fei Wang · Chen Qian · Toshihiko Yamasaki -
2021 Poster: SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search »
Qi Chen · Bing Zhao · Haidong Wang · Mingqin Li · Chuanjie Liu · Zengzhong Li · Mao Yang · Jingdong Wang -
2020 Poster: Self-Adaptive Training: beyond Empirical Risk Minimization »
Lang Huang · Chao Zhang · Hongyang Zhang -
2019 Poster: Cross Attention Network for Few-shot Classification »
Ruibing Hou · Hong Chang · Bingpeng MA · Shiguang Shan · Xilin Chen -
2019 Poster: Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition »
Xuesong Niu · Hu Han · Shiguang Shan · Xilin Chen -
2018 Poster: Weakly Supervised Dense Event Captioning in Videos »
Xin Wang · Wenbing Huang · Chuang Gan · Jingdong Wang · Wenwu Zhu · Junzhou Huang -
2014 Poster: Generalized Unsupervised Manifold Alignment »
Zhen Cui · Hong Chang · Shiguang Shan · Xilin Chen