Poster
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao · Wenliang Zhao · Yansong Tang · Jie Zhou · Ser Nam Lim · Jiwen Lu
Recent vision Transformers have achieved great success in various tasks, driven by a new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution ($\textit{g}^\textit{n}$Conv), which performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable: it is compatible with various variants of convolution and extends the second-order interactions in self-attention to arbitrary orders without introducing significant extra computation. $\textit{g}^\textit{n}$Conv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on this operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architecture and training configurations. HorNet also shows favorable scalability to more training data and larger model sizes. Beyond its effectiveness in visual encoders, we also show that $\textit{g}^\textit{n}$Conv can be applied to task-specific decoders and consistently improves dense prediction performance with less computation. Our results demonstrate that $\textit{g}^\textit{n}$Conv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet.
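The idea of building higher-order interactions by repeated gating can be illustrated with a small toy example. The sketch below is not the authors' implementation (see the repository linked above for that); it assumes a single-channel 1D signal, scalar stand-ins for the input projections, and hypothetical helper names (`conv1d_same`, `recursive_gated_conv`). It only shows that each gating step multiplies the running feature by one more spatially mixed function of the input.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def recursive_gated_conv(x, proj_weights, kernels):
    """Toy single-channel recursive gating (illustrative assumption only).

    proj_weights: len(kernels) + 1 scalars standing in for a linear input
                  projection; the first gives the initial feature p, the
                  rest give the gating branches q_k.
    kernels:      one spatial kernel per gating step.
    """
    steps = len(kernels)
    assert len(proj_weights) == steps + 1
    p = [proj_weights[0] * v for v in x]
    for k in range(steps):
        q = [proj_weights[k + 1] * v for v in x]
        q = conv1d_same(q, kernels[k])     # spatial mixing of the gating branch
        p = [a * b for a, b in zip(p, q)]  # gating: element-wise product
    return p


# One gating step with an identity kernel reduces to squaring each element.
print(recursive_gated_conv([1.0, 2.0, 3.0], [1.0, 1.0], [[0.0, 1.0, 0.0]]))
# prints [1.0, 4.0, 9.0]
```

Because every operation other than the product is linear in this toy version, k gating steps make the output a homogeneous polynomial of degree k + 1 in the input, which is one simple way to see how the recursion raises the interaction order while each step costs only one convolution and one element-wise product.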
Author Information
Yongming Rao (Tsinghua University)
Wenliang Zhao (Automation, Tsinghua University)
Yansong Tang (University of Oxford)
Jie Zhou (Tsinghua University)
Ser Nam Lim (Facebook AI)
Jiwen Lu (Tsinghua University)
More from the Same Authors
- 2021: Mix-MaxEnt: Improving Accuracy and Uncertainty Estimates of Deterministic Neural Networks
  Francesco Pinto · Harry Yang · Ser Nam Lim · Philip Torr · Puneet Dokania
- 2022 Poster: OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
  Wanhua Li · Xiaoke Huang · Zheng Zhu · Yansong Tang · Xiu Li · Jie Zhou · Jiwen Lu
- 2022 Poster: P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
  Ziyi Wang · Xumin Yu · Yongming Rao · Jie Zhou · Jiwen Lu
- 2023 Poster: UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
  Wenliang Zhao · Lujia Bai · Yongming Rao · Jie Zhou · Jiwen Lu
- 2023 Poster: Riemannian Residual Neural Networks
  Isay Katsman · Eric M Chen · Sidhanth Holalkere · Anna Asch · Aaron Lou · Ser Nam Lim · Christopher De Sa
- 2023 Poster: VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
  Wenhai Wang · Zhe Chen · Xiaokang Chen · Jiannan Wu · Xizhou Zhu · Gang Zeng · Ping Luo · Tong Lu · Jie Zhou · Yu Qiao · Jifeng Dai
- 2023 Poster: Test-Time Distribution Normalization for Contrastively Learned Visual-language Models
  Yifei Zhou · Juntao Ren · Fengyu Li · Ramin Zabih · Ser Nam Lim
- 2023 Poster: MCUFormer: Deploying Vision Tranformers on Microcontrollers with Limited Memory
  Yinan Liang · Ziwei Wang · Xiuwei Xu · Yansong Tang · Jie Zhou · Jiwen Lu
- 2023 Poster: SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
  Zhuoyan Luo · Yicheng Xiao · Yong Liu · Shuyan Li · Yitong Wang · Yansong Tang · Xiu Li · Yujiu Yang
- 2023 Poster: Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements
  Gaurav Shrivastava · Ser Nam Lim · Abhinav Shrivastava
- 2022 Spotlight: P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
  Ziyi Wang · Xumin Yu · Yongming Rao · Jie Zhou · Jiwen Lu
- 2022 Spotlight: Lightning Talks 6A-1
  Ziyi Wang · Nian Liu · Yaming Yang · Qilong Wang · Yuanxin Liu · Zongxin Yang · Yizhao Gao · Yanchen Deng · Dongze Lian · Nanyi Fei · Ziyu Guan · Xiao Wang · Shufeng Kong · Xumin Yu · Daquan Zhou · Yi Yang · Fandong Meng · Mingze Gao · Caihua Liu · Yongming Rao · Zheng Lin · Haoyu Lu · Zhe Wang · Jiashi Feng · Zhaolin Zhang · Deyu Bo · Xinchao Wang · Chuan Shi · Jiangnan Li · Jiangtao Xie · Jie Zhou · Zhiwu Lu · Wei Zhao · Bo An · Jiwen Lu · Peihua Li · Jian Pei · Hao Jiang · Cai Xu · Peng Fu · Qinghua Hu · Yijie Li · Weigang Lu · Yanan Cao · Jianbin Huang · Weiping Wang · Zhao Cao · Jie Zhou
- 2022 Poster: Using Mixup as a Regularizer Can Surprisingly Improve Accuracy & Out-of-Distribution Robustness
  Francesco Pinto · Harry Yang · Ser Nam Lim · Philip Torr · Puneet Dokania
- 2022 Poster: Spartan: Differentiable Sparsity via Regularized Transportation
  Kai Sheng Tai · Taipeng Tian · Ser Nam Lim
- 2022 Poster: FedSR: A Simple and Effective Domain Generalization Method for Federated Learning
  A. Tuan Nguyen · Philip Torr · Ser Nam Lim
- 2022 Poster: GAPX: Generalized Autoregressive Paraphrase-Identification X
  Yifei Zhou · Renyu Li · Hayden Housen · Ser Nam Lim
- 2022 Poster: Few-Shot Fast-Adaptive Anomaly Detection
  Ze Wang · Yipin Zhou · Rui Wang · Tsung-Yu Lin · Ashish Shah · Ser Nam Lim
- 2021 Poster: Learning to Ground Multi-Agent Communication with Autoencoders
  Toru Lin · Jacob Huh · Christopher Stauffer · Ser Nam Lim · Phillip Isola
- 2021 Poster: Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
  Derek Lim · Felix Hohne · Xiuyu Li · Sijia Linda Huang · Vaishnavi Gupta · Omkar Bhalerao · Ser Nam Lim
- 2021 Poster: NeRV: Neural Representations for Videos
  Hao Chen · Bo He · Hanyu Wang · Yixuan Ren · Ser Nam Lim · Abhinav Shrivastava
- 2021 Poster: DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
  Yongming Rao · Wenliang Zhao · Benlin Liu · Jiwen Lu · Jie Zhou · Cho-Jui Hsieh
- 2021 Poster: Global Filter Networks for Image Classification
  Yongming Rao · Wenliang Zhao · Zheng Zhu · Jiwen Lu · Jie Zhou
- 2021 Poster: Equivariant Manifold Flows
  Isay Katsman · Aaron Lou · Derek Lim · Qingxuan Jiang · Ser Nam Lim · Christopher De Sa
- 2021 Poster: A Continuous Mapping For Augmentation Design
  Keyu Tian · Chen Lin · Ser Nam Lim · Wanli Ouyang · Puneet Dokania · Philip Torr
- 2020 Poster: Better Set Representations For Relational Reasoning
  Qian Huang · Horace He · Abhay Singh · Yan Zhang · Ser Nam Lim · Austin Benson
- 2020 Poster: Neural Manifold Ordinary Differential Equations
  Aaron Lou · Derek Lim · Isay Katsman · Leo Huang · Qingxuan Jiang · Ser Nam Lim · Christopher De Sa
- 2017 Poster: Runtime Neural Pruning
  Ji Lin · Yongming Rao · Jiwen Lu · Jie Zhou