Poster
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
The Transformer architecture has become a fundamental building block of widespread natural language processing (NLP) models. With the trend toward ever larger NLP models, the growing memory and computation costs hinder efficient deployment on resource-limited devices. Transformer quantization has therefore attracted wide research interest. Recent work recognizes that structured outliers are the critical bottleneck for quantization performance. However, the proposed methods increase computation overhead and still leave the outliers intact. To address this problem at its root, this paper investigates the underlying cause and importance of the outliers. We find that $\boldsymbol{\gamma}$ in LayerNorm (LN) acts as an amplifier of the outliers, and that the importance of outliers varies greatly: some outliers, produced by only a few tokens, cover a large range yet can be clipped aggressively without negative impact. Motivated by these findings, we propose an outlier suppression framework comprising two components: Gamma Migration and Token-Wise Clipping. Gamma Migration moves the outlier amplifier into subsequent modules via an equivalent transformation, yielding a more quantization-friendly model without any extra burden. Token-Wise Clipping exploits the large variance of token ranges with a coarse-to-fine, token-wise pipeline that efficiently finds a clipping range with minimal final quantization loss. The framework effectively suppresses the outliers and can be used in a plug-and-play manner. Extensive experiments show that our framework surpasses existing works and, for the first time, pushes 6-bit post-training BERT quantization to the full-precision (FP) level. Our code is available at https://github.com/wimh966/outlier_suppression.
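As a rough illustration of the two components described above, the following is a minimal PyTorch sketch. It is not the authors' released implementation: the function names `migrate_gamma` and `coarse_clip_range` are hypothetical, and the coarse stage below scores candidates with a simple MSE proxy, whereas the paper selects the clipping range by the final quantization loss.

```python
# Hypothetical sketch of Gamma Migration and the coarse stage of Token-Wise Clipping.
# Assumes a post-LN BERT-style block; names and details are illustrative only.
import torch
import torch.nn as nn


def migrate_gamma(ln: nn.LayerNorm, linear: nn.Linear):
    """Gamma Migration (sketch): fold LayerNorm's per-channel scale gamma into the
    weight of a following Linear layer, so the tensor that gets quantized is the
    non-scaled LN output.  Equivalence for the linear path:
        Linear(gamma * x_hat + beta) = Linear'(x_hat + beta / gamma),
    where W'[:, j] = W[:, j] * gamma[j].
    Note: in BERT the LN output also feeds the shortcut branch, where gamma must
    be re-applied separately (omitted here); assumes gamma has no zero entries."""
    gamma = ln.weight.data.clone()
    linear.weight.data.mul_(gamma)   # scale each input channel of the next Linear
    ln.bias.data.div_(gamma)         # beta' = beta / gamma
    ln.weight.data.fill_(1.0)        # "non-scaling" LayerNorm


def coarse_clip_range(acts: torch.Tensor,
                      alphas=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5),
                      n_bits: int = 6) -> torch.Tensor:
    """Token-Wise Clipping, coarse stage (sketch): only a few tokens produce the
    extreme outliers, so pick the clipping threshold from a quantile over
    per-token maxima and keep the candidate with the smallest error on a
    calibration batch (MSE proxy here instead of the paper's final-loss search)."""
    per_token_max = acts.abs().flatten(0, -2).max(dim=-1).values  # one value per token
    qmax = 2 ** (n_bits - 1) - 1
    best_err, best_t = float("inf"), per_token_max.max()
    for a in alphas:
        t = torch.quantile(per_token_max, a)       # skip the top (1 - a) fraction of tokens
        scale = t / qmax
        q = torch.clamp((acts / scale).round(), -qmax - 1, qmax) * scale
        err = (q - acts).pow(2).mean().item()
        if err < best_err:
            best_err, best_t = err, t
    return best_t
```

In this sketch, `coarse_clip_range` would be run on calibration activations to initialize the clipping threshold, after which a fine stage (learning the quantization step, as described in the paper) could refine it; the complete method is available in the linked repository.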
Author Information
Xiuying Wei (Beihang University)
Yunchen Zhang (UESTC)
Xiangguo Zhang (Beijing University of Aeronautics and Astronautics)
Ruihao Gong (Beihang University)
Shanghang Zhang (UC Berkeley)
Qi Zhang (Beihang University)
Fengwei Yu (Beihang University)
Xianglong Liu (Beihang University, Tsinghua University)
More from the Same Authors
- 2021 : MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
  Yuhang Li · Mingzhu Shen · Jian Ma · Yan Ren · Mingxin Zhao · Qi Zhang · Ruihao Gong · Fengwei Yu · Junjie Yan
- 2022 Poster: Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation
  Yixiong Zou · Shanghang Zhang · Yuhua Li · Ruixuan Li
- 2023 Poster: BiMatting: Efficient Video Matting via Binarization
  Haotong Qin · Lei Ke · Xudong Ma · Martin Danelljan · Yu-Wing Tai · Chi-Keung Tang · Xianglong Liu · Fisher Yu
- 2023 Poster: QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution
  Haotong Qin · Yulun Zhang · Yifu Ding · Yifan liu · Xianglong Liu · Martin Danelljan · Fisher Yu
- 2023 Poster: PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection
  Qiang Zhou · Weize Li · Lihan Jiang · Guoliang Wang · Guyue Zhou · Shanghang Zhang · Hao Zhao
- 2022 Spotlight: Lightning Talks 6B-3
  Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu
- 2022 Spotlight: Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
  Xiuying Wei · Yunchen Zhang · Xiangguo Zhang · Ruihao Gong · Shanghang Zhang · Qi Zhang · Fengwei Yu · Xianglong Liu
- 2022 Workshop: Human in the Loop Learning (HiLL) Workshop at NeurIPS 2022
  Shanghang Zhang · Hao Dong · Wei Pan · Pradeep Ravikumar · Vittorio Ferrari · Fisher Yu · Xin Wang · Zihan Ding
- 2022 Poster: Jump Self-attention: Capturing High-order Statistics in Transformers
  Haoyi Zhou · Siyang Xiao · Shanghang Zhang · Jieqi Peng · Shuai Zhang · Jianxin Li
- 2020 Workshop: Self-Supervised Learning -- Theory and Practice
  Pengtao Xie · Shanghang Zhang · Pulkit Agrawal · Ishan Misra · Cynthia Rudin · Abdelrahman Mohamed · Wenzhen Yuan · Barret Zoph · Laurens van der Maaten · Xingyi Yang · Eric Xing