Poster
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai · Lu Hou · Lifeng Shang · Xin Jiang · Irwin King · Michael R Lyu
Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT), which requires end-to-end training with full access to the entire dataset, and therefore suffer from slow training, large memory overhead, and data accessibility issues. In this paper, we study post-training quantization (PTQ) of PLMs and propose module-wise reconstruction error minimization (MREM), an efficient solution that mitigates these issues. By partitioning the PLM into multiple modules, we minimize the reconstruction error incurred by quantization for each module. In addition, we design a new model-parallel training strategy in which each module is trained locally on a separate computing device without waiting for preceding modules, bringing nearly the theoretical training speed-up (e.g., 4× on 4 GPUs). Experiments on the GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
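The abstract's core idea can be illustrated with a minimal sketch: split the network into modules and fit each quantized module to reproduce the outputs of its full-precision counterpart on a small calibration set, with no labels or end-to-end backpropagation. The sketch below assumes a PyTorch-style setup; the names `fp_modules`, `quant_modules`, and `calibration_batches` are illustrative placeholders, not code from the paper.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of module-wise reconstruction error minimization.
# Assumptions (not from the paper's code): the PLM is already split into a
# list of full-precision modules `fp_modules` and matching quantized copies
# `quant_modules`, and `calibration_batches` yields hidden states that feed
# the first module.

def train_module(fp_module, quant_module, inputs, lr=1e-4, steps=100):
    """Fit one quantized module to reproduce its full-precision outputs."""
    optimizer = torch.optim.Adam(quant_module.parameters(), lr=lr)
    for _ in range(steps):
        for x in inputs:
            with torch.no_grad():
                target = fp_module(x)          # full-precision reference
            loss = F.mse_loss(quant_module(x), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def mrem_sequential(fp_modules, quant_modules, calibration_batches):
    """Quantize module by module on a small calibration set (no labels)."""
    inputs = list(calibration_batches)
    for fp_m, q_m in zip(fp_modules, quant_modules):
        train_module(fp_m, q_m, inputs)
        # Pass the quantized module's outputs forward as inputs to the next module.
        with torch.no_grad():
            inputs = [q_m(x) for x in inputs]
```

The loop above is sequential: each module waits for the quantized outputs of its predecessor. In the parallel strategy described in the abstract, each module would instead be trained on its own device without waiting, which requires feeding it inputs that do not depend on the quantized predecessor (for example, cached intermediate activations from the full-precision model); this is the source of the near-theoretical speed-up (e.g., 4× on 4 GPUs).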
Author Information
Haoli Bai (Huawei Noah's Ark Lab)
Lu Hou (Huawei Technologies Co., Ltd)
Lifeng Shang (Huawei Noah's Ark Lab)
Xin Jiang (Noah’s Ark Lab, Huawei Technologies)
Irwin King (Chinese University of Hong Kong)
Michael R Lyu (The Chinese University of Hong Kong)
More from the Same Authors
- 2021: Score-based Graph Generative Model for Neutrino Events Classification and Reconstruction »
  Yiming Sun · Zixing Song · Irwin King
- 2022: Individual Fairness in Dynamic Financial Networks »
  Zixing Song · Yueen Ma · Irwin King
- 2022: Fine-grained Interactive Vision Language Pre-training »
  Lu Hou
- 2022 Poster: Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark »
  Jiaxi Gu · Xiaojun Meng · Guansong Lu · Lu Hou · Niu Minzhe · Xiaodan Liang · Lewei Yao · Runhui Huang · Wei Zhang · Xin Jiang · Chunjing XU · Hang Xu
- 2021: Panel Discussion »
  Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou
- 2021: Compression and Acceleration of Pre-trained Language Models »
  Lu Hou
- 2020 Poster: Revisiting Parameter Sharing for Automatic Neural Channel Number Search »
  Jiaxing Wang · Haoli Bai · Jiaxiang Wu · Xupeng Shi · Junzhou Huang · Irwin King · Michael R Lyu · Jian Cheng
- 2020 Poster: Unsupervised Text Generation by Learning from Search »
  Jingjing Li · Zichao Li · Lili Mou · Xin Jiang · Michael R Lyu · Irwin King
- 2020 Poster: DynaBERT: Dynamic BERT with Adaptive Width and Depth »
  Lu Hou · Zhiqi Huang · Lifeng Shang · Xin Jiang · Xiao Chen · Qun Liu
- 2020 Spotlight: DynaBERT: Dynamic BERT with Adaptive Width and Depth »
  Lu Hou · Zhiqi Huang · Lifeng Shang · Xin Jiang · Xiao Chen · Qun Liu
- 2019 Poster: Normalization Helps Training of Quantized LSTM »
  Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu
- 2018 Poster: Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs »
  Han Shao · Xiaotian Yu · Irwin King · Michael R Lyu
- 2018 Spotlight: Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs »
  Han Shao · Xiaotian Yu · Irwin King · Michael R Lyu
- 2014 Poster: Combinatorial Pure Exploration of Multi-Armed Bandits »
  Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen
- 2014 Oral: Combinatorial Pure Exploration of Multi-Armed Bandits »
  Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen
- 2013 Poster: Exact and Stable Recovery of Pairwise Interaction Tensors »
  Shouyuan Chen · Michael R Lyu · Irwin King · Zenglin Xu
- 2013 Spotlight: Exact and Stable Recovery of Pairwise Interaction Tensors »
  Shouyuan Chen · Michael R Lyu · Irwin King · Zenglin Xu
- 2010 Workshop: Machine Learning for Social Computing »
  Zenglin Xu · Irwin King · Shenghuo Zhu · Yuan Qi · Rong Yan · John Yen
- 2009 Poster: Adaptive Regularization for Transductive Support Vector Machine »
  Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang
- 2009 Spotlight: Adaptive Regularization for Transductive Support Vector Machine »
  Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu · Zhirong Yang
- 2009 Poster: Heavy-Tailed Symmetric Stochastic Neighbor Embedding »
  Zhirong Yang · Irwin King · Zenglin Xu · Erkki Oja
- 2009 Spotlight: Heavy-Tailed Symmetric Stochastic Neighbor Embedding »
  Zhirong Yang · Irwin King · Zenglin Xu · Erkki Oja
- 2008 Poster: Learning with Consistency between Inductive Functions and Kernels »
  Haixuan Yang · Irwin King · Michael R Lyu
- 2008 Spotlight: Learning with Consistency between Inductive Functions and Kernels »
  Haixuan Yang · Irwin King · Michael R Lyu
- 2008 Poster: An Extended Level Method for Efficient Multiple Kernel Learning »
  Zenglin Xu · Rong Jin · Irwin King · Michael R Lyu
- 2007 Poster: Efficient Convex Relaxation for Transductive Support Vector Machine »
  Zenglin Xu · Rong Jin · Jianke Zhu · Irwin King · Michael R Lyu