DynaBERT: Dynamic BERT with Adaptive Width and Depth
Pre-trained language models such as BERT, though powerful on many natural language processing tasks, are expensive in both computation and memory. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size and therefore cannot fully satisfy the requirements of edge devices with different hardware capabilities. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust its size and latency by selecting an adaptive width and depth. DynaBERT is trained by first training a width-adaptive BERT and then allowing both adaptive width and depth, distilling knowledge from the full-sized model to the small sub-networks. Network rewiring is also used so that the more important attention heads and neurons are shared by more sub-networks. Comprehensive experiments under various efficiency constraints demonstrate that our proposed dynamic BERT (or RoBERTa) at its largest size achieves performance comparable to BERT-base (or RoBERTa-base), while at smaller widths and depths it consistently outperforms existing BERT compression methods. Code is available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT.
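To make the width/depth adaptation concrete, below is a minimal illustrative sketch (not the released DynaBERT code) of how a transformer encoder could expose a width multiplier (fraction of attention heads and feed-forward neurons kept) and a depth multiplier (fraction of layers kept), and how a sub-network could be trained by distilling from the full-sized configuration. All names here (AdaptiveEncoderLayer, AdaptiveEncoder) are hypothetical, layer norms are omitted, the trailing-layer dropping and hidden-state MSE loss are simplifications of the paper's layer selection and distillation objectives, and the sketch assumes PyTorch 2.x.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveEncoderLayer(nn.Module):
        """One transformer layer whose effective width can be reduced at run time."""
        def __init__(self, hidden=256, heads=8, ffn=1024):
            super().__init__()
            self.heads, self.head_dim = heads, hidden // heads
            self.qkv = nn.Linear(hidden, 3 * hidden)
            self.out = nn.Linear(hidden, hidden)
            self.ffn_in = nn.Linear(hidden, ffn)
            self.ffn_out = nn.Linear(ffn, hidden)

        def forward(self, x, width_mult=1.0):
            # Keep only the first round(width_mult * heads) attention heads.
            h, d = max(1, round(self.heads * width_mult)), self.head_dim
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            B, T, _ = x.shape
            q = q[..., :h * d].view(B, T, h, d).transpose(1, 2)
            k = k[..., :h * d].view(B, T, h, d).transpose(1, 2)
            v = v[..., :h * d].view(B, T, h, d).transpose(1, 2)
            attn = F.scaled_dot_product_attention(q, k, v)
            attn = attn.transpose(1, 2).reshape(B, T, h * d)
            x = x + F.linear(attn, self.out.weight[:, :h * d], self.out.bias)
            # Keep only the first round(width_mult * ffn) intermediate neurons.
            n = max(1, round(self.ffn_in.out_features * width_mult))
            mid = F.relu(F.linear(x, self.ffn_in.weight[:n], self.ffn_in.bias[:n]))
            return x + F.linear(mid, self.ffn_out.weight[:, :n], self.ffn_out.bias)

    class AdaptiveEncoder(nn.Module):
        """Stack of adaptive layers; depth_mult controls how many layers are run."""
        def __init__(self, num_layers=12, **kw):
            super().__init__()
            self.layers = nn.ModuleList(AdaptiveEncoderLayer(**kw) for _ in range(num_layers))

        def forward(self, x, width_mult=1.0, depth_mult=1.0):
            keep = max(1, round(len(self.layers) * depth_mult))
            for layer in list(self.layers)[:keep]:  # simplification: drop trailing layers
                x = layer(x, width_mult=width_mult)
            return x

    # One distillation step: the full-sized configuration acts as the teacher for a
    # half-width, half-depth sub-network that shares the same parameters (the paper
    # uses a separately fine-tuned teacher; this self-teaching is a simplification).
    model = AdaptiveEncoder()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(2, 16, 256)                      # (batch, sequence, hidden)
    with torch.no_grad():
        teacher = model(x, width_mult=1.0, depth_mult=1.0)
    student = model(x, width_mult=0.5, depth_mult=0.5)
    F.mse_loss(student, teacher).backward()          # hidden-state distillation only
    opt.step()

Because every sub-network reuses slices of the same weight tensors, keeping the most important heads and neurons in the leading positions (the paper's network rewiring) determines which parameters the narrowest sub-networks inherit.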
Author Information
Lu Hou (Huawei Technologies Co., Ltd)
Zhiqi Huang (Peking University)
Lifeng Shang (Huawei Noah's Ark Lab)
Xin Jiang (Huawei Noah's Ark Lab)
Xiao Chen (Huawei Noah's Ark Lab)
Qun Liu (Huawei Noah's Ark Lab)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Spotlight: DynaBERT: Dynamic BERT with Adaptive Width and Depth
  Tue. Dec 8th 04:00 -- 04:10 AM, Room: Orals & Spotlights: Language/Audio Applications
More from the Same Authors
- 2022 Poster: TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models
  Huibin Ge · Xiaohu Zhao · Chuang Liu · Yulong Zeng · Qun Liu · Deyi Xiong
- 2022 Poster: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
  Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
- 2023 Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants
  Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Yu Cheng · Soheila Samiee · Lili Mou · Qun Liu · Boxing Chen
- 2023 Poster: Reusing Pretrained Models by Multi-linear Operators for Efficient Training
  Yu Pan · Ye Yuan · Yichun Yin · Zenglin Xu · Lifeng Shang · Xin Jiang · Qun Liu
- 2023 Poster: FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation
  Yuanxin Liu · Lei Li · Shuhuai Ren · Rundong Gao · Shicheng Li · Sishuo Chen · Xu Sun · Lu Hou
- 2022 Spotlight: TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models
  Huibin Ge · Xiaohu Zhao · Chuang Liu · Yulong Zeng · Qun Liu · Deyi Xiong
- 2022 Spotlight: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
  Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
- 2022 : Fine-grained Interactive Vision Language Pre-training
  Lu Hou
- 2022 Workshop: Second Workshop on Efficient Natural Language and Speech Processing (ENLSP-II)
  Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Lili Mou · Pascal Poupart · Ali Ghodsi · Qun Liu
- 2022 Poster: Towards Efficient Post-training Quantization of Pre-trained Language Models
  Haoli Bai · Lu Hou · Lifeng Shang · Xin Jiang · Irwin King · Michael R Lyu
- 2022 Poster: Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
  Jiaxi Gu · Xiaojun Meng · Guansong Lu · Lu Hou · Niu Minzhe · Xiaodan Liang · Lewei Yao · Runhui Huang · Wei Zhang · Xin Jiang · Chunjing XU · Hang Xu
- 2021 : Panel Discussion
  Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou
- 2021 : Compression and Acceleration of Pre-trained Language Models
  Lu Hou
- 2021 Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)
  Mehdi Rezagholizadeh · Lili Mou · Yue Dong · Pascal Poupart · Ali Ghodsi · Qun Liu
- 2020 Poster: Unsupervised Text Generation by Learning from Search
  Jingjing Li · Zichao Li · Lili Mou · Xin Jiang · Michael R Lyu · Irwin King
- 2019 Poster: Normalization Helps Training of Quantized LSTM
  Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu