Low-rank decomposition (LRD) decomposes each fully-connected layer of the transformer modules into two smaller layers using Singular Value Decomposition (SVD). State-of-the-art techniques usually apply LRD in a single shot, where all of the layers are decomposed simultaneously. In this paper, we propose and compare different strategies for applying low-rank decomposition to compress pre-trained transformer-based models. These strategies include layer-by-layer and progressive decomposition. We observe that progressive low-rank decomposition, in which the rank is decreased incrementally, results in higher accuracy after decomposition compared to single-shot and layer-by-layer low-rank decomposition. Furthermore, in contrast with many state-of-the-art compression methods, where intensive pre-training of the compressed model is necessary, we show that progressive LRD can provide promising performance by compressing the model during the fine-tuning stage.
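A minimal sketch of the basic building block described above, assuming a PyTorch nn.Linear layer; decompose_linear and the example rank values are illustrative only and not the authors' implementation.

    import torch
    import torch.nn as nn

    def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
        # Truncated SVD of the weight matrix W (out_features x in_features).
        W = layer.weight.data
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        # Keep the top-`rank` singular triplets; absorb the singular values into U.
        A = Vh[:rank, :]            # (rank, in_features)
        B = U[:, :rank] * S[:rank]  # (out_features, rank)

        first = nn.Linear(layer.in_features, rank, bias=False)
        second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
        first.weight.data.copy_(A)
        second.weight.data.copy_(B)
        if layer.bias is not None:
            second.bias.data.copy_(layer.bias.data)
        # x -> B (A x) + b  approximates  W x + b
        return nn.Sequential(first, second)

In this sketch, single-shot decomposition would apply decompose_linear to every fully-connected layer at once at the target rank, layer-by-layer decomposition would replace one layer at a time with fine-tuning in between, and progressive decomposition would re-decompose with a gradually decreasing rank (e.g. an illustrative schedule such as 512, 384, 256, 128) while fine-tuning between rank reductions.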
Author Information
Habib Hajimolahoseini (Huawei Toronto Research Centre)
Walid Ahmed (Huawei)
Mehdi Rezaghoizadeh (Huawei Technologies)
Vahid Partovi Nia (Huawei Noah's Ark Lab)
Yang Liu (Huawei Canada)
More from the Same Authors
- 2021 : A Short Study on Compressing Decoder-Based Language Models »
  Tianda Li · Yassir El Mesbahi · Ivan Kobyzev · Ahmad Rashid · Atif Mahmud · Nithin Anchuri · Habib Hajimolahoseini · Yang Liu · Mehdi Rezagholizadeh
- 2021 : Compressing Pre-trained Language Models using Progressive Low Rank Decomposition »
  Habib Hajimolahoseini · Mehdi Rezaghoizadeh · Vahid Partovi Nia · Marzieh Tahaei · Omar Mohamed Awad · Yang Liu
- 2021 : Kronecker Decomposition for GPT Compression »
  Ali Edalati · Marzieh Tahaei · Ahmad Rashid · Vahid Partovi Nia · James J. Clark · Mehdi Rezaghoizadeh
- 2022 : DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low Rank Adaptation »
  Mojtaba Valipour · Mehdi Rezaghoizadeh · Ivan Kobyzev · Ali Ghodsi
- 2022 : Improved Knowledge Distillation by Utilizing Backward Pass Knowledge in Neural Networks »
  Aref Jafari · Mehdi Rezaghoizadeh · Ali Ghodsi
- 2022 : Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement »
  Heitor Guimarães · Arthur Pimentel · Anderson R. Avila · Mehdi Rezaghoizadeh · Tiago H Falk
- 2022 : Attribute Controlled Dialogue Prompting »
  Runcheng Liu · Ahmad Rashid · Ivan Kobyzev · Mehdi Rezaghoizadeh · Pascal Poupart
- 2022 Poster: Is Integer Arithmetic Enough for Deep Learning Training? »
  Alireza Ghaffari · Marzieh S. Tahaei · Mohammadreza Tayaranian · Masoud Asgharian · Vahid Partovi Nia
- 2021 Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference) »
  Mehdi Rezaghoizadeh · Lili Mou · Yue Dong · Pascal Poupart · Ali Ghodsi · Qun Liu
- 2021 Poster: Demystifying and Generalizing BinaryConnect »
  Tim Dockhorn · Yaoliang Yu · Eyyüb Sari · Mahdi Zolnouri · Vahid Partovi Nia
- 2021 Poster: S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks »
  Xinlin Li · Bang Liu · Yaoliang Yu · Wulong Liu · Chunjing XU · Vahid Partovi Nia