Compressing Pre-trained Language Models using Progressive Low Rank Decomposition
Habib Hajimolahoseini · Mehdi Rezaghoizadeh · Vahid Partovi Nia · Marzieh Tahaei · Omar Mohamed Awad · Yang Liu

In this paper, a progressive low-rank decomposition method is used to compress large-scale pre-trained transformer-based language models. To this end, each fully-connected layer of the transformer modules is decomposed into two consecutive smaller ones using a progressive Singular Value Decomposition (SVD) technique. In contrast to many state-of-the-art compression methods, where intensive pre-training of the compressed model is necessary, progressive LRD can provide promising performance by compressing the model during the fine-tuning stage. Furthermore, current state-of-the-art model compression techniques usually face a limitation in their compression ratio, as the accuracy gap becomes significant at compression ratios higher than 2×. We show that in the later steps of the iterative compression, where the decomposed models become much smaller than their originals (compression factors larger than 8×), Knowledge Distillation can also be used to improve the performance.
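The core operation described above, replacing one fully-connected layer with two consecutive smaller ones via truncated SVD, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name `decompose_fc` and the choice to split the singular values evenly between the two factors are assumptions for illustration.

```python
import numpy as np

def decompose_fc(W, rank):
    """Approximate a weight matrix W (out x in) as the product of two
    smaller matrices W2 (out x rank) and W1 (rank x in) via truncated SVD.
    The original layer x -> W @ x becomes x -> W2 @ (W1 @ x)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the top `rank` singular triplets and split sqrt(S) across
    # both factors (an illustrative balancing choice, not from the paper).
    sqrt_S = np.sqrt(S[:rank])
    W1 = sqrt_S[:, None] * Vt[:rank, :]   # first, smaller layer
    W2 = U[:, :rank] * sqrt_S[None, :]    # second, smaller layer
    return W1, W2

# Example: a 64x128 layer compressed to rank 16 stores
# 16*128 + 64*16 = 3072 parameters instead of 64*128 = 8192.
W = np.random.randn(64, 128)
W1, W2 = decompose_fc(W, rank=16)
approx = W2 @ W1  # low-rank approximation of W
```

In a progressive scheme, this decomposition would be applied iteratively with gradually decreasing rank, fine-tuning in between, rather than truncating to the final rank in a single step.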

Author Information

Habib Hajimolahoseini (Huawei Toronto Research Centre)
Mehdi Rezaghoizadeh (Huawei Technologies)
Vahid Partovi Nia (Huawei Noah's Ark Lab)
Marzieh Tahaei (Huawei Noah's Ark Lab)
Omar Mohamed Awad (Huawei Technologies)
Yang Liu (Huawei Canada)