Strategies for Applying Low Rank Decomposition to Transformer-Based Models
Habib Hajimolahoseini · Walid Ahmed · Mehdi Rezagholizadeh · Vahid Partovi Nia · Yang Liu

Low rank decomposition (LRD) decomposes each fully-connected layer of the transformer modules into two smaller layers using Singular Value Decomposition. State-of-the-art techniques usually apply LRD in a single shot, where all of the layers are decomposed simultaneously. In this paper, we propose and compare different strategies for applying low rank decomposition to compress pre-trained transformer-based models. These strategies include layer-by-layer and progressive decomposition. We observe that progressive low rank decomposition, in which the rank is decreased incrementally, results in higher accuracy after decomposition compared to single-shot and layer-by-layer low rank decomposition. Furthermore, in contrast with many state-of-the-art compression methods, where intensive pre-training of the compressed model is necessary, we show that progressive LRD can provide promising performance by compressing the model during the fine-tuning stage.
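To illustrate the core operation the abstract describes, the sketch below decomposes a single fully-connected layer into two smaller layers via truncated SVD. This is a minimal illustration, not the authors' implementation: the use of PyTorch, the function name decompose_linear, and the example rank and layer sizes are all assumptions.

```python
# Minimal sketch (not the paper's code): replace one nn.Linear layer with two
# smaller nn.Linear layers whose product approximates the original weight.
import torch
import torch.nn as nn


def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Decompose a fully-connected layer via truncated SVD at the given rank."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Keep only the top-`rank` singular values and vectors.
    U_r = U[:, :rank]                          # (out_features, rank)
    S_r = S[:rank]                             # (rank,)
    Vh_r = Vh[:rank, :]                        # (rank, in_features)

    # First smaller layer: in_features -> rank (absorbs the singular values).
    first = nn.Linear(layer.in_features, rank, bias=False)
    first.weight.data = torch.diag(S_r) @ Vh_r

    # Second smaller layer: rank -> out_features (keeps the original bias).
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()

    return nn.Sequential(first, second)


# Hypothetical usage on one feed-forward layer of a transformer block.
fc = nn.Linear(768, 3072)
compressed = decompose_linear(fc, rank=128)
x = torch.randn(4, 768)
print(torch.allclose(fc(x), compressed(x), atol=1e-1))  # rough approximation
```

In single-shot LRD, such a decomposition would be applied to all layers at once at the final target rank; in the progressive strategy the abstract favors, the rank would instead be lowered in several steps, with fine-tuning in between, before reaching the target rank.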

Author Information

Habib Hajimolahoseini (Huawei Toronto Research Centre)
Walid Ahmed (Huawei)
Mehdi Rezagholizadeh (Huawei Technologies)
Vahid Partovi Nia (Huawei Noah's Ark Lab)
Yang Liu (Huawei Canada)