Poster
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee · Soo Min Kwon · Qing Qu · Hun-Seok Kim
East Exhibit Hall A-C #2007
Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large deep neural networks pose significant challenges for inference. To address these computational challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, which aims to learn, identify, and exploit efficient structures prevalent in the weight matrices of deep learning models. The BLAST matrix is a factorization of the weight matrix that uses a substantially reduced intrinsic dimension and fewer parameters, enabling lower-complexity matrix multiplications. The components of the BLAST matrix can either be learned from data or estimated from an existing weight matrix via a preconditioned gradient descent method. We demonstrate that BLAST matrices are applicable to any linear layer and can be employed at various stages of model deployment, including pre-training, fine-tuning, and post-training compression. Overall, our experimental results validate the efficiency of the BLAST matrix, showing either minimal accuracy degradation or an increase in performance in both language and vision tasks.
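The abstract does not spell out the factorization, so the following is only a minimal sketch of how a block-structured factorization can cut the cost of a matrix-vector product. It assumes one plausible form: the weight matrix is split into a b x b grid of blocks, and block (i, j) equals U[i] @ diag(S[i, j]) @ V[j].T, with the left factor shared along each block row and the right factor shared along each block column. The function names, array layouts, and sizes are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
import numpy as np

def blast_matvec(U, S, V, x):
    """Compute y = A @ x without forming A (assumed block form, see lead-in).

    U: (b, p, r) left factors, one per block row
    S: (b, b, r) diagonal coupling vectors, one per block
    V: (b, q, r) right factors, one per block column
    x: (b * q,)  input vector
    """
    b, q, _ = V.shape
    xb = x.reshape(b, q)
    # Step 1: project each input chunk once: z[j] = V[j].T @ x[j]
    z = np.einsum("jqr,jq->jr", V, xb)
    # Step 2: mix the projections per block row: w[i] = sum_j S[i, j] * z[j]
    w = np.einsum("ijr,jr->ir", S, z)
    # Step 3: expand with the shared left factor: y[i] = U[i] @ w[i]
    y = np.einsum("ipr,ir->ip", U, w)
    return y.reshape(-1)

def blast_to_dense(U, S, V):
    """Materialize the dense matrix, used here only to check the sketch."""
    b, p, _ = U.shape
    q = V.shape[1]
    # blocks[i, :, j, :] = U[i] @ diag(S[i, j]) @ V[j].T
    blocks = np.einsum("ipr,ijr,jqr->ipjq", U, S, V)
    return blocks.reshape(b * p, b * q)

# Hypothetical sizes: a 64 x 64 matrix as a 4 x 4 grid of 16 x 16 blocks, rank 8.
rng = np.random.default_rng(0)
b, p, q, r = 4, 16, 16, 8
U = rng.standard_normal((b, p, r))
S = rng.standard_normal((b, b, r))
V = rng.standard_normal((b, q, r))
x = rng.standard_normal(b * q)

y_fast = blast_matvec(U, S, V, x)
y_ref = blast_to_dense(U, S, V) @ x
print("max abs difference:", np.max(np.abs(y_fast - y_ref)))

dense_params = (b * p) * (b * q)
factored_params = U.size + S.size + V.size
print("dense parameters:", dense_params, "factored parameters:", factored_params)
```

Under this assumed form, the factors hold b*p*r + b*b*r + b*q*r numbers instead of the (b*p)*(b*q) entries of the dense matrix, and the three steps of the product scale with those counts. That is the kind of saving the abstract refers to when r is small relative to the block size.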