The attention mechanism is at the core of state-of-the-art Natural Language Processing (NLP) models, owing to its ability to focus on the most contextually relevant part of a sequence. However, current attention models rely on "flat-view" matrix methods to process sequences of tokens embedded in vector spaces, resulting in exceedingly high parameter complexity for practical applications. To this end, we introduce a novel Tensorized Spectral Attention (TSA) mechanism, which leverages the Graph Tensor Network (GTN) framework to efficiently process tensorized token embeddings via attention-based spectral graph filters. By virtue of multi-linear algebra, such tensorized token embeddings are shown to effectively bypass the Curse of Dimensionality, reducing the parameter complexity of the attention mechanism from exponential to linear in the weight matrix dimensions. Furthermore, the graph formulation of the attention domain enables the processing of tensorized embeddings through spectral graph convolution filters, which further increases the expressive power of the mechanism. The benefits of the TSA are demonstrated through five benchmark NLP experiments, where the proposed mechanism is shown to achieve results better than or comparable to those of traditional attention models, while incurring drastically lower parameter complexity.
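The parameter saving claimed in the abstract can be illustrated with a small counting exercise. The abstract does not state which tensor format TSA uses, so the sketch below assumes a tensor-train (matrix product operator) factorization of a weight matrix; the function names, mode sizes, and rank are hypothetical choices for illustration, not values from the paper.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters in a dense weight matrix of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, rank):
    """Parameters in a tensor-train factorization of the same matrix.

    Assumption: every internal rank equals `rank`, and the boundary ranks are 1.
    Core k has shape (r_k, in_modes[k], out_modes[k], r_{k+1}).
    """
    n = len(in_modes)
    ranks = [1] + [rank] * (n - 1) + [1]
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(n)
    )


# Hypothetical example: a 256 x 256 weight matrix viewed as four modes of size 4.
in_modes = (4, 4, 4, 4)
out_modes = (4, 4, 4, 4)

print(dense_params(in_modes, out_modes))       # 65536
print(tt_params(in_modes, out_modes, rank=3))  # 384
```

Under these assumptions the dense count grows as the product of the mode sizes, while the factorized count grows as their sum (for a fixed rank), which is the sense in which tensorization sidesteps the Curse of Dimensionality.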
Author Information
Yao Lei Xu (Imperial College London)
Kriton Konstantinidis (Imperial College London)
Shengxi Li (Imperial College London)
Danilo Mandic (Imperial College London)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 : A Tensorized Spectral Attention Mechanism for Efficient Natural Language Processing
More from the Same Authors
- 2021 : Bayesian Tensor Networks
  Kriton Konstantinidis · Yao Lei Xu · Qibin Zhao · Danilo Mandic
- 2021 : Danilo P. Mandic
  Danilo Mandic
- 2021 : Multi-graph Tensor Networks: Big Data Analytics on Irregular Domains
  Danilo Mandic
- 2020 : Poster 1: Multi-Graph Tensor Networks by Yao Lei Xu
  Yao Lei Xu
- 2020 Poster: Reciprocal Adversarial Learning via Characteristic Functions
  Shengxi Li · Zeyang Yu · Min Xiang · Danilo Mandic
- 2020 Spotlight: Reciprocal Adversarial Learning via Characteristic Functions
  Shengxi Li · Zeyang Yu · Min Xiang · Danilo Mandic
- 2011 Poster: A Multilinear Subspace Regression Method Using Orthogonal Tensors Decompositions
  Qibin Zhao · Cesar F Caiafa · Danilo Mandic · Liqing Zhang · Tonio Ball · Andreas Schulze-bonhage · Andrzej S CICHOCKI
- 2011 Spotlight: A Multilinear Subspace Regression Method Using Orthogonal Tensors Decompositions
  Qibin Zhao · Cesar F Caiafa · Danilo Mandic · Liqing Zhang · Tonio Ball · Andreas Schulze-bonhage · Andrzej S CICHOCKI