Composite Attention: A Framework for Combining Sequence Mixing Primitives
Jake Cunningham · Marc Deisenroth
Keywords: Efficient Architectures
Abstract
Hybrid attention architectures have shown promising success both in equipping self-attention with inductive bias for long-sequence modelling and in reducing the computational burden of transformers without sacrificing quality. This paper introduces Composite Attention, a theoretical framework for analyzing the combination of sequence mixing primitives in modern deep learning architectures. Utilizing the definition of sequence mixers as structured linear maps, we formalize the composition of sequence mixing primitives as either sequential or recurrent composition.
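The view of a sequence mixer as a structured linear map acting along the sequence dimension can be illustrated with a minimal sketch. The specific mixing matrices below (an attention-style row-normalized causal matrix and a banded convolution-style matrix) are illustrative assumptions, not the paper's definitions; the sketch only shows that applying two mixers one after the other (sequential composition) corresponds to a product of their mixing matrices.

```python
import numpy as np

# A sequence mixer viewed as a structured linear map: y = M x, where M is an
# L x L mixing matrix acting along the sequence dimension (illustrative only).
L = 6
x = np.random.randn(L, 1)

# Hypothetical structured mixers, chosen for illustration:
# attention-style mixer: causal (lower-triangular), row-normalized scores
scores = np.tril(np.random.rand(L, L))
M_attn = scores / scores.sum(axis=1, keepdims=True)

# convolution-style mixer: banded lower-triangular matrix (short causal filter)
M_conv = np.tril(np.triu(np.ones((L, L)), -2))

# Sequential composition: apply one mixer after the other; the composite map
# is the matrix product of the individual mixing matrices.
y_sequential = M_conv @ (M_attn @ x)
M_composite = M_conv @ M_attn
assert np.allclose(y_sequential, M_composite @ x)
```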