Transformer-based architectures gained popularity due to their exceptional performance in natural language processing, and have since seen widespread use in other domains such as vision and time series. However, a well-known bottleneck of Transformers on long sequences is the self-attention mechanism, which is very costly to compute for such sequences because its cost grows quadratically with sequence length. As a result, the performance of Transformers suffers considerably on long sequences, and most real-world time series data contain long sequences. Various approaches have been adopted to overcome this problem, most notably modifications of the vanilla Transformer and sparse attention techniques. To address it, we propose a novel attention mechanism inspired by the Factorization Machine. In this paper, we show that instead of computing the exact attention values, we can learn a function that computes approximate attention for long sequences and thus predicts long time series sequences faster. In particular, we aim to develop a novel attention mechanism that takes advantage of the existing attention mechanism in Transformers and makes Transformers more efficient by learning approximate attention without much loss in their performance.
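To make the bottleneck concrete, the following is a minimal sketch of exact scaled dot-product self-attention in plain Python. It illustrates only the standard mechanism, not the approximate attention proposed in this paper. The function names (`softmax`, `self_attention`, `score_entries`) are hypothetical, and as a simplifying assumption the learned query, key, and value projections are omitted, so the input vectors serve as all three.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Exact scaled dot-product self-attention with Q = K = V = x.

    x is a list of n vectors of dimension d. Learned projections are
    omitted (an assumption made to keep the sketch short).
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    # n x n score matrix: one dot product per pair of positions.
    # This is the step whose cost grows quadratically with n.
    scores = [
        [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        for i in range(n)
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all n input vectors.
    out = [
        [sum(weights[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return out, weights


def score_entries(n):
    """Number of pairwise scores exact attention computes for length n."""
    return n * n
```

Doubling the sequence length quadruples the number of pairwise scores, which is why a learned approximation of the attention values is attractive for long time series.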