
Untangling tradeoffs between recurrence and self-attention in artificial neural networks
Giancarlo Kerg · Bhargav Kanuparthi · Anirudh Goyal ALIAS PARTH GOYAL · Kyle Goyette · Yoshua Bengio · Guillaume Lajoie

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1417

Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and relies on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attention affects gradient propagation in recurrent networks, and prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies by establishing concrete bounds on gradient norms. Building on these results, we propose a relevancy screening mechanism, inspired by the cognitive process of memory consolidation, that allows for a scalable use of sparse self-attention with recurrence. While our mechanism provides guarantees against vanishing gradients, we use simple numerical experiments to demonstrate the tradeoffs in performance and computational resources achieved by balancing attention and recurrence. Based on our results, we propose a concrete direction of research to improve the scalability of attentive networks.
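To make the idea concrete, the core mechanism can be sketched as a recurrent update that attends over a small buffer of past hidden states, then screens that buffer down to the most relevant ones. This is an illustrative sketch only, not the authors' exact algorithm: the parameters are random, the top-k screening rule and the names `attend`, `step`, and `k` are assumptions chosen to show how screening keeps the attention span (and hence memory cost) bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # hidden state size
k = 4   # number of past states retained after relevancy screening

# Illustrative parameters; a real model would learn these.
W_h = rng.normal(scale=0.3, size=(d, d))
W_x = rng.normal(scale=0.3, size=(d, d))

def attend(h, memory):
    """Soft attention of the current state h over retained past states."""
    scores = memory @ h / np.sqrt(d)          # dot-product relevance scores
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ memory, weights          # context vector, attention weights

def step(h, x, memory):
    """One recurrent step: the attention context feeds the state update."""
    if memory:
        context, weights = attend(h, np.stack(memory))
        # Relevancy screening: keep only the k most-attended past states,
        # preserving their temporal order.
        keep = sorted(np.argsort(weights)[-k:])
        memory = [memory[i] for i in keep]
    else:
        context = np.zeros(d)
    h_new = np.tanh(W_h @ h + W_x @ x + context)
    memory.append(h_new)
    return h_new, memory

h, memory = np.zeros(d), []
for t in range(20):
    h, memory = step(h, rng.normal(size=d), memory)
print(len(memory))  # bounded: at most k + 1 retained states
```

The point of the sketch is the scaling behavior: dense self-attention over a length-T sequence costs O(T) memory per step, while screening caps the buffer at k states, so attention cost stays constant in sequence length while the recurrence still propagates information between screenings.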

Author Information

Giancarlo Kerg (MILA)
Bhargav Kanuparthi (Montreal Institute for Learning Algorithms)
Anirudh Goyal ALIAS PARTH GOYAL (Université de Montréal)
Kyle Goyette (University of Montreal)
Yoshua Bengio (Mila / U. Montreal)

Yoshua Bengio is Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, 2018 Turing Award recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada CIFAR AI Chair. He pioneered deep learning and, in 2018, received the most citations per day among all computer scientists worldwide. He is an Officer of the Order of Canada and a member of the Royal Society of Canada, was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the Year in 2017, and is a member of the NeurIPS advisory board, co-founder of the ICLR conference, and program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncovering the principles that give rise to intelligence through learning, and to foster the development of AI for the benefit of all.

Guillaume Lajoie (Mila, Université de Montréal)
