The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single state-of-the-art model by 0.7 BLEU, achieving a BLEU score of 41.1.
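The attention mechanism the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / √d_k)V. A minimal NumPy sketch of that operation (function name and toy shapes are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # convex combination of value rows

# Tiny example: 2 queries attending over 3 key/value pairs of width 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients; in the full model this operation is applied in parallel across multiple heads.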
Author Information
Ashish Vaswani (Google Brain)
Noam Shazeer (Google)
Niki Parmar (Google)
Jakob Uszkoreit (Google, Inc.)
Llion Jones (Google)
Aidan Gomez (University of Toronto)
Łukasz Kaiser (Google Brain)
Illia Polosukhin
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Spotlight: Attention is All you Need »
  Wed Dec 6th 11:35 -- 11:40 PM Room Hall A
More from the Same Authors
- 2020 Poster: Object-Centric Learning with Slot Attention »
  Francesco Locatello · Dirk Weissenborn · Thomas Unterthiner · Aravindh Mahendran · Georg Heigold · Jakob Uszkoreit · Alexey Dosovitskiy · Thomas Kipf
- 2020 Spotlight: Object-Centric Learning with Slot Attention »
  Francesco Locatello · Dirk Weissenborn · Thomas Unterthiner · Aravindh Mahendran · Georg Heigold · Jakob Uszkoreit · Alexey Dosovitskiy · Thomas Kipf
- 2019 Poster: Stand-Alone Self-Attention in Vision Models »
  Niki Parmar · Prajit Ramachandran · Ashish Vaswani · Irwan Bello · Anselm Levskaya · Jon Shlens
- 2018 Poster: Blockwise Parallel Decoding for Deep Autoregressive Models »
  Mitchell Stern · Noam Shazeer · Jakob Uszkoreit
- 2018 Poster: Mesh-TensorFlow: Deep Learning for Supercomputers »
  Noam Shazeer · Youlong Cheng · Niki Parmar · Dustin Tran · Ashish Vaswani · Penporn Koanantakool · Peter Hawkins · HyoukJoong Lee · Mingsheng Hong · Cliff Young · Ryan Sepassi · Blake Hechtman
- 2017 Poster: The Reversible Residual Network: Backpropagation Without Storing Activations »
  Aidan Gomez · Mengye Ren · Raquel Urtasun · Roger Grosse
- 2016 Poster: Can Active Memory Replace Attention? »
  Łukasz Kaiser · Samy Bengio
- 2015 Poster: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks »
  Samy Bengio · Oriol Vinyals · Navdeep Jaitly · Noam Shazeer
- 2015 Poster: Grammar as a Foreign Language »
  Oriol Vinyals · Łukasz Kaiser · Terry Koo · Slav Petrov · Ilya Sutskever · Geoffrey Hinton