Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence, as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time), recurrently processing the input by adding another step of processing. A step in the staircase consists of backward tokens (encoding the sequence seen so far) and forward tokens (ingesting a new part of the sequence). Our model can thus trade off performance and compute by increasing the amount of recurrence through time and depth. Because of this recurrence, staircase attention is shown to solve tasks involving tracking that conventional Transformers cannot. Further, it is shown to provide improved modeling power for the same model size (number of parameters) compared to self-attentive Transformers on large language modeling and dialogue tasks, yielding significant perplexity gains.
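The recurrence described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single-head `attention_block`, the `staircase` function, the chunking scheme, and the `n_steps` parameter are all hypothetical names and simplifications introduced here, and causal masking, multiple layers, and training are omitted. It only shows the general idea that each step re-processes previously seen (backward) chunks together with one newly ingested (forward) chunk using the same shared weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(h, Wq, Wk, Wv):
    # One single-head self-attention layer with a residual connection.
    # Simplifying assumption: no causal mask, no feed-forward sublayer.
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(h.shape[-1])
    return h + softmax(scores) @ v

def staircase(chunks, n_steps, params):
    # chunks: list of (chunk_len, d) arrays, fed in one per step.
    # Each chunk stays in the window for n_steps steps, so it is
    # processed n_steps times by the same shared block.
    window = []   # entries are [hidden_state, times_processed]
    outputs = []
    for t in range(len(chunks) + n_steps - 1):
        if t < len(chunks):
            window.append([chunks[t], 0])  # forward tokens: new input
        # Backward tokens (older chunks) and forward tokens attend jointly.
        h = attention_block(np.concatenate([s for s, _ in window]), *params)
        offset = 0
        for item in window:
            n = item[0].shape[0]
            item[0] = h[offset:offset + n]
            item[1] += 1
            offset += n
        # Emit chunks that have completed all of their processing steps.
        while window and window[0][1] == n_steps:
            outputs.append(window.pop(0)[0])
    return np.concatenate(outputs)
```

In this sketch, raising `n_steps` increases the compute spent per token without adding parameters, which is the trade-off the abstract refers to. With `n_steps=1` it reduces to applying the block to each chunk independently.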
Author Information
Da JU (Meta AI)
Stephen Roller (Facebook)
Sainbayar Sukhbaatar (Meta AI)
Jason E Weston (Meta AI)
Jason Weston received a PhD (2000) from Royal Holloway, University of London under the supervision of Vladimir Vapnik. From 2000 to 2002, he was a researcher at Biowulf Technologies, New York, applying machine learning to bioinformatics. From 2002 to 2003, he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2004 to June 2009, he was a research staff member at NEC Labs America, Princeton. From July 2009 onwards, he has been a research scientist at Google, New York. Jason Weston's current research focuses on various aspects of statistical machine learning and its applications, particularly in text and images.
More from the Same Authors
-
2020 : Invited Talk 4 Presentation - Jason Weston - (Towards) Learning from Conversing »
Jason E Weston -
2021 Spotlight: Hash Layers For Large Sparse Models »
Stephen Roller · Sainbayar Sukhbaatar · arthur szlam · Jason Weston -
2022 : Learning to Reason and Memorize with Self-Questioning »
Jack Lanchantin · Shubham Toshniwal · Jason E Weston · arthur szlam · Sainbayar Sukhbaatar -
2022 : Invited Keynote by Jason Weston »
Jason Weston -
2022 : Panel Discussion: Opportunities and Challenges »
Kenneth Norman · Janice Chen · Samuel J Gershman · Albert Gu · Sepp Hochreiter · Ida Momennejad · Hava Siegelmann · Sainbayar Sukhbaatar -
2022 : Sainbayar Sukhbaatar: "Brain-inspired memory models" »
Sainbayar Sukhbaatar -
2021 Poster: Hash Layers For Large Sparse Models »
Stephen Roller · Sainbayar Sukhbaatar · arthur szlam · Jason Weston -
2020 Workshop: Wordplay: When Language Meets Games »
Prithviraj Ammanabrolu · Matthew Hausknecht · Xingdi Yuan · Marc-Alexandre Côté · Adam Trischler · Kory Mathewson · John Urbanek · Jason Weston · Mark Riedl -
2020 : Panel »
Maxine Eskenazi · Ankur Parikh · Govindarajan Thattai · Alexander Rudnicky · Jason E Weston -
2020 : Invited Talk 4 Q/A - Jason Weston »
Jason E Weston -
2020 Memorial: In Memory of Olivier Chapelle »
Bernhard Schölkopf · Andre Elisseeff · Olivier Bousquet · Vladimir Vapnik · Jason E Weston -
2018 : Teaching through Dialogue and Games »
Jason E Weston -
2018 : Humans and models as embodied dialogue agents in text-based games »
Jason Weston -
2018 : The Conversational Intelligence Challenge 2 (ConvAI2) : Setup, Opening Words »
Jason Weston -
2016 : Jason Weston »
Jason E Weston -
2016 Workshop: Let's Discuss: Learning Methods for Dialogue »
Hal Daumé III · Paul Mineiro · Amanda Stent · Jason E Weston -
2016 Poster: Dialog-based Language Learning »
Jason E Weston -
2016 Poster: Learning Multiagent Communication with Backpropagation »
Sainbayar Sukhbaatar · arthur szlam · Rob Fergus -
2015 Workshop: Reasoning, Attention, Memory (RAM) Workshop »
Jason E Weston · Sumit Chopra · Antoine Bordes -
2015 : Evaluating Prerequisite Qualities For End-to-End Dialog Systems »
Jason E Weston -
2015 Poster: End-To-End Memory Networks »
Sainbayar Sukhbaatar · arthur szlam · Jason Weston · Rob Fergus -
2015 Oral: End-To-End Memory Networks »
Sainbayar Sukhbaatar · arthur szlam · Jason Weston · Rob Fergus -
2014 Workshop: 4th Workshop on Automated Knowledge Base Construction (AKBC) »
Sameer Singh · Fabian M Suchanek · Sebastian Riedel · Partha Pratim Talukdar · Kevin Murphy · Christopher Ré · William Cohen · Tom Mitchell · Andrew McCallum · Jason E Weston · Ramanathan Guha · Boyan Onyshkevych · Hoifung Poon · Oren Etzioni · Ari Kobren · Arvind Neelakantan · Peter Clark -
2011 Workshop: Learning Semantics »
Antoine Bordes · Jason E Weston · Ronan Collobert · Leon Bottou -
2010 Poster: Label Embedding Trees for Large Multi-Class Tasks »
Samy Bengio · Jason E Weston · David Grangier -
2009 Poster: Polynomial Semantic Indexing »
Bing Bai · Jason E Weston · David Grangier · Ronan Collobert · Kunihiko Sadamasa · Yanjun Qi · Corinna Cortes · Mehryar Mohri -
2009 Tutorial: Deep Learning in Natural Language Processing »
Ronan Collobert · Jason E Weston