Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task. We evaluate our modified architecture on a set of partially-observable environments and find that, in practice, our model outperforms RNNs while also running over five times faster. Then, by leveraging the model’s ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper show that structured state space models are fast and performant for in-context reinforcement learning tasks. We provide code at https://github.com/luchris429/s5rl.
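As a rough illustration of what resetting the hidden state in parallel can look like, the sketch below uses a scalar linear recurrence h_t = a_t * h_{t-1} + b_t with an episode-boundary flag d_t that zeroes the carried state before step t. It is not the authors' implementation (the linked repository is the reference for that); the operator `combine`, the helper names, and the toy numbers are all assumptions made for illustration. The point is only that a reset can be folded into an associative operator, so the sequence can be split and processed in independent halves, which is what makes a parallel scan possible.

```python
# Illustrative sketch only: every name and value here is an assumption,
# not taken from the paper or its code release.
# Each timestep is a triple (a, b, d): decay a, input b, reset flag d.

def combine(left, right):
    """Associative operator for a linear recurrence with resets."""
    a_i, b_i, d_i = left
    a_j, b_j, d_j = right
    if d_j:
        # The right segment contains a reset, so it ignores everything before it.
        return (a_j, b_j, True)
    # No reset on the right: compose the two affine maps h -> a*h + b.
    return (a_j * a_i, a_j * b_i + b_j, d_i)


def sequential_states(elems):
    """Reference: step through time, zeroing the state on each reset."""
    h, out = 0.0, []
    for a, b, d in elems:
        if d:
            h = 0.0
        h = a * h + b
        out.append(h)
    return out


def tree_prefix(elems):
    """Inclusive prefix scan by divide and conquer.

    The two halves are independent of each other (and could run in
    parallel); regrouping like this is only valid because `combine`
    is associative.
    """
    if len(elems) == 1:
        return list(elems)
    mid = len(elems) // 2
    left = tree_prefix(elems[:mid])
    right = tree_prefix(elems[mid:])
    carry = left[-1]
    return left + [combine(carry, r) for r in right]


# Toy sequence with a reset at the third step.
elems = [
    (0.5, 1.0, False),
    (0.5, 2.0, False),
    (0.5, 3.0, True),
    (0.5, 4.0, False),
    (0.9, 1.0, False),
]
seq = sequential_states(elems)
par = [b for _, b, _ in tree_prefix(elems)]  # hidden state is the b component
```

With a zero initial state, the `b` component of each prefix is the hidden state at that step, so `par` should match `seq` up to floating-point rounding. An equivalent and simpler trick under the same assumptions is to multiply a_t by (1 - d_t) before running an ordinary scan.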
Author Information
Chris Lu (University of Oxford)
Yannick Schroecker (Google DeepMind)
Albert Gu (Carnegie Mellon University)
Emilio Parisotto (School of Computer Science, Carnegie Mellon University)
Jakob Foerster (University of Oxford)
Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
Satinder Singh (DeepMind)
Feryal Behbahani (DeepMind)
More from the Same Authors
-
2021 Spotlight: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Spotlight: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2021 : GrASP: Gradient-Based Affordance Selection for Planning »
Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh -
2021 : No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients »
Risto Vuorio · Jacob Beck · Greg Farquhar · Jakob Foerster · Shimon Whiteson -
2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 : A Fine-Tuning Approach to Belief State Modeling »
Samuel Sokota · Hengyuan Hu · David Wu · Jakob Foerster · Noam Brown -
2021 : Generalized Belief Learning in Multi-Agent Settings »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 : In-Context Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2022 : In-context Reinforcement Learning with Algorithm Distillation »
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih -
2022 : Adversarial Cheap Talk »
Chris Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 : Optimistic Meta-Gradients »
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh -
2022 : Human-AI Coordination via Human-Regularized Search and Learning »
Hengyuan Hu · David Wu · Adam Lerer · Jakob Foerster · Noam Brown -
2022 : MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning »
Mikayel Samvelyan · Akbir Khan · Michael Dennis · Minqi Jiang · Jack Parker-Holder · Jakob Foerster · Roberta Raileanu · Tim Rocktäschel -
2023 : Leading the Pack: N-player Opponent Shaping »
Alexandra Souly · Timon Willi · Akbir Khan · Robert Kirk · Chris Lu · Edward Grefenstette · Tim Rocktäschel -
2023 : EvIL: Evolution Strategies for Generalisable Imitation Learning »
Silvia Sapora · Chris Lu · Gokul Swamy · Yee Whye Teh · Jakob Foerster -
2023 : Policy-Guided Diffusion »
Matthew T Jackson · Michael Matthews · Cong Lu · Jakob Foerster · Shimon Whiteson -
2023 : Noisy ZSC: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games »
Usman Anwar · Jia Wan · David Krueger · Jakob Foerster -
2023 : Vision-Language Models as a Source of Rewards »
Harris Chan · Volodymyr Mnih · Feryal Behbahani · Michael Laskin · Luyu Wang · Fabio Pardo · Maxime Gazeau · Himanshu Sahni · Daniel Horgan · Kate Baumli · Yannick Schroecker · Stephen Spencer · Richie Steigerwald · John Quan · Gheorghe Comanici · Sebastian Flennerhag · Alexander Neitz · Lei Zhang · Tom Schaul · Satinder Singh · Clare Lyle · Tim Rocktäschel · Jack Parker-Holder · Kristian Holsheimer -
2023 : Discovering Temporally-Aware Reinforcement Learning Algorithms »
Matthew T Jackson · Chris Lu · Louis Kirsch · Robert Lange · Shimon Whiteson · Jakob Foerster -
2023 : JaxMARL: Multi-Agent RL Environments in JAX »
Alexander Rutherford · Benjamin Ellis · Matteo Gallici · Jonathan Cook · Andrei Lupu · Garðar Ingvarsson · Timon Willi · Akbir Khan · Christian Schroeder de Witt · Alexandra Souly · Saptarashmi Bandyopadhyay · Mikayel Samvelyan · Minqi Jiang · Robert Lange · Shimon Whiteson · Bruno Lacerda · Nick Hawes · Tim Rocktäschel · Chris Lu · Jakob Foerster -
2023 : POMRL: No-Regret Learning-to-Plan with Increasing Horizons »
Khimya Khetarpal · Claire Vernade · Brendan O'Donoghue · Satinder Singh · Tom Zahavy -
2023 Workshop: Socially Responsible Language Modelling Research (SoLaR) »
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Katherine Lee · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh -
2023 : Feryal Behbahani »
Feryal Behbahani -
2023 Poster: SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning »
Benjamin Ellis · Jonathan Cook · Skander Moalla · Mikayel Samvelyan · Mingfei Sun · Anuj Mahajan · Jakob Foerster · Shimon Whiteson -
2023 Poster: Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design »
Matthew T Jackson · Minqi Jiang · Jack Parker-Holder · Risto Vuorio · Chris Lu · Greg Farquhar · Shimon Whiteson · Jakob Foerster -
2023 Poster: Similarity-based cooperative equilibrium »
Caspar Oesterheld · Johannes Treutlein · Roger Grosse · Vincent Conitzer · Jakob Foerster -
2023 Poster: Optimistic Meta-Gradients »
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh -
2023 Poster: A Definition of Continual Reinforcement Learning »
David Abel · Andre Barreto · Benjamin Van Roy · Doina Precup · Hado van Hasselt · Satinder Singh -
2023 Poster: Large Language Models can Implement Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2023 Poster: Combining Behaviors with the Successor Features Keyboard »
Wilka Carvalho · Andre Saraiva · Angelos Filos · Andrew Lampinen · Loic Matthey · Richard L Lewis · Honglak Lee · Satinder Singh · Danilo Jimenez Rezende · Daniel Zoran -
2022 : Jakob Foerster »
Jakob Foerster -
2022 Poster: Palm up: Playing in the Latent Manifold for Unsupervised Pretraining »
Hao Liu · Tom Zahavy · Volodymyr Mnih · Satinder Singh -
2022 Poster: Approximate Value Equivalence »
Christopher Grimm · Andre Barreto · Satinder Singh -
2022 Poster: Proximal Learning With Opponent-Learning Awareness »
Stephen Zhao · Chris Lu · Roger Grosse · Jakob Foerster -
2022 Poster: Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world »
Eugene Vinitsky · Nathan Lichtlé · Xiaomeng Yang · Brandon Amos · Jakob Foerster -
2022 Poster: Grounding Aleatoric Uncertainty for Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2022 Poster: Off-Team Learning »
Brandon Cui · Hengyuan Hu · Andrei Lupu · Samuel Sokota · Jakob Foerster -
2022 Poster: Self-Explaining Deviations for Coordination »
Hengyuan Hu · Samuel Sokota · David Wu · Anton Bakhtin · Andrei Lupu · Brandon Cui · Jakob Foerster -
2022 Poster: Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2022 Poster: Discovered Policy Optimisation »
Chris Lu · Jakub Kuba · Alistair Letcher · Luke Metz · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Influencing Long-Term Behavior in Multiagent Reinforcement Learning »
Dong-Ki Kim · Matthew Riemer · Miao Liu · Jakob Foerster · Michael Everett · Chuangchuang Sun · Gerald Tesauro · Jonathan How -
2022 Poster: Equivariant Networks for Zero-Shot Coordination »
Darius Muglich · Christian Schroeder de Witt · Elise van der Pol · Shimon Whiteson · Jakob Foerster -
2021 : Reducing the Information Horizon of Bayes-Adaptive Markov Decision Processes via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2021 Workshop: Cooperative AI »
Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith -
2021 : Bootstrapped Meta-Learning »
Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2021 Poster: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 Poster: Replay-Guided Adversarial Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 Poster: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Poster: Discovery of Options via Meta-Learned Subgoals »
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: K-level Reasoning for Zero-Shot Coordination in Hanabi »
Brandon Cui · Hengyuan Hu · Luis Pineda · Jakob Foerster -
2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions »
Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh -
2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem »
Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster -
2021 Oral: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2020 : Closing remarks »
Raymond Chua · Feryal Behbahani · Julie J Lee · Rui Ponte Costa · Doina Precup · Blake Richards · Ida Momennejad -
2020 : Invited Talk #7 QnA - Yael Niv »
Yael Niv · Doina Precup · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: Yael Niv »
Doina Precup · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: Contributed talk#3 speaker »
Feryal Behbahani · Raymond Chua -
2020 : Invited Talk #6 QnA - Catherine Hartley »
Catherine Hartley · Julie J Lee · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: Catherine Hartley »
Julie J Lee · Raymond Chua · Feryal Behbahani -
2020 : Invited Talk #5 QnA - Ishita Dasgupta »
Ishita Dasgupta · Julie J Lee · Feryal Behbahani · Raymond Chua -
2020 : Speaker Introduction: Ishita Dasgupta »
Julie J Lee · Raymond Chua · Feryal Behbahani -
2020 : Invited Talk #4 QnA - George Konidaris »
George Konidaris · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: George Konidaris »
Raymond Chua · Feryal Behbahani -
2020 : Invited Talk #3 QnA - Kim Stachenfeld »
Kimberly Stachenfeld · Ida Momennejad · Feryal Behbahani · Raymond Chua -
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì -
2020 : Speaker Introduction: Kim Stachenfeld »
Ida Momennejad · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: Contributed talk#2 »
Raymond Chua · Feryal Behbahani · Sara Zannone -
2020 : Speaker Introduction: Contributed talk#1 »
Raymond Chua · Feryal Behbahani -
2020 : Invited Talk #2 QnA - Claudia Clopath (Live, no recording) »
Claudia Clopath · Rui Ponte Costa · Raymond Chua · Feryal Behbahani -
2020 : Speaker Introduction: Claudia Clopath »
Raymond Chua · Feryal Behbahani · Rui Ponte Costa -
2020 : Invited talk 1 QnA: Shakir Mohamed »
Shakir Mohamed · Feryal Behbahani · Raymond Chua -
2020 : Speaker Introduction: Shakir Mohamed »
Feryal Behbahani · Raymond Chua -
2020 Workshop: Biological and Artificial Reinforcement Learning »
Raymond Chua · Feryal Behbahani · Julie J Lee · Sara Zannone · Rui Ponte Costa · Blake Richards · Ida Momennejad · Doina Precup -
2020 : Organizers Opening Remarks »
Raymond Chua · Feryal Behbahani · Julie J Lee · Ida Momennejad · Rui Ponte Costa · Blake Richards · Doina Precup -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Discovering Reinforcement Learning Algorithms »
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver -
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online »
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver -
2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Poster: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 Spotlight: Modular Meta-Learning with Shrinkage »
Yutian Chen · Abram Friesen · Feryal Behbahani · Arnaud Doucet · David Budden · Matthew Hoffman · Nando de Freitas -
2020 : Women at DeepMind: Applying for technical roles »
Feryal Behbahani · Mihaela Rosca · Kate Parkyn -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2020 Poster: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning »
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver -
2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2019 Workshop: Emergent Communication: Towards Natural Language »
Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho -
2019 : Opening Remarks »
Raymond Chua · Feryal Behbahani · Sara Zannone · Rui Ponte Costa · Claudia Clopath · Doina Precup · Blake Richards -
2019 Workshop: Biological and Artificial Reinforcement Learning »
Raymond Chua · Sara Zannone · Feryal Behbahani · Rui Ponte Costa · Claudia Clopath · Blake Richards · Doina Precup -
2019 Poster: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning »
Gregory Farquhar · Shimon Whiteson · Jakob Foerster -
2019 Poster: Multi-Agent Common Knowledge Reinforcement Learning »
Christian Schroeder de Witt · Jakob Foerster · Gregory Farquhar · Philip Torr · Wendelin Boehmer · Shimon Whiteson -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2018 Workshop: Emergent Communication Workshop »
Jakob Foerster · Angeliki Lazaridou · Ryan Lowe · Igor Mordatch · Douwe Kiela · Kyunghyun Cho -
2017 Workshop: Emergent Communication Workshop »
Jakob Foerster · Igor Mordatch · Angeliki Lazaridou · Kyunghyun Cho · Douwe Kiela · Pieter Abbeel -
2016 Poster: Learning to Communicate with Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Yannis Assael · Nando de Freitas · Shimon Whiteson