Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners they are overly data-hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the actions taken but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state. By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient. Our transformer-based world model (TWM) generates meaningful new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark.
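
The interleaving of the three modalities can be pictured with a short sketch. The following PyTorch snippet is an illustrative assumption rather than the authors' implementation: latent-state, action, and reward tokens are embedded separately and flattened into one causal sequence, with a standard `nn.TransformerEncoder` standing in for the Transformer-XL used in TWM and positional (relative) encodings omitted for brevity. All names and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class InterleavedWorldModelSketch(nn.Module):
    """Sketch only: embeds discrete latent states, discrete actions, and scalar
    rewards, interleaves them per time step, and runs a causal transformer so
    every token can attend to all three modalities at earlier steps."""

    def __init__(self, n_latents=512, n_actions=18, d_model=256, n_layers=4):
        super().__init__()
        self.latent_emb = nn.Embedding(n_latents, d_model)  # compact latent codes
        self.action_emb = nn.Embedding(n_actions, d_model)  # Atari actions
        self.reward_emb = nn.Linear(1, d_model)             # experienced/predicted reward
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, latents, actions, rewards):
        # latents, actions: (B, T) int64; rewards: (B, T) float32
        z = self.latent_emb(latents)
        a = self.action_emb(actions)
        r = self.reward_emb(rewards.unsqueeze(-1))
        # Interleave to (z_1, a_1, r_1, z_2, a_2, r_2, ...): three tokens per step.
        tokens = torch.stack([z, a, r], dim=2).flatten(1, 2)        # (B, 3T, d_model)
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.transformer(tokens, mask=causal)                # contextualized tokens
```

In this sketch the model attends to previous latents, actions, and rewards directly through the flattened token sequence, which corresponds to the abstract's point that the world model does not have to view the past through a compressed recurrent state.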
Author Information
Jan Robine (Technische Universität Dortmund)
Marc Höftmann (Technische Universität Dortmund)
Tobias Uelwer (Technische Universität Dortmund)
Stefan Harmeling (Technische Universität Dortmund)
More from the Same Authors
- 2022 : Optimizing Intermediate Representations of Generative Models for Phase Retrieval »
  Tobias Uelwer · Sebastian Konietzny · Stefan Harmeling
- 2022 : Cyclophobic Reinforcement Learning »
  Stefan Wagner · Peter Arndt · Jan Robine · Stefan Harmeling
- 2022 : Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm »
  Marc Höftmann · Jan Robine · Stefan Harmeling
- 2022 : Evaluating Robust Perceptual Losses for Image Reconstruction »
  Tobias Uelwer · Felix Michels · Oliver De Candido
- 2023 Workshop: I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models »
  Estefany Kelly Buchanan · Fan Feng · Andreas Kriegler · Ian Mason · Tobias Uelwer · Yubin Xie · Rui Yang