Timezone: »
To solve decision making tasks in unknown environments, artificial agents need to explore their surroundings. While simple tasks can be solved through naive exploration methods such as action noise, complex tasks require exploration objectives that direct the agent to novel states. However, current exploration objectives typically reward states purely based on how much the agent learns from them, regardless of whether the states are likely to be useful for solving later tasks. In this paper, we propose to guide exploration by empowerment to focus the agent on exploring regions in which it has a strong influence over its environment. We introduce a simple information-theoretic estimator of the agent's empowerment that is added as a reward term to any reinforcement learning method. On a novel BridgeWalk environment, we find that guiding exploration by empowerment helps the agent avoid falling into the unpredictable water, which substantially accelerates exploration and task learning. Experiments on Atari games demonstrate that the approach is general and often leads to improved performance.
Author Information
Vaibhav Saxena (Georgia Institute of Technology)
Jimmy Ba (University of Toronto / Vector Institute)
Danijar Hafner (Google)
More from the Same Authors
-
2021 : BLAST: Latent Dynamics Models from Bootstrapping »
Keiran Paster · Lev McKinney · Sheila McIlraith · Jimmy Ba -
2021 : Learning Robust Dynamics through Variational Sparse Gating »
Arnav Kumar Jain · Shivakanth Sujit · Shruti Joshi · Vincent Michalski · Danijar Hafner · Samira Ebrahimi Kahou -
2021 : Benchmarking the Spectrum of Agent Capabilities »
Danijar Hafner -
2022 : Large Language Models Are Human-Level Prompt Engineers »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Temporary Goals for Exploration »
Haoyang Xu · Jimmy Ba · Silviu Pitis · Harris Chan -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Evaluating Long-Term Memory in 3D Mazes »
Jurgis Pašukonis · Timothy Lillicrap · Danijar Hafner -
2022 : Steering Large Language Models using APE »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba -
2022 : Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations »
Silviu Pitis · Duncan Bailey · Jimmy Ba -
2022 : Danijar Hafner »
Danijar Hafner -
2022 : Invited Talk by Jimmy Ba »
Jimmy Ba -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2022 Poster: Deep Hierarchical Planning from Pixels »
Danijar Hafner · Kuang-Huei Lee · Ian Fischer · Pieter Abbeel -
2022 Poster: Learning Robust Dynamics through Variational Sparse Gating »
Arnav Kumar Jain · Shivakanth Sujit · Shruti Joshi · Vincent Michalski · Danijar Hafner · Samira Ebrahimi Kahou -
2022 Poster: Dataset Distillation using Neural Feature Regression »
Yongchao Zhou · Ehsan Nezhadarya · Jimmy Ba -
2021 : Benchmarking the Spectrum of Agent Capabilities Q&A »
Danijar Hafner -
2021 : Benchmarking the Spectrum of Agent Capabilities »
Danijar Hafner -
2021 Poster: Discovering and Achieving Goals via World Models »
Russell Mendonca · Oleh Rybkin · Kostas Daniilidis · Danijar Hafner · Deepak Pathak -
2021 Poster: Clockwork Variational Autoencoders »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2021 Poster: Learning Domain Invariant Representations in Goal-conditioned Block MDPs »
Beining Han · Chongyi Zheng · Harris Chan · Keiran Paster · Michael Zhang · Jimmy Ba -
2021 Poster: How does a Neural Network's Architecture Impact its Robustness to Noisy Labels? »
Jingling Li · Mozhi Zhang · Keyulu Xu · John Dickerson · Jimmy Ba -
2021 Poster: Information is Power: Intrinsic Control via Information Capture »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2020 : Contributed Talk #2: Evaluating Agents Without Rewards »
Brendon Matusch · Danijar Hafner · Jimmy Ba -
2020 : Contributed Talk: Planning from Pixels using Inverse Dynamics Models »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 : Contributed Talks »
Jie Tang · Yijie Guo · Danijar Hafner -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Lookahead Optimizer: k steps forward, 1 step back »
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton -
2019 Poster: Graph Normalizing Flows »
Jenny Liu · Aviral Kumar · Jimmy Ba · Jamie Kiros · Kevin Swersky -
2019 Poster: Bayesian Layers: A Module for Neural Network Uncertainty »
Dustin Tran · Mike Dusenberry · Mark van der Wilk · Danijar Hafner -
2018 Poster: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion »
Jacob Buckman · Danijar Hafner · George Tucker · Eugene Brevdo · Honglak Lee -
2018 Oral: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion »
Jacob Buckman · Danijar Hafner · George Tucker · Eugene Brevdo · Honglak Lee -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse -
2017 Workshop: Deep Learning at Supercomputer Scale »
Erich Elsen · Danijar Hafner · Zak Stone · Brennan Saeta -
2017 Poster: Learning Hierarchical Information Flow with Recurrent Neural Modules »
Danijar Hafner · Alexander Irpan · James Davidson · Nicolas Heess