A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation. We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions. The resulting training objective is compatible with both online and offline training: it does not require a reward signal, but agents can capitalize on reward information when available. We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. The learned representations capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features, often matching or exceeding the performance achieved with hand-designed compact state information.
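To make the training procedure concrete, the following is a minimal PyTorch sketch of the two losses the abstract describes: an inverse-dynamics loss and a temporal contrastive loss, both training a shared encoder. This is an illustrative reconstruction, not the authors' released code; the module names (MarkovAbstraction, phi, inverse_model), the MLP architectures and layer sizes, the discrete-action assumption, and the equal weighting of the two losses are all assumptions.

```python
# A minimal sketch (not the authors' implementation) of the two training
# signals described in the abstract. Assumes vector observations and a
# discrete action space; all names, architectures, and the equal loss
# weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarkovAbstraction(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        # phi: encoder from rich observations to abstract states
        self.phi = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        # Inverse model: predict the action taken between consecutive abstract states
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * latent_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
        # Discriminator: score whether (z, z') is a genuine consecutive pair
        self.discriminator = nn.Sequential(
            nn.Linear(2 * latent_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def loss(self, obs, next_obs, actions):
        z, z_next = self.phi(obs), self.phi(next_obs)
        # Inverse-dynamics loss: the abstraction must retain enough
        # information to recover which action was taken.
        action_logits = self.inverse_model(torch.cat([z, z_next], dim=-1))
        inverse_loss = F.cross_entropy(action_logits, actions)
        # Temporal contrastive loss: distinguish true (z, z') transitions
        # from negatives built by shuffling next states within the batch.
        z_shuffled = z_next[torch.randperm(z_next.shape[0])]
        pos = self.discriminator(torch.cat([z, z_next], dim=-1))
        neg = self.discriminator(torch.cat([z, z_shuffled], dim=-1))
        scores = torch.cat([pos, neg]).squeeze(-1)
        labels = torch.cat(
            [torch.ones_like(pos), torch.zeros_like(neg)]).squeeze(-1)
        contrastive_loss = F.binary_cross_entropy_with_logits(scores, labels)
        return inverse_loss + contrastive_loss
```

Note that the loss requires no reward signal, consistent with the abstract's claim that the objective works both online and offline; it only consumes (observation, action, next observation) transitions.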
Author Information
Cameron Allen (Brown University)
Neev Parikh (Brown University)
Omer Gottesman (Harvard University)
George Konidaris (Brown University)
More from the Same Authors
- 2020: Poster #7 (Neev Parikh)
- 2021: Bayesian Exploration for Lifelong Reinforcement Learning (Haotian Fu · Shangqun Yu · Michael Littman · George Konidaris)
- 2020: Mini-panel discussion 1 - Bridging the gap between theory and practice (Aviv Tamar · Emma Brunskill · Jost Tobias Springenberg · Omer Gottesman · Daniel Mankowitz)
- 2020 Poster: Learning to search efficiently for causally near-optimal treatments (Samuel Håkansson · Viktor Lindblom · Omer Gottesman · Fredrik Johansson)
- 2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation (Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill)
- 2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning (George Konidaris · Scott Niekum · Philip Thomas)
- 2010 Poster: Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories (George Konidaris · Scott R Kuindersma · Andrew G Barto · Roderic A Grupen)
- 2009 Poster: Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining (George Konidaris · Andrew G Barto)
- 2009 Spotlight: Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining (George Konidaris · Andrew G Barto)