This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s, s', r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning (LAQ), an offline RL method that can learn effective value functions from state-only experience. LAQ learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that are highly correlated with value functions learned using ground-truth actions. Value functions learned using LAQ lead to sample-efficient acquisition of goal-directed behavior, can be used with domain-specific low-level controllers, and facilitate transfer across embodiments. Our experiments in five environments, ranging from a 2D grid world to 3D visual navigation in realistic environments, demonstrate the benefits of LAQ over simpler alternatives, imitation learning oracles, and competing methods.
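The refinement claim in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical 4-state deterministic chain MDP, uses batch tabular Q-learning with step size 1 over a fixed set of transitions, and models a "refinement" by splitting every original action into two copies that behave identically. The names `q_learning`, `base`, and `refined` are illustrative only. Under these assumptions, the state values max_a Q(s, a) should agree between the original and refined action spaces.

```python
import numpy as np

def q_learning(transitions, n_states, n_actions, gamma=0.9, sweeps=500):
    """Batch tabular Q-learning with step size 1 (valid for a deterministic MDP)."""
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, s_next, r in transitions:
            # Standard Q-learning target: reward plus discounted best next value.
            q[s, a] = r + gamma * q[s_next].max()
    return q

# Hypothetical chain MDP: action 0 moves left, action 1 moves right.
# Entering (or staying in) the last state yields reward 1.
n_states = 4
base = []
for s in range(n_states):
    for a, step in enumerate((-1, 1)):
        s_next = min(max(s + step, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        base.append((s, a, s_next, r))

# Refined action space: each original action a becomes two actions 2a and 2a+1
# with identical transitions and rewards.
refined = [(s, 2 * a + k, s_next, r) for (s, a, s_next, r) in base for k in (0, 1)]

q_base = q_learning(base, n_states, n_actions=2)
q_refined = q_learning(refined, n_states, n_actions=4)

v_base = q_base.max(axis=1)        # state values under the original actions
v_refined = q_refined.max(axis=1)  # state values under the refined actions
print(np.allclose(v_base, v_refined))
```

This only covers the simplest kind of refinement (exact duplicates of existing actions, with every state-action pair present in the data); the paper's theoretical result and the learned latent actions in LAQ are more general than this sketch.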
Author Information
Matthew Chang (University of Illinois Urbana-Champaign)
Arjun Gupta (University of Illinois at Urbana-Champaign)
Saurabh Gupta (UIUC)
More from the Same Authors
- 2021: RB2: Robotic Manipulation Benchmarking with a Twist
  Sudeep Dasari · Jianren Wang · Joyce Hong · Shikhar Bahl · Yixin Lin · Austin Wang · Abitha Thankaraj · Karanbir Chahal · Berk Calli · Saurabh Gupta · David Held · Lerrel Pinto · Deepak Pathak · Vikash Kumar · Abhinav Gupta
- 2021: Learning Value Functions from Undirected State-only Experience
  Matthew Chang · Arjun Gupta · Saurabh Gupta
- 2022: One-shot Visual Imitation via Attributed Waypoints and Demonstration Augmentation
  Matthew Chang · Saurabh Gupta
- 2023 Poster: Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos
  Matthew Chang · Aditya Prakash · Saurabh Gupta
- 2021 Poster: SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
  Devendra Singh Chaplot · Murtaza Dalal · Saurabh Gupta · Jitendra Malik · Russ Salakhutdinov
- 2020 Poster: Semantic Visual Navigation by Watching YouTube Videos
  Matthew Chang · Arjun Gupta · Saurabh Gupta