Predictive State Temporal Difference Learning »
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.
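The abstract describes PSTD as a two-stage procedure: an SSID-style step that learns a linear compression operator for a large feature set, followed by an LSTD-style Bellman recursion in the compressed space. The sketch below, in Python with NumPy, only illustrates that general shape under simplifying assumptions; the function name, the plain empirical covariance estimator, the SVD-based choice of compression operator, and the ridge term are illustrative choices, not the exact estimator from the paper.

```python
import numpy as np

def pstd_value_function(Phi_hist, Phi_fut, rewards, k, gamma=0.9):
    """Rough sketch of a PSTD-style procedure (not the paper's exact estimator).

    Phi_hist : (T, d) features of history at each time step
    Phi_fut  : (T, d) features of the future at each time step
    rewards  : (T,)   observed immediate rewards
    k        : dimension of the compressed feature space
    gamma    : discount factor

    Returns a compression operator U (d, k) and LSTD weights w (k,); the value
    estimate for a history feature vector phi is then w @ (U.T @ phi).
    """
    T = Phi_hist.shape[0]

    # Stage 1 (SSID-style compression): estimate the cross-covariance between
    # future and history features, and take the top-k directions in history-
    # feature space that carry the most predictive covariance with the future.
    Sigma_FH = Phi_fut.T @ Phi_hist / T            # (d, d) empirical covariance
    U, _, _ = np.linalg.svd(Sigma_FH.T, full_matrices=False)
    U = U[:, :k]                                   # (d, k) compression operator

    # Stage 2 (RL-style Bellman recursion): run LSTD on the compressed features.
    X = Phi_hist @ U                               # (T, k) compressed features
    A = np.zeros((k, k))
    b = np.zeros(k)
    for t in range(T - 1):
        A += np.outer(X[t], X[t] - gamma * X[t + 1])
        b += X[t] * rewards[t]
    w = np.linalg.solve(A + 1e-6 * np.eye(k), b)   # small ridge for stability
    return U, w
```

The two stages mirror the abstract's claim: the SVD step plays the role of subspace identification (choosing a small feature set that preserves predictive information), and the final solve is the familiar least-squares temporal difference recursion applied in that subspace.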
Author Information
Byron Boots (University of Washington)
Geoffrey Gordon (CMU Machine Learning)
Dr. Gordon is an Associate Research Professor in the Department of Machine Learning at Carnegie Mellon University, and co-director of the Department's Ph.D. program. He works on multi-robot systems, statistical machine learning, game theory, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.
More from the Same Authors
- 2021: Towards a Trace-Preserving Tensor Network Representation of Quantum Channels »
  Siddarth Srinivasan · Sandesh Adhikary · Jacob Miller · Guillaume Rabusseau · Byron Boots
- 2022: Learning Semantics-Aware Locomotion Skills from Human Demonstrations »
  Yuxiang Yang · Xiangyun Meng · Wenhao Yu · Tingnan Zhang · Jie Tan · Byron Boots
- 2023: CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller »
  Yuxiang Yang · Guanya Shi · Xiangyun Meng · Wenhao Yu · Tingnan Zhang · Jie Tan · Byron Boots
- 2023 Poster: Fundamental Limits and Tradeoffs in Invariant Representation Learning »
  Han Zhao · Chen Dan · Bryon Aragam · Tommi Jaakkola · Geoffrey Gordon · Pradeep Ravikumar
- 2023 Poster: Adversarial Model for Offline Reinforcement Learning »
  Mohak Bhardwaj · Tengyang Xie · Byron Boots · Nan Jiang · Ching-An Cheng
- 2020: Q&A: Byron Boots »
  Byron Boots
- 2020: Invited Talk: Byron Boots »
  Byron Boots
- 2020 Poster: Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks »
  Amir Rahimi · Amirreza Shaban · Ching-An Cheng · Richard I Hartley · Byron Boots
- 2020 Poster: Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation »
  Han Zhao · Jianfeng Chi · Yuan Tian · Geoffrey Gordon
- 2020 Poster: Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift »
  Remi Tachet des Combes · Han Zhao · Yu-Xiang Wang · Geoffrey Gordon
- 2019: Continuous Online Learning and New Insights to Online Imitation Learning »
  Jonathan Lee · Ching-An Cheng · Ken Goldberg · Byron Boots
- 2019 Poster: Learning Neural Networks with Adaptive Regularization »
  Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon
- 2019 Poster: Towards modular and programmable architecture search »
  Renato Negrinho · Matthew Gormley · Geoffrey Gordon · Darshan Patil · Nghia Le · Daniel Ferreira
- 2018 Poster: Learning Beam Search Policies via Imitation Learning »
  Renato Negrinho · Matthew Gormley · Geoffrey Gordon
- 2018 Poster: Dual Policy Iteration »
  Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell
- 2018 Poster: Adversarial Multiple Source Domain Adaptation »
  Han Zhao · Shanghang Zhang · Guanhang Wu · José M. F. Moura · Joao P Costeira · Geoffrey Gordon
- 2017 Poster: Linear Time Computation of Moments in Sum-Product Networks »
  Han Zhao · Geoffrey Gordon
- 2017 Poster: Predictive State Recurrent Neural Networks »
  Carlton Downey · Ahmed Hefny · Byron Boots · Geoffrey Gordon · Boyue Li
- 2016 Poster: A Unified Approach for Learning the Parameters of Sum-Product Networks »
  Han Zhao · Pascal Poupart · Geoffrey Gordon
- 2015 Poster: Supervised Learning for Dynamical System Learning »
  Ahmed Hefny · Carlton Downey · Geoffrey Gordon
- 2014 Session: Oral Session 7 »
  Geoffrey Gordon
- 2013 Workshop: Workshop on Spectral Learning »
  Byron Boots · Daniel Hsu · Borja Balle
- 2012 Tutorial: Machine Learning for Student Learning »
  Emma Brunskill · Geoffrey Gordon
- 2007 Oral: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
  Sajid M Siddiqi · Byron Boots · Geoffrey Gordon
- 2007 Poster: A Constraint Generation Approach to Learning Stable Linear Dynamical Systems »
  Sajid M Siddiqi · Byron Boots · Geoffrey Gordon
- 2006 Poster: No-regret algorithms for Online Convex Programs »
  Geoffrey Gordon
- 2006 Talk: No-regret algorithms for Online Convex Programs »
  Geoffrey Gordon
- 2006 Poster: Multi-Robot Negotiation: Approximating the Set of Subgame Perfect Equilibria in General Sum Stochastic Games »
  Chris D Murray · Geoffrey Gordon