
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #518

We study Reinforcement Learning for partially observable systems using function approximation. We propose a new PO-bilinear framework that is general enough to include models such as undercomplete tabular Partially Observable Markov Decision Processes (POMDPs), Linear Quadratic Gaussian (LQG) control, Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. Given a policy class that consists of memory-based policies (i.e., policies that look at a fixed-length window of recent observations), and a value function class that consists of functions taking both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy in the policy class. For certain examples such as undercomplete POMDPs and LQGs, by leveraging their special properties, our algorithm is even capable of competing against the globally optimal policy without paying an exponential dependence on the horizon.
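To make the notion of a memory-based policy concrete, here is a minimal illustrative sketch in Python: a policy that acts on a fixed-length window of the most recent observations, scoring actions with a simple linear function over the flattened window. All names (`MemoryPolicy`, the linear scoring rule) are hypothetical stand-ins, not the authors' implementation or function class.

```python
from collections import deque

import numpy as np


class MemoryPolicy:
    """Illustrative M-step memory policy (hypothetical sketch).

    Acts on a fixed-length window of recent observations; the linear
    scores below are just a stand-in for whatever function class the
    memory-based policy class actually uses.
    """

    def __init__(self, memory_len, obs_dim, num_actions, seed=0):
        self.memory_len = memory_len
        self.obs_dim = obs_dim
        self.num_actions = num_actions
        rng = np.random.default_rng(seed)
        # Linear scores over the flattened memory window.
        self.weights = rng.normal(size=(num_actions, memory_len * obs_dim))
        # deque with maxlen keeps only the M most recent observations.
        self.memory = deque(maxlen=memory_len)

    def reset(self):
        self.memory.clear()

    def act(self, obs):
        self.memory.append(np.asarray(obs, dtype=float))
        # Zero-pad on the left until the window is full.
        window = list(self.memory)
        while len(window) < self.memory_len:
            window.insert(0, np.zeros(self.obs_dim))
        features = np.concatenate(window)
        scores = self.weights @ features
        return int(np.argmax(scores))


# Usage: a policy with a 3-step memory over 2-dim observations.
policy = MemoryPolicy(memory_len=3, obs_dim=2, num_actions=4)
policy.reset()
actions = [policy.act(np.ones(2) * t) for t in range(5)]
```

The fixed `maxlen` on the deque is what makes the policy "memory-based" in the sense above: its action at each step depends only on the last `memory_len` observations, never on the full history.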

Author Information

Masatoshi Uehara (Cornell University)
Ayush Sekhari (Cornell University)
Jason Lee (University of Southern California)
Nathan Kallus (Cornell University)
Wen Sun (Cornell University)
