Projected Natural Actor-Critic
Philip Thomas · William C Dabney · Stephen Giguere · Sridhar Mahadevan

Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability: their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
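
To make the idea of natural gradient descent over a constrained domain concrete, here is a minimal sketch in Python. It is not the authors' PNAC algorithm. It assumes a fixed positive-definite metric matrix `G` (standing in for a Fisher information matrix), a safe region given by a single half-space `a @ x <= b`, and a projection taken in the norm induced by `G`. The function names, the toy quadratic objective, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_halfspace(x, a, b, G):
    """Project x onto {y : a @ y <= b} in the norm induced by G.

    Solves min_y (y - x)^T G (y - x) subject to a @ y <= b,
    which has a closed form for a single half-space.
    """
    violation = a @ x - b
    if violation <= 0:
        return x  # already inside the (assumed) safe region
    Ginv_a = np.linalg.solve(G, a)
    return x - (violation / (a @ Ginv_a)) * Ginv_a


def projected_natural_gradient_step(x, grad, G, a, b, step_size):
    """One natural gradient descent step followed by a G-norm projection."""
    nat_grad = np.linalg.solve(G, grad)  # G^{-1} grad
    return project_halfspace(x - step_size * nat_grad, a, b, G)


# Toy problem (hypothetical): minimize 0.5 * ||x - target||^2
# while staying inside the half-space x[0] + x[1] <= 1.
G = np.array([[2.0, 0.0], [0.0, 0.5]])  # assumed metric, not a real Fisher matrix
a = np.array([1.0, 1.0])
b = 1.0
target = np.array([2.0, 2.0])

x = np.zeros(2)
for _ in range(200):
    grad = x - target
    x = projected_natural_gradient_step(x, grad, G, a, b, step_size=0.1)
```

Because every update ends with a projection, each iterate in this sketch stays inside the half-space, which is the kind of guarantee the abstract describes for a known safe region. Projecting in the `G`-induced norm rather than the Euclidean norm is an assumption made here so that the projection uses the same geometry as the natural gradient step.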

Author Information

Philip Thomas (University of Massachusetts Amherst)
William C Dabney (University of Massachusetts Amherst)
Stephen Giguere (University of Massachusetts Amherst)
Sridhar Mahadevan (University of Massachusetts Amherst)