Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper, we address a drawback of natural actor-critics that limits their real-world applicability: their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
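As a rough illustration of the idea described in the abstract (not the paper's exact algorithm), the sketch below shows one natural gradient step, in which the ordinary gradient is preconditioned by the inverse Fisher information matrix, followed by a projection back into a feasible set. The function names, the box constraint, and the use of a Euclidean projection are all illustrative assumptions; the paper's mirror-descent analysis determines the appropriate projection, and in a full actor-critic the gradient and Fisher matrix would be estimated by the critic from sampled trajectories rather than given in closed form.

```python
import numpy as np

def projected_natural_gradient_step(theta, grad, fisher, project, step_size=0.1):
    """One illustrative projected natural gradient descent step.

    theta     : current policy parameters (1-D array)
    grad      : gradient of the objective at theta
    fisher    : Fisher information matrix at theta (assumed positive definite)
    project   : function mapping a point back into the safe region
    step_size : learning rate
    """
    # Natural gradient: precondition the ordinary gradient with the
    # inverse Fisher information matrix.
    natural_grad = np.linalg.solve(fisher, grad)
    # Take an unconstrained descent step, then project the result back
    # into the known safe region of policy space.
    return project(theta - step_size * natural_grad)

# Hypothetical safe region: the box [-1, 1]^2, with Euclidean projection.
# (The paper's projection need not be Euclidean; this is a simplification.)
def project_box(x):
    return np.clip(x, -1.0, 1.0)

theta = np.array([0.9, -0.4])
grad = np.array([2.0, -1.0])
fisher = np.array([[2.0, 0.1], [0.1, 1.0]])
theta = projected_natural_gradient_step(theta, grad, fisher, project_box)
print(theta)
```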
Author Information
Philip Thomas (University of Massachusetts Amherst)
William C Dabney (University of Massachusetts Amherst)
Stephen Giguere (University of Massachusetts Amherst)
Sridhar Mahadevan (University of Massachusetts Amherst)
More from the Same Authors
- 2022: Optimization using Parallel Gradient Evaluations on Multiple Parameters
  Yash Chandak · Shiv Shankar · Venkata Gandikota · Philip Thomas · Arya Mazumdar
- 2023 Poster: Behavior Alignment via Reward Function Optimization
  Dhawal Gupta · Yash Chandak · Scott Jordan · Philip Thomas · Bruno da Silva
- 2022 Poster: Off-Policy Evaluation for Action-Dependent Non-stationary Environments
  Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas
- 2021: Q&A for Philip Thomas
  Philip Thomas
- 2021: Advances in (High-Confidence) Off-Policy Evaluation
  Philip Thomas
- 2021: Invited Speaker Panel
  Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker
- 2021 Poster: SOPE: Spectrum of Off-Policy Estimators
  Christina Yuan · Yash Chandak · Stephen Giguere · Philip Thomas · Scott Niekum
- 2021 Poster: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
  Harsh Satija · Philip Thomas · Joelle Pineau · Romain Laroche
- 2021 Poster: Universal Off-Policy Evaluation
  Yash Chandak · Scott Niekum · Bruno da Silva · Erik Learned-Miller · Emma Brunskill · Philip Thomas
- 2021 Poster: Structural Credit Assignment in Neural Networks using Reinforcement Learning
  Dhawal Gupta · Gabor Mihucz · Matthew Schlegel · James Kostas · Philip Thomas · Martha White
- 2020 Poster: Towards Safe Policy Improvement for Non-Stationary MDPs
  Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
- 2020 Spotlight: Towards Safe Policy Improvement for Non-Stationary MDPs
  Yash Chandak · Scott Jordan · Georgios Theocharous · Martha White · Philip Thomas
- 2020 Poster: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
  Pinar Ozisik · Philip Thomas
- 2019 Poster: Offline Contextual Bandits with High Probability Fairness Guarantees
  Blossom Metevier · Stephen Giguere · Sarah Brockman · Ari Kobren · Yuriy Brun · Emma Brunskill · Philip Thomas
- 2019 Poster: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
  Francisco Garcia · Philip Thomas
- 2015 Poster: Policy Evaluation Using the Ω-Return
  Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris
- 2014 Workshop: Novel Trends and Applications in Reinforcement Learning
  Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
- 2012 Poster: Regularized Off-Policy TD-Learning
  Bo Liu · Sridhar Mahadevan · Ji Liu
- 2012 Spotlight: Regularized Off-Policy TD-Learning
  Bo Liu · Sridhar Mahadevan · Ji Liu
- 2011 Poster: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning
  George Konidaris · Scott Niekum · Philip Thomas
- 2011 Poster: Policy Gradient Coagent Networks
  Philip Thomas
- 2010 Poster: Basis Construction from Power Series Expansions of Value Functions
  Sridhar Mahadevan · Bo Liu