Timezone: »
One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function. While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms.
Author Information
Daniel Brown (UC Berkeley)
Scott Niekum (UT Austin)
Marek Petrik (University of New Hampshire)
More from the Same Authors
-
2019 Workshop: Safety and Robustness in Decision-making »
Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow -
2019 Poster: Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs »
Marek Petrik · Reazul Hasan Russel -
2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes »
Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart -
2016 Poster: Safe Policy Improvement by Minimizing Robust Baseline Regret »
Mohammad Ghavamzadeh · Marek Petrik · Yinlam Chow -
2015 Poster: Policy Evaluation Using the Ω-Return »
Philip Thomas · Scott Niekum · Georgios Theocharous · George Konidaris -
2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty) »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar -
2014 Poster: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning »
Marek Petrik · Dharmashankar Subramanian -
2014 Spotlight: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning »
Marek Petrik · Dharmashankar Subramanian