Human-in-the-loop (HiL) reinforcement learning is gaining traction in domains with large state and action spaces and sparse rewards, as it allows the agent to take advice from a human in the loop. Beyond accommodating advice, a sequential decision-making agent must be able to express the extent to which it was able to utilize that advice. It should also provide a means for the human to inspect which parts of the advice it had to reject in favor of the overall environment objective. We introduce the problem of Advice-Conformance Verification, which requires reinforcement learning (RL) agents to provide assurances to the human in the loop regarding how much of their advice is being conformed to. We then propose a tree-based lingua franca to support this communication, called a Preference Tree. We study two scenarios, one with good and one with bad advice, in MuJoCo's Humanoid environment. Through our experiments, we show that our method provides an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice. Finally, we present a human-user study with 20 participants that validates our method.
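The abstract does not describe an implementation, but the Preference Tree idea can be illustrated with a minimal sketch. In the Python below, the `PreferenceNode` class, its `conformance` field, and the 0.5 used/rejected threshold are hypothetical stand-ins, not the authors' method: each node holds one piece of human advice plus a score for how much the trained policy conformed to it, so the human can inspect which advice was rejected in favor of the environment objective.

```python
# Illustrative sketch only: class names, fields, and the conformance
# threshold are assumptions, not the paper's published API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreferenceNode:
    """One piece of human advice, annotated with how much the agent used it."""
    advice: str                                # e.g. "keep torso upright"
    conformance: float = 0.0                   # assumed fraction in [0, 1]
    children: List["PreferenceNode"] = field(default_factory=list)

    def report(self, indent: int = 0) -> None:
        """Print the tree so the human can see accepted vs. rejected advice."""
        status = "used" if self.conformance >= 0.5 else "rejected"
        print("  " * indent + f"{self.advice}: {self.conformance:.2f} ({status})")
        for child in self.children:
            child.report(indent + 1)


# Hypothetical example: advice for the MuJoCo Humanoid agent.
root = PreferenceNode("walk forward", conformance=0.9)
root.children.append(PreferenceNode("keep torso upright", conformance=0.8))
root.children.append(PreferenceNode("minimize arm swing", conformance=0.1))
root.report()
```

Printing the tree this way gives the human an at-a-glance view of which advice the agent conformed to and which it set aside, which is the kind of inspection the abstract describes.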
Author Information
Mudit Verma (Arizona State University)
Ayush Kharkwal (Arizona State University)
Subbarao Kambhampati (Arizona State University)
More from the Same Authors

- 2021 Spotlight: Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
  Lin Guan · Mudit Verma · Sihang Guo · Ruohan Zhang · Subbarao Kambhampati
- 2022: Revisiting Value Alignment Through the Lens of Human-Aware AI
  Sarath Sreedharan · Subbarao Kambhampati
- 2022: Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
  Karthik Valmeekam · Alberto Olmo · Sarath Sreedharan · Subbarao Kambhampati
- 2022: Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion
  Utkarsh Soni · Sarath Sreedharan · Mudit Verma · Lin Guan · Matthew Marquez · Subbarao Kambhampati
- 2022: Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
  Lin Guan · Karthik Valmeekam · Subbarao Kambhampati
- 2021 Poster: Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
  Lin Guan · Mudit Verma · Sihang Guo · Ruohan Zhang · Subbarao Kambhampati
- 2020: Panel #2
  Oren Etzioni · Heng Ji · Subbarao Kambhampati · Victoria Lin · Jiajun Wu
- 2013 Poster: Synthesizing Robust Plans under Incomplete Domain Models
  Tuan A Nguyen · Subbarao Kambhampati · Minh Do
- 2012 Poster: Action-Model Based Multi-agent Plan Recognition
  Hankz Hankui Zhuo · Qiang Yang · Subbarao Kambhampati