Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods offer too few tools for gaining insight into the exact cause. In this paper, we show how to integrate \textit{value decomposition} into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods that mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains performance comparable to SAC while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward \textit{influence} metric, which measures each reward component's effect on agent decision-making. Using these tools, we provide several demonstrations of how decomposition can be used to identify and address problems in the design of both environments and agents. Value decomposition is broadly applicable and easy to incorporate into existing algorithms and workflows, making it a powerful tool in an RL practitioner's toolbox.
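To make the decomposition concrete, the following is a minimal, illustrative sketch rather than the paper's implementation: it assumes a discrete-action critic that predicts one Q-value per reward component, sums the components to recover the composite Q-values used for action selection, and scores each component's "influence" by how much the action preferences shift when that component is removed. The component names, the softmax action distribution, and this particular influence definition are assumptions for illustration; the paper's exact metric may differ.

```python
import numpy as np

# Illustrative sketch of value decomposition for a discrete-action critic.
# Assumption: per-component Q-values are stored in a matrix of shape
# (num_components, num_actions); the composite Q used for action selection
# is their sum. The "influence" score below is a hypothetical stand-in for
# the paper's metric: it measures how much the soft action distribution
# changes when a component's contribution is ablated.

def composite_q(q_components: np.ndarray) -> np.ndarray:
    """Composite Q(s, .) as the sum of per-component value estimates."""
    return q_components.sum(axis=0)

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

def influence(q_components: np.ndarray) -> np.ndarray:
    """Rough per-component influence on action selection.

    Computed as the total-variation distance between the soft action
    distribution under the full composite Q and the distribution with
    one component removed.
    """
    full = softmax(composite_q(q_components))
    scores = []
    for i in range(q_components.shape[0]):
        ablated = softmax(composite_q(np.delete(q_components, i, axis=0)))
        scores.append(0.5 * np.abs(full - ablated).sum())
    return np.array(scores)

# Example: 3 hypothetical reward components, 4 actions.
q_components = np.array([
    [1.0, 2.0, 0.5, 1.5],       # progress
    [-0.1, -3.0, -0.2, -0.1],   # collision penalty
    [-0.2, -0.2, -0.1, -0.3],   # energy cost
])
print("composite Q:", composite_q(q_components))
print("influence per component:", influence(q_components))
```

SAC and SAC-D operate on continuous actions with learned critics; the discrete-action, tabular setup above is only to keep the example short and self-contained.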
Author Information
James MacGlashan (Sony AI)
Evan Archer (Sony AI)
Alisa Devlic (Sony AI)
Takuma Seno (Sony AI)
Craig Sherstan (Sony AI)
Peter Wurman (Sony AI)
Peter Stone (The University of Texas at Austin, Sony AI)
More from the Same Authors
- 2020 : Paper 19: Multiagent Driving Policy for Congestion Reduction in a Large Scale Scenario » Jiaxun Cui · Peter Stone
- 2021 : Task-Independent Causal State Abstraction » Zizhao Wang · Xuesu Xiao · Yuke Zhu · Peter Stone
- 2021 : Leveraging Information about Background Music in Human-Robot Interaction » Elad Liebman · Peter Stone
- 2021 : Expert Human-Level Driving in Gran Turismo Sport Using Deep Reinforcement Learning with Image-based Representation » Ryuji Imamura · Takuma Seno · Kenta Kawamoto · Michael Spranger
- 2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy? » Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor
- 2022 : BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach » Mao Ye · Bo Liu · Stephen Wright · Peter Stone · Qiang Liu
- 2022 : ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning » Eddy Hudson · Ishan Durugkar · Garrett Warnell · Peter Stone
- 2022 : Panel RL Theory-Practice Gap » Peter Stone · Matej Balog · Jonas Buchli · Jason Gauci · Dhruv Madeka
- 2022 : Panel RL Benchmarks » Minmin Chen · Pablo Samuel Castro · Caglar Gulcehre · Tony Jebara · Peter Stone
- 2022 : Invited talk: Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning » Peter Stone
- 2022 : Human in the Loop Learning for Robot Navigation and Task Learning from Implicit Human Feedback » Peter Stone
- 2022 Poster: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach » Bo Liu · Mao Ye · Stephen Wright · Peter Stone · Qiang Liu
- 2021 Poster: Adversarial Intrinsic Motivation for Reinforcement Learning » Ishan Durugkar · Mauricio Tec · Scott Niekum · Peter Stone
- 2021 Poster: Conflict-Averse Gradient Descent for Multi-task Learning » Bo Liu · Xingchao Liu · Xiaojie Jin · Peter Stone · Qiang Liu
- 2021 Poster: Machine versus Human Attention in Deep Reinforcement Learning Tasks » Sihang Guo · Ruohan Zhang · Bo Liu · Yifeng Zhu · Dana Ballard · Mary Hayhoe · Peter Stone
- 2020 : Q&A: Peter Stone (The University of Texas at Austin): Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination, with Natasha Jaques (Google) [moderator] » Peter Stone · Natasha Jaques
- 2020 : Invited Speaker: Peter Stone (The University of Texas at Austin) on Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination » Peter Stone
- 2020 : Panel discussion » Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel
- 2020 : Discussion Panel » Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos
- 2020 : Invited talk: Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" » Peter Stone
- 2020 Poster: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks » Lemeng Wu · Bo Liu · Peter Stone · Qiang Liu
- 2020 Poster: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch » Siddharth Desai · Ishan Durugkar · Haresh Karnan · Garrett Warnell · Josiah Hanna · Peter Stone
- 2018 : Peter Stone » Peter Stone
- 2018 : Control Algorithms for Imitation Learning from Observation » Peter Stone
- 2018 : Peter Stone » Peter Stone
- 2017 Spotlight: Fast amortized inference of neural activity from calcium imaging data with variational autoencoders » Artur Speiser · Jinyao Yan · Evan Archer · Lars Buesing · Srinivas C Turaga · Jakob H Macke
- 2017 Poster: Fast amortized inference of neural activity from calcium imaging data with variational autoencoders » Artur Speiser · Jinyao Yan · Evan Archer · Lars Buesing · Srinivas C Turaga · Jakob H Macke
- 2016 : Peter Stone (University of Texas at Austin) » Peter Stone
- 2016 Poster: Linear dynamical neural population models through nonlinear embeddings » Yuanjun Gao · Evan Archer · Liam Paninski · John Cunningham
- 2015 Workshop: Learning, Inference and Control of Multi-Agent Systems » Vicenç Gómez · Gerhard Neumann · Jonathan S Yedidia · Peter Stone
- 2014 Poster: Low-dimensional models of neural population activity in sensory cortical circuits » Evan Archer · Urs Koster · Jonathan W Pillow · Jakob H Macke
- 2013 Poster: Bayesian entropy estimation for binary spike train data using parametric prior knowledge » Evan Archer · Il Memming Park · Jonathan W Pillow
- 2013 Poster: Universal models for binary spike patterns using centered Dirichlet processes » Il Memming Park · Evan Archer · Kenneth W Latimer · Jonathan W Pillow
- 2013 Spotlight: Bayesian entropy estimation for binary spike train data using parametric prior knowledge » Evan Archer · Il Memming Park · Jonathan W Pillow
- 2013 Poster: Spectral methods for neural characterization using generalized quadratic models » Il Memming Park · Evan Archer · Nicholas Priebe · Jonathan W Pillow
- 2012 Poster: Bayesian estimation of discrete entropy with mixtures of stick-breaking priors » Evan Archer · Jonathan W Pillow · Il Memming Park