The successor features framework is a popular approach to zero-shot transfer in reinforcement learning when the reward function varies across tasks. However, in this framework, transfer to new target tasks with generalized policy improvement (GPI) relies only on the source successor features [5] or on additional successor features obtained from the function approximators’ generalization to novel inputs [11]. The goal of this work is to improve transfer by more tightly bounding the value approximation errors of successor features on new target tasks. Given a set of source tasks with their successor features, we present lower and upper bounds on the optimal values for novel task vectors that are expressible as linear combinations of source task vectors. Based on the bounds, we propose constrained GPI, a simple test-time approach that can improve transfer by constraining action-value approximation errors on new target tasks. Through experiments in the Scavenger and Reacher environments with state observations, as well as the DeepMind Lab environment with visual observations, we show that constrained GPI significantly outperforms the transfer performance of prior GPI. Our code and additional information are available at https://jaekyeom.github.io/projects/cgpi/.
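To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes action values are linear in the task vector, Q(s, a) = ψ(s, a) · w, as is standard for successor features, and that per-action lower and upper bounds on the optimal values are already available. The function names, array shapes, and the clipping step are illustrative assumptions; the bounds themselves are derived in the paper and are not reproduced here.

```python
import numpy as np


def gpi_action(psi, w):
    """Plain GPI at a single state.

    psi: array of shape (n_policies, n_actions, d), the successor features
         of each policy for each action (hypothetical layout).
    w:   array of shape (d,), the target task vector.
    """
    q = psi @ w  # action values, shape (n_policies, n_actions)
    # Act greedily with respect to the best policy for each action.
    return int(np.argmax(q.max(axis=0)))


def constrained_gpi_action(psi, w, lower, upper):
    """GPI after clipping approximate values into given bounds.

    lower, upper: arrays of shape (n_actions,), assumed to bound the
    optimal action values on the target task (supplied by the caller).
    """
    q = np.clip(psi @ w, lower, upper)  # bounds broadcast over policies
    return int(np.argmax(q.max(axis=0)))


# Toy example: the second set of successor features overestimates action 1.
psi = np.array([[[1.0, 0.0], [0.0, 1.0]],
                [[0.5, 0.5], [2.0, 2.0]]])
w = np.array([1.0, 0.5])
lower = np.array([0.0, 0.0])
upper = np.array([5.0, 0.8])

print(gpi_action(psi, w))                           # action chosen without bounds
print(constrained_gpi_action(psi, w, lower, upper)) # action chosen with bounds
```

In this toy case the unconstrained choice is driven by a single large value estimate, while clipping to the assumed upper bound changes which action is selected. This illustrates how constraining approximation errors can alter the resulting policy.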
Author Information
Jaekyeom Kim (Seoul National University)
Seohong Park (University of California, Berkeley)
Gunhee Kim (Seoul National University / RippleAI)
More from the Same Authors
- 2023 Poster: Recasting Meta-Continual Learning as Sequence Modeling
  Soochan Lee · Jaehyeon Son · Gunhee Kim
- 2023 Poster: Offline Goal-Conditioned RL with Latent States as Actions
  Seohong Park · Dibya Ghosh · Benjamin Eysenbach · Sergey Levine
- 2023 Poster: Federated Learning via Meta-Variational Dropout
  Insu Jeon · Minui Hong · Junhyeog Yun · Gunhee Kim
- 2021 : Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes
  Hyunwoo Kim · Byeongchang Kim · Gunhee Kim
- 2021 Poster: Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods
  Seohong Park · Jaekyeom Kim · Gunhee Kim
- 2019 Poster: Self-Routing Capsule Networks
  Taeyoung Hahn · Myeongjang Pyeon · Gunhee Kim
- 2015 Poster: Expressing an Image Stream with a Sequence of Natural Sentences
  Cesc C Park · Gunhee Kim