A Game-Theoretic Perspective on Generalization in Reinforcement Learning
Abstract
Generalization in reinforcement learning (RL) is crucial for the real-world deployment of RL algorithms. Various schemes have been proposed to address generalization issues, including transfer learning, multi-task learning, meta-learning, as well as robust and adversarial reinforcement learning. However, there is neither a unified formulation of these schemes nor a comprehensive comparison of methods across them. In this work, we propose GiRL, a game-theoretic framework for generalization in reinforcement learning, in which an RL agent is trained against an adversary over a set of tasks, where the adversary can manipulate the distribution over tasks within a given threshold. With different configurations, GiRL recovers the schemes mentioned above as special cases. To solve GiRL, we adapt the policy-space response oracle (PSRO) framework, a method widely used in game theory, with three significant modifications: i) we adopt model-agnostic meta-learning (MAML) as the best-response oracle, ii) we propose a modified projected replicator dynamics, R-PRD, which ensures that the computed meta-strategy of the adversary stays within the threshold, and iii) we propose a few-shot learning protocol for testing with multiple strategies. Extensive experiments on MuJoCo environments demonstrate that our method outperforms state-of-the-art baselines, e.g., MAML.
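As a minimal sketch of the game described above, the training objective can be read as a constrained max-min problem over the task distribution; the notation below (task set, reference distribution, divergence measure, threshold) is our own assumption for illustration and is not taken verbatim from the abstract:

% Hedged sketch of the GiRL objective as a constrained max-min game (notation assumed for illustration).
% \pi: the agent's policy; q: the adversary's distribution over tasks T_1, ..., T_n in the simplex \Delta_n;
% p_0: a reference (e.g., uniform) task distribution; D: a divergence (e.g., KL); \epsilon: the manipulation threshold.
\[
\max_{\pi} \;\; \min_{\substack{q \in \Delta_n \\ D(q \,\|\, p_0) \le \epsilon}} \;\;
\sum_{i=1}^{n} q_i \, \mathbb{E}_{\tau \sim (\pi,\, T_i)}\!\big[ R(\tau) \big]
\]

Under this reading, setting \(\epsilon = 0\) fixes the task distribution (plain multi-task/meta-learning), while a large \(\epsilon\) lets the adversary concentrate mass on the worst-case task (robust/adversarial RL), which is one way the different configurations mentioned above could be recovered.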