Oral in Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo


Abstract: Randomized least-squares value iteration (RLSVI) is a provably efficient exploration method. However, it is limited to the case where 1) a good feature is known in advance and 2) this feature is fixed during training: otherwise, RLSVI incurs an unbearable computational burden to obtain posterior samples of the parameter of the Q-value function. In this work, we present a practical algorithm named HyperDQN that addresses these two issues in the context of deep reinforcement learning, where the feature changes over iterations. HyperDQN is built on two parametric models: in addition to a non-linear neural network (i.e., the base model) that predicts Q-values, our method employs a probabilistic hypermodel (i.e., the meta model), which outputs the parameters of the base model. When both models are jointly optimized under a specifically designed objective, three goals are achieved. First, the hypermodel can generate approximate posterior samples of the parameters of the Q-value function, so diverse Q-value functions are sampled to select exploratory action sequences; this retains the key property of RLSVI that enables efficient exploration. Second, a good feature is learned to approximate Q-value functions, which addresses limitation 1. Third, posterior samples of the Q-value function can be obtained more efficiently than with existing methods, and the changing feature does not affect this efficiency, which deals with limitation 2. On the Atari 2600 suite, after 20M samples, HyperDQN achieves about a 2× improvement over (double) DQN, the advanced method Bootstrapped DQN, and the SOTA exploration-bonus method OB2I. On the challenging SuperMarioBros suite, HyperDQN outperforms the baselines on 7 out of 9 games.
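To make the base-model/hypermodel split concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: a random index z is fed to a hypermodel, which outputs the last-layer parameters of a small Q-network, so that each sampled z induces one approximate posterior sample of the Q-value function. All names here (IndexedQNetwork, HyperModel, index_dim, and the specific layer sizes) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class IndexedQNetwork(nn.Module):
    """Base model: a small MLP whose last-layer parameters are supplied externally."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.hidden_dim = hidden_dim
        self.num_actions = num_actions

    def forward(self, obs: torch.Tensor, head_params: torch.Tensor) -> torch.Tensor:
        # head_params has length num_actions * hidden_dim + num_actions
        # and is produced by the hypermodel below.
        phi = self.feature(obs)                                    # (batch, hidden_dim)
        w, b = head_params.split(self.num_actions * self.hidden_dim)
        w = w.view(self.num_actions, self.hidden_dim)
        return phi @ w.t() + b                                     # (batch, num_actions)


class HyperModel(nn.Module):
    """Meta model: maps a random index z to the base model's head parameters,
    so each sampled z yields one sampled Q-value function."""

    def __init__(self, index_dim: int, param_dim: int):
        super().__init__()
        # A linear hypermodel is used here purely for simplicity.
        self.net = nn.Linear(index_dim, param_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


# Usage sketch: sample one index per episode, then act greedily
# with respect to the Q-value function that this index induces.
obs_dim, num_actions, index_dim = 8, 4, 16
base = IndexedQNetwork(obs_dim, num_actions)
param_dim = num_actions * base.hidden_dim + num_actions
hyper = HyperModel(index_dim, param_dim)

z = torch.randn(index_dim)          # random index, e.g., z ~ N(0, I)
obs = torch.randn(1, obs_dim)
q_values = base(obs, hyper(z))      # one sampled Q-value function evaluated at obs
action = q_values.argmax(dim=-1)
```

In the paper's scheme, both the base network's feature layers and the hypermodel are trained jointly under one objective; the sketch above only illustrates how sampling z produces diverse Q-value functions for exploratory action selection.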