Spotlight Poster

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Zhihan Liu ⋅ Miao Lu ⋅ Wei Xiong ⋅ Han Zhong ⋅ Hao Hu ⋅ Shenao Zhang ⋅ Sirui Zheng ⋅ Zhuoran Yang ⋅ Zhaoran Wang

2023 Spotlight Poster


Abstract

In reinforcement learning (RL), balancing exploration and exploitation is crucial for learning an optimal policy in a sample-efficient way. To this end, existing sample-efficient algorithms typically consist of three components: estimation, planning, and exploration. However, to cope with general function approximators, most of them rely on impractical algorithmic components to incentivize exploration, such as data-dependent level-set constraints or complicated sampling procedures. To address this challenge, we propose an easy-to-implement RL framework called Maximize to Explore (MEX), which only needs to optimize a single unconstrained objective that integrates the estimation and planning components while automatically balancing exploration and exploitation. Theoretically, we prove that MEX achieves a sublinear regret with general function approximators and that it extends to the zero-sum Markov game setting. Meanwhile, we adapt deep RL baselines to design practical versions of MEX in both the model-based and model-free settings, which outperform the baselines by a stable margin in various MuJoCo environments with sparse rewards. Compared with existing sample-efficient algorithms with general function approximators, MEX achieves similar sample efficiency while enjoying a lower computational cost and being more compatible with modern deep RL methods.
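The abstract does not write out the MEX objective, so the following is only a minimal sketch under an assumption: that the single objective has the form "estimated optimal value of a hypothesis minus η times its estimation loss on the collected data". It instantiates that form on a toy multi-armed bandit with a squared-error loss, where the maximization can be done in closed form. The names `objective`, `mex_arm_scores`, `run_mex_bandit` and the weight `eta` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def objective(f, pulls, eta):
    """Toy single objective (assumed form): planning value minus eta * estimation loss.

    f     -- hypothetical mean reward assigned to each arm
    pulls -- observed (arm, reward) pairs
    """
    loss = sum((r - f[a]) ** 2 for a, r in pulls)  # estimation: squared error
    return max(f) - eta * loss                      # planning: value of the best arm


def mex_arm_scores(counts, sums, eta):
    """Closed-form maximization of `objective`, split by which arm is the argmax.

    For an arm a with n_a >= 1 pulls and empirical mean m_a, the best f that
    makes a the argmax sets f[a] = m_a + 1 / (2 * eta * n_a) and leaves the
    other arms at their empirical means. Up to the residual sum of squares,
    a constant shared by all arms, this yields m_a + 1 / (4 * eta * n_a).
    An arm with no data incurs no loss, so its score is unbounded.
    """
    scores = []
    for n, s in zip(counts, sums):
        if n == 0:
            scores.append(math.inf)
        else:
            scores.append(s / n + 1.0 / (4.0 * eta * n))
    return scores


def run_mex_bandit(means, eta, horizon, seed=0):
    """Each round, play the arm favored by the maximizer of the single objective."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for _ in range(horizon):
        scores = mex_arm_scores(counts, sums, eta)
        arm = max(range(k), key=lambda a: scores[a])  # first maximizer on ties
        reward = means[arm] + rng.gauss(0.0, 1.0)     # Gaussian reward noise
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = [(0, 1.0), (0, 0.0), (1, 0.2)]
print(mex_arm_scores([2, 1], [1.0, 0.2], eta=1.0))
print(run_mex_bandit([0.0, 1.0], eta=1.0, horizon=500, seed=1))
```

In this toy, no explicit bonus or constraint is added: the optimism term 1 / (4 * eta * n_a) falls out of maximizing value and fitting the data jointly, and a larger `eta` weights the estimation loss more heavily and so shrinks it. How the weight should be chosen, and how the idea carries over to general function approximators and deep RL, is what the paper itself addresses; this sketch makes no regret claim.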
