
Workshop: Adaptive Experimental Design and Active Learning in the Real World

Active Learning for Iterative Offline Reinforcement Learning

Lan Zhang · Luigi Franco Tedesco · Pankaj Rajak · Youcef Zemmouri · Hakan Brunzell


Offline Reinforcement Learning (RL) has emerged as a promising approach to address real-world challenges where online interactions with the environment are limited, risky, or costly. Although recent advances produce high-quality policies from offline data, there is currently no systematic methodology to continue improving them without resorting to online fine-tuning. This paper proposes to repurpose Offline RL to produce a sequence of improving policies, namely, Iterative Offline Reinforcement Learning (IORL). To produce such a sequence, IORL has to cope with imbalanced offline datasets and perform controlled environment exploration. Specifically, we introduce "Return-based Sampling" as a means to selectively prioritize experience from high-return trajectories, and active-learning-driven "Dataset Uncertainty Sampling" to probe state-action pairs in inverse proportion to their density in the dataset. We demonstrate that our proposed approach produces policies that achieve monotonically increasing average returns, from 65.4 to 140.2, in the Atari environment.
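The two sampling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the softmax weighting over normalized returns, the `temperature` parameter, the discretization of state-action pairs into counted bins, and the `eps` smoothing term are all assumptions made here for illustration, as are the function names.

```python
import numpy as np


def return_based_probs(returns, temperature=1.0):
    """Sampling probabilities that favor high-return trajectories.

    Assumption: a softmax over min-max normalized returns; the paper
    may use a different weighting.
    """
    r = np.asarray(returns, dtype=float)
    span = r.max() - r.min()
    # Normalize to [0, 1]; fall back to uniform weights if all returns are equal.
    z = (r - r.min()) / span if span > 0 else np.zeros_like(r)
    w = np.exp(z / temperature)
    return w / w.sum()


def inverse_density_probs(counts, eps=1.0):
    """Probing probabilities inversely proportional to dataset density.

    Assumption: state-action pairs are grouped into discrete bins and
    `counts[i]` is how often bin i appears in the offline dataset.
    `eps` avoids division by zero for unseen bins.
    """
    c = np.asarray(counts, dtype=float)
    w = 1.0 / (c + eps)
    return w / w.sum()


rng = np.random.default_rng(0)

# Hypothetical per-trajectory returns in the offline dataset.
returns = [10.0, 50.0, 120.0]
p_traj = return_based_probs(returns, temperature=0.5)
sampled_trajectories = rng.choice(len(returns), size=5, p=p_traj)

# Hypothetical visit counts for three state-action bins.
counts = [100, 10, 0]
p_probe = inverse_density_probs(counts)
probed_bin = rng.choice(len(counts), p=p_probe)
```

In this sketch, a lower `temperature` concentrates training batches on the best trajectories, while the inverse-density weights direct exploration toward rarely seen state-action bins.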
