Workshop: Offline Reinforcement Learning

Domain Knowledge Guided Offline Q Learning

Xiaoxuan Zhang · Sijia Zhang · Yen-Yun Yu


Offline reinforcement learning (RL) is a promising method for applications where direct exploration is not possible but a decent initial model is expected for the online stage. In practice, offline RL can underperform because of overestimation caused by the distributional shift between the training data and the learned policy. A common approach to mitigating this issue is to constrain the learned policy so that it remains close to the fixed batch of interactions. Such constraints are typically applied without considering the application context. However, domain knowledge is available in many real-world cases and may be used to effectively handle out-of-distribution actions. Incorporating domain knowledge in training avoids the additional function approximation otherwise needed to estimate the behavior policy, and it results in easy-to-interpret policies. To encourage the adoption of offline RL in practical applications, we propose Domain Knowledge guided Q learning (DKQ). We show that DKQ is a conservative approach: a unique fixed point still exists, and it is upper bounded by the standard optimal Q function. DKQ also leads to a lower chance of overestimation. In addition, we demonstrate the benefit of DKQ empirically via a novel real-world case study, guided family tree building, which appears to be the first application of offline RL in genealogy. The results show that, guided by proper domain knowledge, DKQ can achieve offline performance similar to that of standard Q learning while being better aligned with the behavior policy revealed by the data, indicating a lower risk of overestimation on unseen actions. Further, we demonstrate the efficiency and flexibility of DKQ on a classical control problem.
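The abstract does not spell out how DKQ encodes domain knowledge, so the following is only a minimal sketch under a loudly stated assumption. Suppose domain knowledge is given as a hypothetical boolean mask `allowed[s, a]` marking the actions an expert considers plausible, and suppose the Bellman backup maximizes only over those actions. This is an illustrative encoding, not necessarily the paper's formulation. Under it, the restricted backup is still a monotone gamma-contraction and is never larger than the standard backup, so its unique fixed point lies below the standard optimal Q function, mirroring the conservativeness claim above. The function `q_iteration` and the random tabular MDP are hypothetical.

```python
import numpy as np


def q_iteration(P, R, gamma, allowed=None, iters=1000, tol=1e-10):
    """Tabular Q-value iteration with an optional action mask.

    P: transition probabilities, shape (S, A, S).
    R: expected rewards, shape (S, A).
    allowed: optional boolean mask, shape (S, A). ASSUMPTION: this is a
    hypothetical encoding of domain knowledge; when given, the max in the
    Bellman backup ranges only over allowed actions in the next state.
    Every state must keep at least one allowed action.
    """
    S, A = R.shape
    if allowed is None:
        allowed = np.ones((S, A), dtype=bool)
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Next-state value: max over permitted actions only.
        V = np.where(allowed, Q, -np.inf).max(axis=1)
        # Bellman backup; (S, A, S) @ (S,) -> (S, A).
        Q_new = R + gamma * (P @ V)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


# Usage on a small random MDP (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))
allowed = rng.random((S, A)) < 0.6
allowed[:, 0] = True  # guarantee at least one allowed action per state

Q_star = q_iteration(P, R, gamma)          # standard optimal Q
Q_dk = q_iteration(P, R, gamma, allowed)   # mask-restricted Q
print(np.all(Q_dk <= Q_star + 1e-8))
```

Because the masked max can only drop candidate actions, each restricted backup is elementwise no larger than the standard one, and the ordering carries over to the fixed points. With an all-True mask the two iterations coincide.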
