
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun

Great Hall & Hall B1+B2 (level 1) #1410
Thu 14 Dec 8:45 a.m. PST — 10:45 a.m. PST


We consider offline reinforcement learning (RL), where we only have access to offline data. In contrast to numerous offline RL algorithms that require uniform coverage of the offline data over the state and action space, we propose value-based algorithms with PAC guarantees under partial coverage, that is, coverage of the offline data with respect to a single policy, together with realizability of the soft Q-function (a.k.a. the entropy-regularized Q-function) and of another function, defined as the solution to a saddle point of a certain minimax optimization problem. Furthermore, we show an analogous result for Q-functions in place of soft Q-functions. To attain these guarantees, we use novel algorithms with minimax loss functions that accurately estimate soft Q-functions and Q-functions, with L2-convergence guarantees measured on the offline data. We derive these loss functions by casting the estimation problems as nonlinear convex optimization problems and taking their Lagrangians.
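To make the notion of a soft (entropy-regularized) Q-function concrete, here is a minimal tabular sketch of the standard soft Bellman backup. This is only an illustration of the object the abstract refers to, not the paper's minimax estimator; the transition kernel `P`, reward `R`, discount `gamma`, and temperature `lam` are all toy placeholders.

```python
import numpy as np

def soft_value(Q, lam):
    # Soft value: V(s) = lam * log sum_a exp(Q(s, a) / lam)
    # (as lam -> 0 this recovers the hard max over actions).
    return lam * np.log(np.exp(Q / lam).sum(axis=1))

def soft_bellman_backup(Q, P, R, gamma, lam):
    # Soft Bellman operator: (T Q)(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s,a)}[V(s')]
    V = soft_value(Q, lam)
    return R + gamma * np.einsum("san,n->sa", P, V)

# Toy MDP with 4 states and 2 actions (placeholder dynamics).
rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s'] = transition probability
R = rng.uniform(size=(S, A))
gamma, lam = 0.9, 0.5

# The soft Bellman operator is a gamma-contraction, so fixed-point
# iteration converges to the soft Q-function.
Q = np.zeros((S, A))
for _ in range(500):
    Q = soft_bellman_backup(Q, P, R, gamma, lam)
```

The paper's contribution, by contrast, is estimating this fixed point from offline data under partial coverage via a minimax loss, rather than by exact backups with a known model as in this sketch.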
