

Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

Dylan J Foster · Noah Golowich · Jian Qian · Alexander Rakhlin · Ayush Sekhari

Great Hall & Hall B1+B2 (level 1) #1909
Tue 12 Dec 3:15 p.m. PST — 5:15 p.m. PST


We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms of the same quantity. Estimation-to-Decisions is a reduction that lifts algorithms for (supervised) online estimation into algorithms for decision making. In this paper, we show that by combining Estimation-to-Decisions with a specialized form of "optimistic" estimation introduced by Zhang (2022), it is possible to obtain guarantees that improve upon those of Foster et al. (2021) by accommodating more lenient notions of estimation error. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
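To make the Estimation-to-Decisions reduction concrete, here is a minimal illustrative sketch, not the paper's algorithm: a toy two-armed bandit with a finite model class, a plug-in sample-mean estimator in place of the optimistic estimation the paper studies, and a crude grid search standing in for the min-max computation defining the Decision-Estimation Coefficient. All names (`MODELS`, `GAMMA`, `solve_dec`, `e2d`) and the specific instance are hypothetical choices for illustration.

```python
import random

# Hypothetical toy model class: each model assigns a mean reward to each action.
MODELS = {"M0": [1.0, 0.0], "M1": [0.0, 1.0]}
ACTIONS = [0, 1]
GAMMA = 4.0  # exploration parameter trading off regret against estimation error

def dec_objective(p, est_means):
    """Worst case over models of: expected regret under the model,
    minus GAMMA times the squared-error divergence to the current estimate."""
    worst = float("-inf")
    for means in MODELS.values():
        best = max(means)
        val = sum(
            p[a] * ((best - means[a]) - GAMMA * (means[a] - est_means[a]) ** 2)
            for a in ACTIONS
        )
        worst = max(worst, val)
    return worst

def solve_dec(est_means, grid=101):
    """Approximate the DEC min-max: grid search over distributions (q, 1-q)."""
    best_p, best_val = None, float("inf")
    for i in range(grid):
        q = i / (grid - 1)
        p = [q, 1 - q]
        val = dec_objective(p, est_means)
        if val < best_val:
            best_p, best_val = p, val
    return best_p

def e2d(true_means, T=200, seed=0):
    """Estimation-to-Decisions loop: estimate, solve the DEC min-max for a
    decision distribution, sample an action, observe, update the estimator."""
    rng = random.Random(seed)
    counts, sums, total = [0, 0], [0.0, 0.0], 0.0
    for _ in range(T):
        # Plug-in estimator: per-arm sample means (0.5 prior for unpulled arms).
        est = [sums[a] / counts[a] if counts[a] else 0.5 for a in ACTIONS]
        p = solve_dec(est)
        a = 0 if rng.random() < p[0] else 1
        r = true_means[a] + rng.gauss(0, 0.1)
        counts[a] += 1
        sums[a] += r
        total += r
    return total / T
```

With an accurate estimate, the min-max solution concentrates on the estimated-optimal arm (large divergence penalties rule out the alternative model), while an uninformative estimate yields an exploratory distribution; this is the sense in which the reduction converts estimation quality into decision quality.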
