NeurIPS Poster Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang

Poster Session 1 #226

Abstract: The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O(\dim_E)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications:

(1) In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\dim_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity.

(2) In conjunction with the obvious lower bound in the tabular case, our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\dim_E\right)$ is tight in the agnostic setting.

Therefore, we help address the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.
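To make the quantities in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It computes the gap $\rho$ for a small tabular optimal $Q$-function and evaluates the threshold $\rho/\sqrt{\dim_E}$ on the approximation error $\delta$. The names `optimality_gap` and `max_tolerable_error`, the example values, and the constant `c` (standing in for the unspecified constant hidden by the $O(\cdot)$ notation) are all assumptions made for illustration.

```python
import math


def optimality_gap(q_star):
    """Return rho: the smallest difference, over all states, between the
    best optimal Q-value and the best strictly worse Q-value.

    q_star maps each state to a list of optimal Q-values, one per action
    (a hypothetical tabular representation chosen for this sketch).
    """
    gaps = []
    for values in q_star.values():
        best = max(values)
        # Actions tied with the best are also optimal, so skip them.
        worse = [v for v in values if v < best]
        if worse:
            gaps.append(best - max(worse))
    # If every action is optimal in every state, there is no finite gap.
    return min(gaps) if gaps else math.inf


def max_tolerable_error(rho, eluder_dim, c=1.0):
    """Evaluate c * rho / sqrt(dim_E), the form of the abstract's condition
    on delta. The constant c is an assumption; the abstract does not state it.
    """
    return c * rho / math.sqrt(eluder_dim)


# Hypothetical example: two states, three actions each.
q_star = {
    "s0": [1.0, 0.5, 0.2],
    "s1": [0.75, 0.75, 0.25],
}
rho = optimality_gap(q_star)
threshold = max_tolerable_error(rho, eluder_dim=4)
```

In this toy example the gap is determined by the second-best action in each state, and a larger Eluder dimension shrinks the approximation error that the condition allows.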
