NeurIPS Poster Asymptotically Optimal Quantile Pure Exploration for Infinite-Armed Bandits

Poster

Asymptotically Optimal Quantile Pure Exploration for Infinite-Armed Bandits

Evelyn Xiao-Yue Gong · Mark Sellke

Great Hall & Hall B1+B2 (level 1) #1818

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Abstract: We study pure exploration with infinitely many bandit arms generated \iid from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability

1 - δ

$1-\delta$ , within

ε

$\varepsilon$ of being with the top

η

$\eta$ -fraction of arms; this is a natural adaptation of the classical PAC guarantee for infinite action sets. We consider both the fixed confidence and fixed budget settings, aiming respectively for optimal \emph{expected} and \emph{fixed} sample complexity.For fixed confidence, we give an algorithm with expected sample complexity

O (\frac{\log (1 / η) \log (1 / δ)}{η ε^{2}})

$O\left(\frac{\log (1/\eta)\log (1/\delta)}{\eta\varepsilon^2}\right)$ . This is optimal except for the

\log (1 / η)

$\log (1/\eta)$ factor, and the

δ

$\delta$ -dependence closes a quadratic gap in the literature. For fixed budget, we show the asymptotically optimal sample complexity as

δ \to 0

$\delta\to 0$ is

c^{- 1} \log (1 / δ) (\log \log (1 / δ))^{2}

$c^{-1}\log(1/\delta)\big(\log\log(1/\delta)\big)^2$ to leading order; equivalently, the optimal failure probability with exactly

N

$N$ samples decays as

\exp (- (1 \pm o (1)) \frac{c N}{\log^{2} N})

$\exp\big(-(1\pm o(1))\frac{cN}{\log^2 N}\big)$ .The value of

c

$c$ depends explicitly on the problem parameters (including the unknown arm distribution) through a certain Fisher information distance. Even the strictly super-linear dependence on

\log (1 / δ)

$\log(1/\delta)$ was not known and resolves a question of Grossman-Moshkovitz (FOCS 2015).

Chat is not available.