NeurIPS Poster Logarithmic Regret from Sublinear Hints

Aditya Bhaskara · Ashok Cutkosky · Ravi Kumar · Manish Purohit

Keywords: [ Online Learning ] [ Optimization ]

Abstract: We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t \rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a _hint_ $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at _every_ time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of _optimistic_ regret bounds, and to the problem of online learning with abstention.
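To make the setting concrete, the following is a minimal sketch of the standard, hint-free protocol only: projected online gradient descent on the unit ball, with regret measured against the best fixed point in hindsight. It is not the algorithm from this work and uses no hints. The function names, the step size $1/\sqrt{T}$, and the randomly generated unit-norm cost vectors are all assumptions made for illustration. With that step size and unit-norm costs, the textbook analysis bounds the regret by $\sqrt{T}$, which is the baseline the abstract refers to.

```python
import math
import random


def project_to_unit_ball(x):
    """Euclidean projection of x onto the unit ball."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]


def run_ogd(costs, step_size=None):
    """Play projected online gradient descent against a list of cost vectors.

    Returns the regret: the algorithm's total loss minus the loss of the
    best fixed point in the unit ball, chosen in hindsight.
    (Illustrative baseline without hints; names are hypothetical.)
    """
    T = len(costs)
    d = len(costs[0])
    eta = step_size if step_size is not None else 1.0 / math.sqrt(T)

    x = [0.0] * d          # first play: the origin
    total_loss = 0.0
    cost_sum = [0.0] * d

    for c in costs:
        # Suffer loss <c_t, x_t>; c_t is revealed only after x_t is played.
        total_loss += sum(ci * xi for ci, xi in zip(c, x))
        cost_sum = [s + ci for s, ci in zip(cost_sum, c)]
        # Gradient step on the linear loss, then project back onto the ball.
        x = project_to_unit_ball([xi - eta * ci for xi, ci in zip(x, c)])

    # The best fixed point is -sum(c_t) / ||sum(c_t)||, with loss -||sum(c_t)||.
    best_fixed_loss = -math.sqrt(sum(s * s for s in cost_sum))
    return total_loss - best_fixed_loss


def random_unit_costs(T, d, seed=0):
    """Hypothetical test data: T random cost vectors of unit norm in R^d."""
    rng = random.Random(seed)
    costs = []
    for _ in range(T):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(a * a for a in v))
        costs.append([a / norm for a in v])
    return costs


if __name__ == "__main__":
    T = 10_000
    regret = run_ogd(random_unit_costs(T, d=5))
    print(f"T={T}, regret={regret:.3f}, sqrt(T)={math.sqrt(T):.1f}")
```

In the hinted setting described above, the learner would additionally see $h_t$ before choosing $x_t$; the sketch deliberately leaves that part out, since the abstract does not specify the query model or the algorithm.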
