Spotlight Poster

Alternation makes the adversary weaker in two-player games

Volkan Cevher · Ashok Cutkosky · Ali Kavis · Georgios Piliouras · Stratis Skoulakis · Luca Viano

Great Hall & Hall B1+B2 (level 1) #919

Abstract: Motivated by alternating game-play in two-player games, we study an alternating variant of \textit{Online Linear Optimization} (OLO). In alternating OLO, a \textit{learner} at each round $t \in [T]$ selects a vector $x_t$ and then an \textit{adversary} selects a cost vector $c_t \in [-1,1]^n$. The learner then experiences cost $(c_t + c_{t-1})^\top x_t$ instead of $c_t^\top x_t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit $\mathcal{O}((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $\mathcal{O}(\rho \log T)$ regret for the ball of radius $\rho > 0$. Our results imply that in alternating game-play, an agent can always guarantee $\tilde{\mathcal{O}}((\log n)^{4/3} T^{1/3})$ regret regardless of the strategies of the other agent, while the regret bound improves to $\mathcal{O}(\log T)$ in case the agent admits only two actions.
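To make the protocol concrete, below is a minimal Python sketch of the alternating OLO interaction on the simplex. The learner here is plain multiplicative weights used only as a placeholder, not one of the paper's two algorithms (which are designed to exploit the alternation); the uniform-random adversary, the convention $c_0 = 0$, and the function name `simulate_alternating_olo` are illustrative assumptions.

```python
import numpy as np

def simulate_alternating_olo(T=1000, n=5, eta=0.05, seed=0):
    """Sketch of alternating OLO: at round t the learner picks x_t on the
    n-dimensional simplex, the adversary picks c_t in [-1, 1]^n, and the
    learner pays (c_t + c_{t-1})^T x_t (with the convention c_0 = 0)."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n)          # multiplicative-weights state (placeholder learner)
    c_prev = np.zeros(n)          # c_0 = 0 for the first round
    cum_cost = np.zeros(n)        # per-action sum of the charged cost vectors
    total_cost = 0.0
    for _ in range(T):
        x_t = weights / weights.sum()            # learner's point on the simplex
        c_t = rng.uniform(-1.0, 1.0, size=n)     # an arbitrary (here random) adversary
        effective = c_t + c_prev                 # cost vector actually charged
        total_cost += effective @ x_t
        cum_cost += effective
        weights *= np.exp(-eta * effective)      # multiplicative-weights update
        c_prev = c_t
    # Regret against the best fixed action (simplex vertex) in hindsight.
    return total_cost - cum_cost.min()

if __name__ == "__main__":
    print(f"regret after T rounds: {simulate_alternating_olo():.2f}")
```

The only change from standard OLO is the line `effective = c_t + c_prev`: the learner is charged for the previous cost vector as well, which is the "small twist" the abstract says invalidates the $\Omega(\sqrt{T})$ lower bound.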