NIPS Poster Tracking the Best Expert in Non-stationary Stochastic Environments

Chen-Yu Wei · Yi-Te Hong · Chi-Jen Lu

Area 5+6+7+8 #141

Abstract: We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this quantity affects the regret. We investigate the interaction between $\Lambda$ and $\Gamma$, which counts the number of times the distributions change, as well as between $\Lambda$ and $V$, which measures how far the distributions deviate over time. One striking result is that even when $\Gamma$, $V$, and $\Lambda$ are all restricted to constants, the regret lower bound in the bandit setting still grows with $T$. The other highlight is that in the full-information setting, constant regret becomes achievable with constant $\Gamma$ and $\Lambda$, as it can be made independent of $T$, while with constant $V$ and $\Lambda$, the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper bound guarantees, but also prove matching lower bounds.
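To make the three quantities in the abstract concrete, here is a minimal Python sketch of one plausible way to compute them for a known sequence of loss distributions. The abstract gives only verbal descriptions, so the exact formulas are assumptions: changes are counted on the mean vectors, drift is measured with the max-norm between consecutive mean vectors, and the total variance sums the largest per-arm variance in each round. The paper's formal definitions may differ, and all function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def count_changes(means: Sequence[Vector]) -> int:
    """Gamma-like quantity (assumed form): number of rounds in which the
    mean-loss vector differs from the previous round's vector."""
    return sum(1 for prev, cur in zip(means, means[1:]) if list(prev) != list(cur))


def total_drift(means: Sequence[Vector]) -> float:
    """V-like quantity (assumed form): sum over rounds of the max-norm
    distance between consecutive mean-loss vectors."""
    return sum(
        max(abs(c - p) for p, c in zip(prev, cur))
        for prev, cur in zip(means, means[1:])
    )


def total_variance(variances: Sequence[Vector]) -> float:
    """Lambda-like quantity (assumed form): sum over rounds of the largest
    per-arm loss variance in that round."""
    return sum(max(v) for v in variances)


def dynamic_regret(means: Sequence[Vector], choices: Sequence[int]) -> float:
    """Expected dynamic regret: the learner's expected loss minus the loss
    of the best arm in each round, summed over all rounds."""
    return sum(mu[a] - min(mu) for mu, a in zip(means, choices))


# Hypothetical two-arm instance: the means change once, after round 2.
example_means = [[0.5, 0.5], [0.5, 0.5], [0.2, 0.8], [0.2, 0.8]]
example_variances = [[0.25, 0.25]] * 4
example_choices = [0, 1, 1, 0]

if __name__ == "__main__":
    print("changes:", count_changes(example_means))
    print("drift:", total_drift(example_means))
    print("variance:", total_variance(example_variances))
    print("dynamic regret:", dynamic_regret(example_means, example_choices))
```

In this toy instance the learner pays regret only in the third round, where it picks the arm that has just become worse. Dynamic regret compares against the best arm of each round rather than a single fixed arm, which is why the amount of change in the environment enters the bounds.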
