Timezone: »

Stochastic convex optimization with bandit feedback
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin

Wed Dec 14 08:45 AM -- 02:59 PM (PST) @ None #None
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in X$. We demonstrate a generalization of the ellipsoid algorithm that incurs $O(\poly(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.

Author Information

Alekh Agarwal (Microsoft Research)
Dean P Foster (University of Pennsylvania)
Daniel Hsu (Columbia University)
Sham M Kakade (Microsoft Research)
Sasha Rakhlin (University of Pennsylvania)

More from the Same Authors