Timezone: »
Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays \left{ d{t}\right} that are unknown to the player. After picking arm a{t} at round t, the player receives the cost of playing this arm d{t} rounds later. In cases where t+d{t}>T, this feedback is simply missing. We prove that the EXP3 algorithm (that uses the delayed feedback upon its arrival) achieves a regret of O\left(\sqrt{\ln K\left(KT+\sum{t=1}^{T}d{t}\right)}\right). For the case where \sum{t=1}^{T}d{t} and T are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of O\left(\sqrt{\ln K\left(K^{2}T+\sum{t=1}^{T}d{t}\right)}\right). We then consider a two player zerosum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the “noregret property”, (e.g., where d{t}=O\left(t\log t\right)) the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game. The result is made possible by choosing an adaptive step size \eta{t} that is not summable but is square summable, and proving a “weighted regret bound” for this general case.
Author Information
Ilai Bistritz (Stanford)
Zhengyuan Zhou (Stanford University)
Xi Chen (New York University)
Nicholas Bambos
Jose Blanchet (Stanford University)
More from the Same Authors

2019 Poster: Learning in Generalized Linear Contextual Bandits with Stochastic Delays »
Zhengyuan Zhou · Renyuan Xu · Jose Blanchet 
2019 Spotlight: Learning in Generalized Linear Contextual Bandits with Stochastic Delays »
Zhengyuan Zhou · Renyuan Xu · Jose Blanchet 
2019 Poster: Multivariate Distributionally Robust Convex Regression under Absolute Error Loss »
Jose Blanchet · Peter W Glynn · Jun Yan · Zhengqing Zhou 
2019 Poster: SemiParametric Dynamic Contextual Pricing »
Virag Shah · Ramesh Johari · Jose Blanchet 
2018 Poster: Distributed MultiPlayer Bandits  a Game of Thrones Approach »
Ilai Bistritz · Amir Leshem 
2018 Poster: Learning in Games with Lossy Feedback »
Zhengyuan Zhou · Panayotis Mertikopoulos · Susan Athey · Nicholas Bambos · Peter W Glynn · Yinyu Ye 
2017 Poster: Countering Feedback Delays in MultiAgent Learning »
Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Peter W Glynn · Claire Tomlin 
2017 Poster: Stochastic Mirror Descent in Variationally Coherent Optimization Problems »
Zhengyuan Zhou · Panayotis Mertikopoulos · Nicholas Bambos · Stephen Boyd · Peter W Glynn