Partial monitoring is a general model for sequential learning with limited feedback, formalized as a game between two players. In each round of this game, the learner chooses an action while the opponent simultaneously chooses an outcome; the learner then suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that characterizes the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge), for which we derive an asymptotically optimal regret upper bound that matches the lower bound.
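To make the protocol concrete, the following minimal Python sketch simulates the finite stochastic partial monitoring game loop described above. The loss matrix `L`, feedback matrix `H`, outcome distribution `opponent_dist`, and the uniform `choose_action` placeholder are all illustrative assumptions; this is not PM-DMED itself, only the interaction protocol in which such an algorithm would run.

```python
import numpy as np

# Illustrative sketch of finite stochastic partial monitoring (not the paper's code).
rng = np.random.default_rng(0)

N, M = 3, 3                     # number of learner actions / opponent outcomes
L = rng.uniform(size=(N, M))    # loss matrix: L[i, j] = loss of action i on outcome j
H = rng.integers(0, 2, (N, M))  # feedback matrix: H[i, j] = signal shown for (i, j)
opponent_dist = np.array([0.5, 0.3, 0.2])  # fixed outcome distribution, unknown to the learner

def choose_action(history):
    """Placeholder strategy (uniform); a regret-minimizing algorithm like PM-DMED
    would instead base its choice on the empirical feedback distributions."""
    return rng.integers(N)

T = 1000
total_loss = 0.0
history = []
for t in range(T):
    action = choose_action(history)
    outcome = rng.choice(M, p=opponent_dist)  # opponent draws a stochastic outcome
    total_loss += L[action, outcome]          # loss is suffered but never observed directly
    signal = H[action, outcome]               # only the feedback signal is revealed
    history.append((action, signal))

best_fixed = L @ opponent_dist                # expected loss of each fixed action
regret = total_loss - T * best_fixed.min()    # (pseudo-)regret against the best fixed action
print(f"empirical regret of the uniform strategy: {regret:.1f}")
```

The only information the learner can use is the stream of `(action, signal)` pairs; the actual losses stay hidden, which is what separates partial monitoring from the bandit setting.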
Author Information
Junpei Komiyama (The University of Tokyo)
Junya Honda (The University of Tokyo)
Hiroshi Nakagawa (The University of Tokyo)
More from the Same Authors
- 2019 Poster: On the Calibration of Multiclass Classification with Rejection
  Chenri Ni · Nontawat Charoenphakdee · Junya Honda · Masashi Sugiyama
- 2017 Poster: Position-based Multiple-play Bandit Problem with Unknown Position Bias
  Junpei Komiyama · Junya Honda · Akiko Takeda
- 2016 Poster: Differential Privacy without Sensitivity
  Kentaro Minami · Hiromi Arai · Issei Sato · Hiroshi Nakagawa