Poster
Improved Analysis for Bandit Learning in Matching Markets
Fang Kong · Zilong Wang · Shuai Li
West Ballroom A-D #5702
Abstract:
A rich line of works studies the bandit learning problem in two-sided matching markets, where participants on one side of the market (players) are uncertain about their preferences and hope to find a stable matching during iterative matchings with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap. However, this result may be far from the lower bound since the number $K$ of arms (workers, publisher slots) may be much larger than the number $N$ of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by $O(N^2\log T/\Delta^2 + K\log T/\Delta)$. This result removes the dependence on $K$ in the main order term and improves the state-of-the-art guarantee in common cases where $N$ is much smaller than $K$. Such an advantage is also verified in experiments. In addition, we provide a refined analysis for the existing centralized UCB algorithm and show that, under the $\alpha$-condition, it achieves an improved $O(N\log T/\Delta^2 + K\log T/\Delta)$ regret.
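As a quick comparison (a sketch assuming the bounds as stated above, with preference gaps normalized so that $\Delta \le 1$): whenever $N^2 \le K$,
\[
\frac{N^2\log T}{\Delta^2} + \frac{K\log T}{\Delta}
\;\le\; \frac{K\log T}{\Delta^2} + \frac{K\log T}{\Delta}
\;\le\; \frac{2K\log T}{\Delta^2},
\]
so the new bound is never more than a constant factor larger than the previous $O(K\log T/\Delta^2)$ guarantee, and it becomes strictly smaller once $N^2 \ll K$ and $\Delta$ is small.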