
Poster

Improved Analysis for Bandit Learning in Matching Markets

Fang Kong · Zilong Wang · Shuai Li

West Ballroom A-D #5702
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: A rich line of work studies the bandit learning problem in two-sided matching markets, where one side of the market participants (players) are uncertain about their preferences and hope to find a stable matching through iterative interactions with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap. However, this result may be far from the lower bound $\Omega(\max\{N\log T/\Delta^2,\, K\log T/\Delta\})$, since the number $K$ of arms (workers, publisher slots) may be much larger than the number $N$ of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by $O(N^2\log T/\Delta^2 + K\log T/\Delta)$. This result removes the dependence on $K$ in the leading term and improves the state-of-the-art guarantee in the common case where $N$ is much smaller than $K$. This advantage is also verified in experiments. In addition, we provide a refined analysis of the existing centralized UCB algorithm and show that, under the $\alpha$-condition, it achieves an improved $O(N\log T/\Delta^2 + K\log T/\Delta)$ regret.
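For readers unfamiliar with the centralized setting mentioned in the last sentence, the sketch below illustrates the general shape of a centralized UCB routine for matching bandits: a platform maintains a UCB index for every (player, arm) pair and, in each round, computes the player-proposing Gale-Shapley matching under the UCB-induced preference orders, after which matched players observe noisy rewards. This is a minimal illustration under assumed conventions (Gaussian reward noise; the names `gale_shapley` and `centralized_ucb_matching` are ours), not the paper's new algorithm or its exact baseline.

```python
import numpy as np

def gale_shapley(player_prefs, arm_ranks, n_players, n_arms):
    """Player-proposing Gale-Shapley (deferred acceptance).

    player_prefs[p]: arm indices ordered from most to least preferred.
    arm_ranks[a][p]: arm a's rank of player p (lower = more preferred).
    Returns match[p] = matched arm index, or -1 if unmatched.
    """
    next_choice = [0] * n_players   # next arm each free player proposes to
    holder = {}                     # arm -> player it currently holds
    free = list(range(n_players))
    while free:
        p = free.pop()
        if next_choice[p] >= n_arms:
            continue                # p has been rejected by every arm
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        q = holder.get(a)
        if q is None:
            holder[a] = p           # arm was free: tentatively accept p
        elif arm_ranks[a][p] < arm_ranks[a][q]:
            holder[a] = p           # arm prefers p: displace q
            free.append(q)
        else:
            free.append(p)          # arm rejects p; p proposes elsewhere
    match = [-1] * n_players
    for a, p in holder.items():
        match[p] = a
    return match

def centralized_ucb_matching(true_means, arm_ranks, horizon, rng):
    """One run of a centralized UCB-style routine for matching bandits.

    Each round: form UCB indices for every (player, arm) pair, match via
    player-proposing Gale-Shapley on the UCB-induced preference orders,
    then update each matched player's empirical statistics.
    """
    n_players, n_arms = true_means.shape
    counts = np.zeros((n_players, n_arms))
    sums = np.zeros((n_players, n_arms))
    for t in range(1, horizon + 1):
        means = sums / np.maximum(counts, 1)
        bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
        ucb = np.where(counts > 0, means + bonus, np.inf)  # unseen arms first
        prefs = [list(np.argsort(-ucb[p])) for p in range(n_players)]
        match = gale_shapley(prefs, arm_ranks, n_players, n_arms)
        for p, a in enumerate(match):
            if a >= 0:
                counts[p, a] += 1
                # assumed noise model: Gaussian with std 0.1 (illustrative)
                sums[p, a] += true_means[p, a] + 0.1 * rng.standard_normal()
    return counts

if __name__ == "__main__":
    # Hypothetical 2-player, 3-arm instance (N < K, as in the paper's regime).
    rng = np.random.default_rng(0)
    true_means = np.array([[0.9, 0.5, 0.1],    # player 0's mean rewards
                           [0.4, 0.8, 0.3]])   # player 1's mean rewards
    arm_ranks = [[0, 1], [1, 0], [0, 1]]       # each arm's ranking of players
    counts = centralized_ucb_matching(true_means, arm_ranks, 2000, rng)
    print(counts)  # pull counts should concentrate on each player's stable arm
```

The platform-side Gale-Shapley step is what makes the routine centralized: one coordinator sees all indices and outputs the whole matching, at a per-round cost of $O(NK)$ for index computation plus the matching itself.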
