Timezone: »

Regret, stability, and fairness in matching markets with bandit learners
Sarah Cen · Devavrat Shah
Event URL: https://openreview.net/forum?id=elwTdHSMjT_ »

We consider the two-sided matching market with bandit learners. In a matching market, there are two sets of agents---users and providers---that seek to be matched one-to-one. Contrary to a core assumption in the literature, users and providers often do not know their true matching preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multi-armed bandit problems. One recent work finds that it is possible to assign matchings that are stable (i.e., no pair of agents is incentivized to defect) at every time step while also allowing agents to learn enough so that the system converges to matchings that are stable under the agents' true preferences. However, the authors also uncover an impossibility result: under their setup, one cannot simultaneously guarantee stability and low optimal regret. In this work, we study the two-sided matching market with bandit learners under costs and transfers in order to faithfully model competition between agents. We show that, in contrast to previous work, it is possible to simultaneously guarantee four desiderata: (1) stability, (2) low optimal regret, (3) fairness in the distribution of regret, and (4) high social welfare.

Author Information

Sarah Cen (Massachusetts Institute of Technology)

PhD student at MIT EECS. Working with Prof. Devavrat Shah in Laboratory for Information and Decision Systems. Research on topics including causal inference and responsible ML. Have previously worked on self-driving cars, robotics, information networks, and multi-armed bandits.

Devavrat Shah (Massachusetts Institute of Technology)

Devavrat Shah is a professor of Electrical Engineering & Computer Science and Director of Statistics and Data Science at MIT. He received PhD in Computer Science from Stanford. He received Erlang Prize from Applied Probability Society of INFORMS in 2010 and NeuIPS best paper award in 2008.

More from the Same Authors