Timezone: »

Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions
Siddartha Y. Ramamohan · Arun Rajkumar · Shivani Agarwal · Shivani Agarwal

Mon Dec 05 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #97 #None
Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.

Author Information

Siddartha Y. Ramamohan (Indian Institute of Science)
Arun Rajkumar (Indian Institute of Technology Madras)
Shivani Agarwal (Radcliffe Institute)
Shivani Agarwal (University of Pennsylvania)

More from the Same Authors