Timezone: »

Poster
Combinatorial Bandits with Relative Feedback

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #18
We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute. Specifically, we study two regret minimisation problems over subsets of a finite ground set $[n]$, with subset-wise relative preference information feedback according to the Multinomial logit choice model. In the first setting, the learner can play subsets of size bounded by a maximum size and receives top-$m$ rank-ordered feedback, while in the second setting the learner can play subsets of a fixed size $k$ with a full subset ranking observed as feedback. For both settings, we devise instance-dependent and order-optimal regret algorithms with regret $O(\frac{n}{m} \ln T)$ and $O(\frac{n}{k} \ln T)$, respectively. We derive fundamental limits on the regret performance of online learning with subset-wise preferences, proving the tightness of our regret guarantees. Our results also show the value of eliciting more general top-$m$ rank-ordered feedback over single winner feedback ($m=1$). Our theoretical results are corroborated with empirical evaluations.

#### Author Information

##### Aadirupa Saha (Indian Institute of Science)

Aadirupa Saha is a PhD student at the department of Computer Science and Automation (CSA), Indian Institute of Science (IISc), Bangalore and was a research intern at Google, Mountain View, CA (June-Sept, 2019). Her research interests broadly lie in the areas of Machine Learning, Statistical Learning Theory and Optimization. Her current research specifically focuses on decision making under uncertainty from sequential data, reinforcement learning, and preference based rank aggregation problems.