Efficient Top-$k$ Recovery of General Combinatorial Bandits with a Reduction to Relative Feedback
Aniket Wagde · Aadirupa Saha
Abstract
In this paper, we study the problem of combinatorial bandits with stochastic full-bandit feedback, where the feedback is aggregated by an unknown operator. Unlike traditional combinatorial bandits, where the feedback is typically assumed to be linear, we consider a setting with non-linear reward functions determined by an arbitrary monotonic operator, which generalizes a range of aggregation methods, including maximum, minimum, and $k$-order statistics. The challenge is to identify the top-$k$ arms efficiently without explicit knowledge of the underlying aggregation operator. We propose novel algorithms that leverage sub-Gaussian noise assumptions and gap-based analysis to provide strong theoretical guarantees on the sample complexity of identifying the top-$k$ arms with high confidence. Our approach applies a modified version of an algorithm designed for pairwise (relative) preference feedback. Our results extend to any monotonic aggregation function under sub-Gaussian noise, and our methods achieve strong performance in terms of both time and sample efficiency.
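To make the reduction to relative feedback concrete, the following is a minimal, illustrative Python sketch (not the authors' algorithm): it simulates full-bandit feedback through a hidden monotonic aggregator and converts it into noisy pairwise comparisons by playing subsets that differ in a single arm, then selects the top-$k$ arms by a Borda-style win count. All names (`pull`, `duel`, `anchor`, `samples_per_duel`) and the max aggregator are hypothetical choices for illustration; the paper's gap-based analysis would dictate the actual sample sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical environment (illustrative only) ---
# n arms with hidden utilities; pulling a subset S returns
# g(utilities in S) + sub-Gaussian noise, where g is an unknown
# monotonic aggregator (here: max, but the learner never uses that fact).
n, k = 8, 3
true_utils = rng.normal(size=n)
sigma = 0.5

def pull(subset):
    """Full-bandit feedback: noisy monotonic aggregate of the chosen subset."""
    return np.max(true_utils[list(subset)]) + sigma * rng.normal()

# --- Reduction to relative feedback (toy sketch) ---
# Compare arms i and j by playing two sets that differ only in i vs j:
# by monotonicity, the set containing the better arm has the larger
# expected aggregate, so the sign of the empirical difference acts as a
# noisy pairwise preference between i and j.
def duel(i, j, anchor, num_samples):
    s_i = anchor | {i}
    s_j = anchor | {j}
    diff = np.mean([pull(s_i) - pull(s_j) for _ in range(num_samples)])
    return diff > 0  # True if i is (empirically) preferred to j

# Borda-style top-k: score each arm by how many duels it wins
# against the other arms, then keep the k highest scorers.
anchor = set()            # shared context set (empty here for simplicity)
samples_per_duel = 200    # chosen ad hoc; a gap-based rule would set this
wins = np.zeros(n)
for i in range(n):
    for j in range(n):
        if i != j and duel(i, j, anchor, samples_per_duel):
            wins[i] += 1

est_top_k = set(np.argsort(-wins)[:k])
print("estimated top-k:", est_top_k)
print("true top-k:     ", set(np.argsort(-true_utils)[:k]))
```

Because the aggregator is monotonic, swapping a single arm in an otherwise identical subset preserves the ordering of the arms' utilities in the expected feedback, which is what lets the pairwise-preference machinery take over.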