True Impact of Cascade Length in Contextual Cascading Bandits
Hyun-jun Choi · Joongkyu Lee · Min-hwan Oh
Abstract
We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (\emph{cascade}) of items, and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin various applications including recommender systems and search engines, the role of the cascade length $K$ in shaping regret has remained unclear. Contrary to prior results suggesting that regret grows with $K$, we prove that regret actually \emph{decreases} once $K$ is large enough. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound for contextual cascading bandits, $\tilde{\mathcal{O}}\bigl(\min \lbrace K\bar{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$. To complement this upper bound, we provide a nearly matching lower bound of $\Omega \bigl(\min \lbrace K\underline{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$, where $0 \leq \underline{p} \leq \bar{p} < 1$. Together, these results fully characterize how regret truly scales with $K$, thereby closing the theoretical gap for contextual cascading bandits. Finally, comprehensive experiments validate our theoretical results and show the effectiveness of our proposed method.
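A quick numerical sketch can make the claimed scaling concrete: the $K$-dependent factor in the regret bound is $\min \lbrace K\bar{p}^{K-1}, 1 \rbrace$, which is flat for small $K$ but strictly decreasing once $K$ is large enough. The snippet below evaluates this factor for an illustrative attraction-probability bound of $\bar{p} = 0.5$ (an arbitrary choice for demonstration, not a value from the paper).

```python
def regret_factor(K: int, p: float) -> float:
    """K-dependent factor min{K * p^(K-1), 1} of the regret bound.

    p plays the role of the attraction-probability bound p-bar
    from the abstract; 0 <= p < 1.
    """
    return min(K * p ** (K - 1), 1.0)

# Evaluate the factor for cascade lengths K = 1..10 at an
# illustrative p = 0.5 (arbitrary choice for demonstration).
factors = [regret_factor(K, 0.5) for K in range(1, 11)]
# The factor sits at 1 for K = 1, 2, then strictly decreases:
# regret eventually shrinks as the cascade grows longer.
```

For $\bar{p} = 0.5$ the factor equals $1$ at $K = 1, 2$ and then decays geometrically (e.g. $0.75$ at $K = 3$, $0.5$ at $K = 4$), matching the paper's message that longer cascades eventually help rather than hurt.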