Poster

Online Learning of Delayed Choices

Recep Yusuf Bekci

[ Paper] [ OpenReview]

Abstract

Choice models are essential for understanding decision-making processes in domains like online advertising, product recommendations, and assortment optimization. The Multinomial Logit (MNL) model is particularly versatile in selecting products or advertisements for display. However, challenges arise with unknown MNL parameters and delayed feedback, requiring sellers to learn customers’ choice behavior and make dynamic decisions with biased knowledge due to delays. We address these challenges by developing an algorithm that handles delayed feedback, balancing exploration and exploitation using confidence bounds and optimism. We first consider a censored setting where a threshold for considering feedback is imposed by business requirements. Our algorithm demonstrates a $\tilde{O}(\sqrt{NT})$ regret, with a matching lower bound up to a logarithmic term. Furthermore, we extend our analysis to environments with non-thresholded delays, achieving a $\tilde{O}(\sqrt{NT})$ regret. To validate our approach, we conduct experiments that confirm the effectiveness of our algorithm.

Video

Chat is not available.