Structured Prediction with Logged Bandit Feedback
in Workshop: Constructive Machine Learning
Abstract
Conventional supervised learning algorithms require training data that includes 'optimal' labels. Unfortunately, such optimal labels may be difficult to annotate or even define for many constructive ML tasks. For example, what is the optimal layout of a personalized newspaper for a particular user on a given day? While the optimal layout may be unattainable as training data, it may be easy to infer the quality of a particular layout that was presented to the user (e.g., from behavioral signals). This means that we can often obtain bandit feedback for learning (a quality signal for the one prediction that was actually shown), but not full-information feedback (the quality of every prediction that could have been shown). In fact, such bandit-style log data is one of the most ubiquitous forms of data available, as it can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement) at little cost.