Oral Poster
Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li · Yuyang Zhang · Claire Liang · Zhaolin Ren · Na Li · Julie A Shah
East Exhibit Hall A-C #4901
Wed 11 Dec 3:30 p.m. PST — 4:30 p.m. PST
Binary human choice feedback is widely used in interactive preference learning for its simplicity but offers limited insights into preference strength. To address this, we leverage human response time, which inversely correlates with preference strength, as complementary information. Our work integrates the Drift-Diffusion model, which jointly models human choices and response times, into preference-based linear bandits. We introduce a computationally efficient utility estimator that transforms the utility estimation problem using choices and response times into a linear regression problem. Theoretical and empirical comparisons with traditional choice-only estimators reveal that for queries with strong preferences ("easy" queries), choices alone provide limited information, while response times offer valuable additional insights about preference strength. Thus, incorporating response times makes easy queries more useful. We demonstrate this benefit in the fixed-budget best-arm identification problem. Simulations based on datasets of human choices and response times for snacks, fashion item clicks, and movie ratings consistently show accelerated learning when response times are incorporated.
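The reduction to linear regression can be illustrated with a small simulation. The sketch below is a hypothetical simplification, not the authors' exact estimator: it assumes a symmetric drift-diffusion process with drift equal to the query's utility difference `mu = x @ theta`, unit noise, and barriers at ±a, under which the standard identities E[choice] = tanh(a·mu) and E[time] = (a/mu)·tanh(a·mu) give E[choice]/E[time] = mu/a, so a per-query ratio of averages yields a regression target that is linear in the features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup (not from the paper): drift mu = x @ theta, unit
# noise, symmetric absorbing barriers at +/- a. Then
#   E[choice] = tanh(a * mu),  E[time] = (a / mu) * tanh(a * mu),
# so a * mean(choice) / mean(time) estimates mu, and regressing these
# targets on the query features recovers theta by least squares.
d, a, dt = 3, 1.0, 1e-3
theta_true = np.array([0.8, -0.5, 0.3])

def ddm_trials(mu, n):
    """Euler-Maruyama simulation of n first-passage trials."""
    z = np.zeros(n)                  # diffusion states
    t = np.zeros(n)                  # elapsed times
    active = np.ones(n, dtype=bool)  # trials not yet absorbed
    while active.any():
        k = active.sum()
        z[active] += mu * dt + np.sqrt(dt) * rng.standard_normal(k)
        t[active] += dt
        active &= np.abs(z) < a
    return np.sign(z), t             # choices in {-1, +1}, response times

X = rng.standard_normal((20, d))     # query feature differences
y = np.empty(len(X))
for i, x in enumerate(X):
    c, t = ddm_trials(x @ theta_true, 500)
    y[i] = a * c.mean() / t.mean()   # per-query drift estimate

# Utility estimation reduced to ordinary least squares.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Note how "easy" queries help here: a large utility gap drives response times down, and the short times push the regression target `a * mean(c) / mean(t)` far from zero, whereas the choices alone would saturate near ±1 and carry little extra information.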