"It Doesn’t Know Anything About my Work": Participatory Benchmarking and AI Evaluation in Applied Settings
Elizabeth Watkins · Emanuel Moss · Ramesh Manuvinakurike · Christopher Persaud · Giuseppe Raffa · Lama Nachman
Abstract
This empirical paper investigates the benefits of socially embedded approaches to model evaluation. We present findings from a participatory benchmarking evaluation of an AI assistant deployed in a manufacturing setting, demonstrating how evaluation practices that incorporate end-users’ situated expertise enable more nuanced assessments of model performance. By foregrounding context-specific knowledge, these practices more accurately capture real-world functionality and inform iterative system improvement. We conclude by outlining implications for the design of context-aware AI evaluation frameworks.