"It Doesn’t Know Anything About my Work": Participatory Benchmarking and AI Evaluation in Applied Settings
Elizabeth Watkins ⋅ Emanuel Moss ⋅ Ramesh Manuvinakurike ⋅ Christopher Persaud ⋅ Giuseppe Raffa ⋅ Lama Nachman
Abstract
This empirical paper investigates the benefits of socially embedded approaches to model evaluation. We present findings from a participatory benchmarking evaluation of an AI assistant deployed in a manufacturing setting, demonstrating how evaluation practices that incorporate end-users’ situated expertise enable more nuanced assessments of model performance. By foregrounding context-specific knowledge, these practices more accurately capture real-world functionality and inform iterative system improvement. We conclude by outlining implications for the design of context-aware AI evaluation frameworks.