Keywords: [ human-AI-collaboration ] [ AI evaluation ]
Improving the performance of human-AI (artificial intelligence) collaborations tends to be narrowly scoped, with better prediction performance often considered the only metric of improvement. As a result, work on improving the collaboration usually focuses on improving the AI's accuracy. Here, we argue that such a focus is myopic, and instead, practitioners should take a more holistic view of measuring the performance of AI models, and human-AI collaboration more specifically. In particular, we argue that although some use cases merit optimizing for classification accuracy, for others, accuracy is less important and improvement on human-centered metrics should be valued instead.