Timezone: »
Many subfields of machine learning share a common stumbling block: evaluation. Advances in machine learning often evaporate under closer scrutiny or turn out to be less widely applicable than originally hoped. We conduct a meta-review of 107 survey papers from natural language processing, recommender systems, computer vision, reinforcement learning, computational biology, graph learning, and more, organizing the wide range of surprisingly consistent critique into a concrete taxonomy of observed failure modes. Inspired by measurement and evaluation theory, we divide failure modes into two categories: internal and external validity. Internal validity issues pertain to evaluation on a learning problem in isolation, such as improper comparisons to baselines or overfitting from test set re-use. External validity relies on relationships between different learning problems, for instance, whether progress on a learning problem translates to progress on seemingly related tasks.
Author Information
Thomas Liao (Scale AI)
Rohan Taori (Stanford University)
Deborah Raji (UC Berkeley)
Ludwig Schmidt (University of Washington)
More from the Same Authors
-
2021 : AI and the Everything in the Whole Wide World Benchmark »
Deborah Raji · Emily Denton · Emily M. Bender · Alex Hanna · Amandalynne Paullada -
2021 : Do ImageNet Classifiers Generalize to ImageNet? »
Benjamin Recht · Becca Roelofs · Ludwig Schmidt · Vaishaal Shankar -
2021 : Evaluating Machine Accuracy on ImageNet »
Vaishaal Shankar · Becca Roelofs · Horia Mania · Benjamin Recht · Ludwig Schmidt -
2021 : Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2021 : Robust fine-tuning of zero-shot models »
Mitchell Wortsman · Gabriel Ilharco · Jong Wook Kim · Mike Li · Hanna Hajishirzi · Ali Farhadi · Hongseok Namkoong · Ludwig Schmidt -
2022 Social: Ethics Review - Open Discussion »
Deborah Raji · William Isaac · Cherie Poland · Alexandra Luccioni -
2021 : Evaluation as a Process for Engineering Responsibility in AI »
Deborah Raji -
2021 Oral: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning »
Timo Milbich · Karsten Roth · Samarth Sinha · Ludwig Schmidt · Marzyeh Ghassemi · Bjorn Ommer -
2020 : How should researchers engage with controversial applications of AI? »
Logan Koepke · CATHERINE ONEIL · Tawana Petty · Cynthia Rudin · Deborah Raji · Shawn Bushway -
2020 : Harms from AI research »
Anna Lauren Hoffmann · Nyalleng Moorosi · Vinay Prabhu · Deborah Raji · Jacob Metcalf · Sherry Stanley -
2020 Workshop: Navigating the Broader Impacts of AI Research »
Carolyn Ashurst · Rosie Campbell · Deborah Raji · Solon Barocas · Stuart Russell -
2020 : AI and the Everything in the Whole Wide World Benchmark »
Deborah Raji -
2020 : Invited Talk 3: Inioluwa Deborah Raji »
Deborah Raji -
2020 : Panel »
Kilian Weinberger · Maria De-Arteaga · Shibani Santurkar · Jonathan Frankle · Deborah Raji -
2020 Poster: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2020 Spotlight: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2019 : AI's Blindspots and Where to Find Them »
Deborah Raji -
2019 Poster: Model Similarity Mitigates Test Set Overuse »
Horia Mania · John Miller · Ludwig Schmidt · Moritz Hardt · Benjamin Recht -
2019 Poster: Unlabeled Data Improves Adversarial Robustness »
Yair Carmon · Aditi Raghunathan · Ludwig Schmidt · John Duchi · Percy Liang -
2019 Poster: A Meta-Analysis of Overfitting in Machine Learning »
Becca Roelofs · Vaishaal Shankar · Benjamin Recht · Sara Fridovich-Keil · Moritz Hardt · John Miller · Ludwig Schmidt