Timezone: »
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
Author Information
Benjamin Recht (UC Berkeley)
Becca Roelofs (Google Research)
Ludwig Schmidt (University of Washington)
Vaishaal Shankar (UC Berkeley)
More from the Same Authors
-
2021 : Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning »
Thomas Liao · Rohan Taori · Deborah Raji · Ludwig Schmidt -
2021 : Evaluating Machine Accuracy on ImageNet »
Vaishaal Shankar · Becca Roelofs · Horia Mania · Benjamin Recht · Ludwig Schmidt -
2021 : Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2021 : Robust fine-tuning of zero-shot models »
Mitchell Wortsman · Gabriel Ilharco · Jong Wook Kim · Mike Li · Hanna Hajishirzi · Ali Farhadi · Hongseok Namkoong · Ludwig Schmidt -
2021 Oral: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Retiring Adult: New Datasets for Fair Machine Learning »
Frances Ding · Moritz Hardt · John Miller · Ludwig Schmidt -
2021 Poster: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning »
Timo Milbich · Karsten Roth · Samarth Sinha · Ludwig Schmidt · Marzyeh Ghassemi · Bjorn Ommer -
2021 Poster: Soft Calibration Objectives for Neural Networks »
Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs -
2020 : Contributed Talk 6: Do Offline Metrics Predict Online Performance in Recommender Systems? »
Karl Krauth · Sarah Dean · Wenshuo Guo · Benjamin Recht · Michael Jordan -
2020 Oral: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent »
Benjamin Recht · Christopher Ré · Stephen Wright · Feng Niu -
2020 Poster: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2020 Spotlight: Measuring Robustness to Natural Distribution Shifts in Image Classification »
Rohan Taori · Achal Dave · Vaishaal Shankar · Nicholas Carlini · Benjamin Recht · Ludwig Schmidt -
2019 Poster: Model Similarity Mitigates Test Set Overuse »
Horia Mania · John Miller · Ludwig Schmidt · Moritz Hardt · Benjamin Recht -
2019 Poster: Unlabeled Data Improves Adversarial Robustness »
Yair Carmon · Aditi Raghunathan · Ludwig Schmidt · John Duchi · Percy Liang -
2019 Poster: A Meta-Analysis of Overfitting in Machine Learning »
Becca Roelofs · Vaishaal Shankar · Benjamin Recht · Sara Fridovich-Keil · Moritz Hardt · John Miller · Ludwig Schmidt -
2019 Poster: Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator »
Karl Krauth · Stephen Tu · Benjamin Recht -
2019 Poster: Certainty Equivalence is Efficient for Linear Quadratic Control »
Horia Mania · Stephen Tu · Benjamin Recht -
2018 Poster: Simple random search of static linear policies is competitive for reinforcement learning »
Horia Mania · Aurelia Guy · Benjamin Recht -
2018 Poster: Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator »
Sarah Dean · Horia Mania · Nikolai Matni · Benjamin Recht · Stephen Tu -
2017 Workshop: OPT 2017: Optimization for Machine Learning »
Suvrit Sra · Sashank J. Reddi · Alekh Agarwal · Benjamin Recht -
2017 Poster: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Oral: The Marginal Value of Adaptive Gradient Methods in Machine Learning »
Ashia C Wilson · Becca Roelofs · Mitchell Stern · Nati Srebro · Benjamin Recht -
2017 Oral: Test of Time Award »
ali rahimi · Benjamin Recht -
2016 : Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction. »
Vaishaal Shankar -
2016 Poster: The Power of Adaptivity in Identifying Statistical Alternatives »
Kevin Jamieson · Daniel Haas · Benjamin Recht -
2016 Poster: Cyclades: Conflict-free Asynchronous Machine Learning »
Xinghao Pan · Maximilian Lam · Stephen Tu · Dimitris Papailiopoulos · Ce Zhang · Michael Jordan · Kannan Ramchandran · Christopher Ré · Benjamin Recht -
2015 Poster: Parallel Correlation Clustering on Big Graphs »
Xinghao Pan · Dimitris Papailiopoulos · Samet Oymak · Benjamin Recht · Kannan Ramchandran · Michael Jordan