Sanity Checks for Saliency Maps
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance solely on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor a model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.
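As a rough illustration of the model-parameter randomization test the abstract alludes to, the sketch below compares a saliency map from a trained model against one from a randomly re-initialized copy. It assumes a PyTorch classifier and vanilla gradient saliency; the toy model, stand-in input, and helper names (gradient_saliency, randomize_parameters) are illustrative, not the paper's code, and the paper's cascading layer-wise randomization is simplified here to a single full re-initialization.

```python
# Sketch of the model-parameter randomization sanity check: a saliency
# method that is sensitive to model parameters should produce very
# different maps for a trained model vs. a randomly re-initialized copy.
import copy
import torch
import torch.nn as nn
from scipy.stats import spearmanr

def gradient_saliency(model, x, target):
    """Vanilla gradient saliency: |d logit_target / d input|."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[0, target].backward()
    return x.grad.abs().squeeze(0)

def randomize_parameters(model):
    """Return a copy of the model with all weights re-initialized."""
    rand_model = copy.deepcopy(model)
    for p in rand_model.parameters():
        nn.init.normal_(p, std=0.02)  # discards everything learned
    return rand_model

# Toy classifier standing in for a trained model (illustrative only).
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
x = torch.randn(1, 1, 28, 28)          # stand-in input image
target = model(x).argmax().item()      # class whose logit we explain

sal_trained = gradient_saliency(model, x, target)
sal_random = gradient_saliency(randomize_parameters(model), x, target)

# High rank correlation means the map barely depends on the learned
# parameters, i.e., the saliency method fails this sanity check.
rho, _ = spearmanr(sal_trained.flatten().numpy(),
                   sal_random.flatten().numpy())
print(f"Spearman rank correlation (trained vs. random): {rho:.3f}")
```

A method that passes the check should yield a low rank correlation here; a near-identical map for the randomized model indicates the explanation is insensitive to what the model learned.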
Author Information
Julius Adebayo (MIT)
Justin Gilmer (Google Brain)
Michael Muelly (Google)
Ian Goodfellow (Google)
Moritz Hardt (Google Brain)
Been Kim (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Spotlight: Sanity Checks for Saliency Maps
  Wed Dec 5th, 03:30 -- 03:35 PM, Room 220 E
More from the Same Authors
- 2020 Poster: Debugging Tests for Model Explanations
  Julius Adebayo · Michael Muelly · Ilaria Liccardi · Been Kim
- 2020 Poster: On Completeness-aware Concept-Based Explanations in Deep Neural Networks
  Chih-Kuan Yeh · Been Kim · Sercan Arik · Chun-Liang Li · Tomas Pfister · Pradeep Ravikumar
- 2019 Poster: Towards Automatic Concept-based Explanations
  Amirata Ghorbani · James Wexler · James Zou · Been Kim
- 2019 Poster: A Fourier Perspective on Model Robustness in Computer Vision
  Dong Yin · Raphael Gontijo Lopes · Jon Shlens · Ekin Dogus Cubuk · Justin Gilmer
- 2019 Poster: Visualizing and Measuring the Geometry of BERT
  Emily Reif · Ann Yuan · Martin Wattenberg · Fernanda Viegas · Andy Coenen · Adam Pearce · Been Kim
- 2019 Poster: A Benchmark for Interpretability Methods in Deep Neural Networks
  Sara Hooker · Dumitru Erhan · Pieter-Jan Kindermans · Been Kim
- 2018 Poster: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow
- 2018 Poster: Human-in-the-Loop Interpretability Prior
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Spotlight: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow
- 2018 Spotlight: Human-in-the-Loop Interpretability Prior
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Poster: To Trust Or Not To Trust A Classifier
  Heinrich Jiang · Been Kim · Melody Guan · Maya Gupta
- 2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
  Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein
- 2017 Workshop: Machine Deception
  Ian Goodfellow · Tim Hwang · Bryce Goodman · Mikel Rodriguez
- 2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
  Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein
- 2016 Poster: Unsupervised Learning for Physical Interaction through Video Prediction
  Chelsea Finn · Ian Goodfellow · Sergey Levine
- 2014 Poster: Generative Adversarial Nets
  Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio
- 2013 Poster: Multi-Prediction Deep Boltzmann Machines
  Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio
- 2009 Poster: Measuring Invariances in Deep Networks
  Ian Goodfellow · Quoc V. Le · Andrew M Saxe · Andrew Y Ng