Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance solely on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.
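A minimal sketch of the kind of model-parameter randomization test alluded to above: a saliency method that is faithful to the model should produce noticeably different maps once the trained weights are replaced with random ones. This is not the authors' released code; the PyTorch toy network, plain gradient saliency, Spearman comparison, and the helper names `gradient_saliency` and `randomization_sanity_check` are all illustrative assumptions.

```python
# Sketch of a model-parameter randomization sanity check (assumptions noted above).
import copy
import torch
import torch.nn as nn
from scipy.stats import spearmanr


def gradient_saliency(model, x):
    """|d max-logit / d input|: a simple stand-in for any saliency method."""
    x = x.clone().requires_grad_(True)
    score = model(x).max(dim=1).values.sum()
    score.backward()
    return x.grad.abs().detach()


def randomization_sanity_check(trained_model, x):
    """Compare saliency from the trained model against a re-initialized copy."""
    sal_trained = gradient_saliency(trained_model, x)

    randomized = copy.deepcopy(trained_model)
    for module in randomized.modules():
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()          # discard all learned weights
    sal_random = gradient_saliency(randomized, x)

    # High rank correlation => the map barely depends on the learned weights,
    # so the saliency method fails this sanity check.
    rho, _ = spearmanr(sal_trained.flatten().numpy(),
                       sal_random.flatten().numpy())
    return rho


if __name__ == "__main__":
    # Toy classifier and random inputs for illustration only; in practice the
    # first argument would be a model actually trained on real data.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                          nn.Linear(64, 10))
    x = torch.randn(8, 1, 28, 28)
    print("Spearman(trained, randomized) =", randomization_sanity_check(model, x))
```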
Author Information
Julius Adebayo (MIT)
Justin Gilmer (Google Brain)
Michael Muelly (Google)
Ian Goodfellow (Google)
Moritz Hardt (Google Brain)
Been Kim (Google)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Poster: Sanity Checks for Saliency Maps »
  Wed. Dec 5th 03:45 -- 05:45 PM, Room 210 #30
More from the Same Authors
- 2022 : Do Current Multi-Task Optimization Methods in Deep Learning Even Help? »
  Derrick Xin · Behrooz Ghorbani · Dami Choi · Ankush Garg · Orhan Firat · Justin Gilmer
- 2022 Poster: Do Current Multi-Task Optimization Methods in Deep Learning Even Help? »
  Derrick Xin · Behrooz Ghorbani · Justin Gilmer · Ankush Garg · Orhan Firat
- 2021 : Interpretability of Machine Learning in Computer Systems: Analyzing a Caching Model »
  Leon Sixt · Evan Liu · Marie Pellat · James Wexler · Milad Hashemi · Been Kim · Martin Maas
- 2020 Poster: Debugging Tests for Model Explanations »
  Julius Adebayo · Michael Muelly · Ilaria Liccardi · Been Kim
- 2020 Poster: On Completeness-aware Concept-Based Explanations in Deep Neural Networks »
  Chih-Kuan Yeh · Been Kim · Sercan Arik · Chun-Liang Li · Tomas Pfister · Pradeep Ravikumar
- 2019 : Poster Session »
  Clement Canonne · Kwang-Sung Jun · Seth Neel · Di Wang · Giuseppe Vietri · Liwei Song · Jonathan Lebensold · Huanyu Zhang · Lovedeep Gondara · Ang Li · FatemehSadat Mireshghallah · Jinshuo Dong · Anand D Sarwate · Antti Koskela · Joonas Jälkö · Matt Kusner · Dingfan Chen · Mi Jung Park · Ashwin Machanavajjhala · Jayashree Kalpathy-Cramer · Vitaly Feldman · Andrew Tomkins · Hai Phan · Hossein Esfandiari · Mimansa Jaiswal · Mrinank Sharma · Jeff Druce · Casey Meehan · Zhengli Zhao · Hsiang Hsu · Davis Railsback · Abraham Flaxman · Julius Adebayo · Aleksandra Korolova · Jiaming Xu · Naoise Holohan · Samyadeep Basu · Matthew Joseph · My Thai · Xiaoqian Yang · Ellen Vitercik · Michael Hutchinson · Chenghong Wang · Gregory Yauney · Yuchao Tao · Chao Jin · Si Kai Lee · Audra McMillan · Rauf Izmailov · Jiayi Guo · Siddharth Swaroop · Tribhuvanesh Orekondy · Hadi Esmaeilzadeh · Kevin Procopio · Alkis Polyzotis · Jafar Mohammadi · Nitin Agrawal
- 2019 Poster: Towards Automatic Concept-based Explanations »
  Amirata Ghorbani · James Wexler · James Zou · Been Kim
- 2019 Poster: A Fourier Perspective on Model Robustness in Computer Vision »
  Dong Yin · Raphael Gontijo Lopes · Jonathon Shlens · Ekin Dogus Cubuk · Justin Gilmer
- 2019 Poster: Visualizing and Measuring the Geometry of BERT »
  Emily Reif · Ann Yuan · Martin Wattenberg · Fernanda Viegas · Andy Coenen · Adam Pearce · Been Kim
- 2019 Poster: A Benchmark for Interpretability Methods in Deep Neural Networks »
  Sara Hooker · Dumitru Erhan · Pieter-Jan Kindermans · Been Kim
- 2018 : Invited talk 1: Scalable PATE and the Secret Sharer »
  Ian Goodfellow
- 2018 : Interpretability for when NOT to use machine learning by Been Kim »
  Been Kim
- 2018 Poster: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms »
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow
- 2018 Poster: Human-in-the-Loop Interpretability Prior »
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Spotlight: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms »
  Avital Oliver · Augustus Odena · Colin A Raffel · Ekin Dogus Cubuk · Ian Goodfellow
- 2018 Spotlight: Human-in-the-Loop Interpretability Prior »
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Poster: To Trust Or Not To Trust A Classifier »
  Heinrich Jiang · Been Kim · Melody Guan · Maya Gupta
- 2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
  Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein
- 2017 : Invited Talk: Overcoming Limited Data with GANs »
  Ian Goodfellow
- 2017 : Adversarial Robustness for Aligned AI »
  Ian Goodfellow
- 2017 : Defending Against Adversarial Examples »
  Ian Goodfellow
- 2017 : Invited Talk »
  Ian Goodfellow
- 2017 : Adversarial Patch »
  Justin Gilmer
- 2017 : Competition I: Adversarial Attacks and Defenses »
  Alexey Kurakin · Ian Goodfellow · Samy Bengio · Yao Zhao · Yinpeng Dong · Tianyu Pang · Fangzhou Liao · Cihang Xie · Adithya Ganesh · Oguz Elibol
- 2017 Workshop: Machine Deception »
  Ian Goodfellow · Tim Hwang · Bryce Goodman · Mikel Rodriguez
- 2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
  Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein
- 2016 Poster: Unsupervised Learning for Physical Interaction through Video Prediction »
  Chelsea Finn · Ian Goodfellow · Sergey Levine
- 2014 Poster: Generative Adversarial Nets »
  Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio
- 2013 Poster: Multi-Prediction Deep Boltzmann Machines »
  Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio
- 2009 Poster: Measuring Invariances in Deep Networks »
  Ian Goodfellow · Quoc V. Le · Andrew M Saxe · Andrew Y Ng