
Learning Global Transparent Models consistent with Local Contrastive Explanations
Tejaswini Pedapati · Avinash Balakrishnan · Karthikeyan Shanmugam · Amit Dhurandhar

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #1072

There is a rich and growing literature on producing local contrastive/counterfactual explanations for black-box models (e.g., neural networks). In these methods, the explanation for an input takes the form of a contrast point that differs from the original input in very few features and lies in a different class. Other works try to build globally interpretable models, such as decision trees and rule lists, trained either on the actual labels of the data or on the black-box model's predictions. Although these interpretable global models can be useful, they may not be consistent with the local explanations from a specific black-box model of choice. In this work, we explore the question: Can we produce a transparent global model that is simultaneously accurate and consistent with the local (contrastive) explanations of the black-box model? We introduce a local consistency metric that quantifies whether the local explanations for the black-box model are also applicable to the proxy/surrogate globally transparent model. Based on a key insight, we propose a novel method in which we create custom boolean features from the local contrastive explanations of the black-box model and then train a globally transparent model that, in addition to being accurate, has higher local consistency than other known strategies.
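
To make the two ideas in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify how the boolean features or the metric are defined, so everything here is an assumption made for illustration: the names `thresholds_from_explanations`, `to_boolean_features`, and `local_consistency` are hypothetical, a boolean feature is taken to be a midpoint threshold on each feature that a contrast point changes, and local consistency is taken to be the fraction of explanations whose contrast point also flips the surrogate's prediction.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Explanation = Tuple[Point, Point]  # (original input, contrast point)
Threshold = Tuple[int, float]      # (feature index, split value)


def thresholds_from_explanations(explanations: List[Explanation]) -> List[Threshold]:
    """Collect one (feature index, threshold) pair per feature changed by a contrast."""
    found = set()
    for x, x_contrast in explanations:
        for j, (a, b) in enumerate(zip(x, x_contrast)):
            if a != b:
                # Assumption: split halfway between the original and contrast value.
                found.add((j, (a + b) / 2.0))
    return sorted(found)


def to_boolean_features(x: Point, thresholds: List[Threshold]) -> List[int]:
    """Map a raw input to 0/1 features, one per threshold."""
    return [int(x[j] > t) for j, t in thresholds]


def local_consistency(predict: Callable[[Point], int],
                      explanations: List[Explanation]) -> float:
    """Fraction of explanations whose contrast point also changes the
    surrogate's prediction (an assumed reading of the metric)."""
    if not explanations:
        return 0.0
    flips = sum(predict(x) != predict(c) for x, c in explanations)
    return flips / len(explanations)


# Hypothetical explanations: each contrast point changes a single feature.
explanations = [
    ((1.0, 5.0), (3.0, 5.0)),  # feature 0 changes
    ((4.0, 2.0), (4.0, 6.0)),  # feature 1 changes
]
thresholds = thresholds_from_explanations(explanations)


def toy_surrogate(x: Point) -> int:
    """Stand-in transparent model that looks only at feature 0."""
    return int(x[0] > 2.0)


score = local_consistency(toy_surrogate, explanations)
```

In this toy setup the surrogate ignores feature 1, so it reproduces only the first explanation. A transparent model trained on the output of `to_boolean_features` would have both splits available, which is the intuition the sketch is meant to convey.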

Author Information

Tejaswini Pedapati (IBM Research)
Avinash Balakrishnan (IBM)
Karthikeyan Shanmugam (IBM Research, NY)
Amit Dhurandhar (IBM Research)

More from the Same Authors