
Workshop: Trustworthy and Socially Responsible Machine Learning

A Closer Look at the Intervention Procedure of Concept Bottleneck Models

Sungbin Shin · Yohan Jo · Sungsoo Ahn · Namhoon Lee


Concept bottleneck models (CBMs) are a class of interpretable neural network models that predict the target label of a given input based on its high-level concepts. Unlike other end-to-end deep learning models, CBMs enable domain experts to intervene on the predicted concepts at test time so that more accurate and reliable target predictions can be made. While this intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain underexplored. In this work, we inspect the current intervention practice for its efficiency and reliability. Specifically, we first present an array of new intervention methods that significantly improve target prediction accuracy for a given intervention budget. We also draw attention to non-trivial yet previously overlooked issues concerning the reliability and fairness of intervention, and discuss how these problems can be fixed in practice.
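
For readers unfamiliar with the mechanism, the following is a minimal sketch of test-time intervention in a CBM. It is not the authors' implementation: the two-stage toy model, its hand-set weights, and the names `predict_concepts`, `predict_label`, and `intervene` are all hypothetical, chosen only to show how replacing a predicted concept with an expert-provided value can change the target prediction.

```python
import math

# Hypothetical toy weights (assumptions for illustration, not from the paper).
CONCEPT_WEIGHTS = [[2.0, -1.0], [-1.5, 2.5]]  # input features -> 2 concepts
LABEL_WEIGHTS = [3.0, -3.0]                   # 2 concepts -> 1 binary target


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x):
    """First stage: map the input to concept probabilities."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)))
        for row in CONCEPT_WEIGHTS
    ]


def predict_label(concepts):
    """Second stage: predict the target from the concepts only."""
    score = sum(w * c for w, c in zip(LABEL_WEIGHTS, concepts))
    return 1 if score > 0 else 0


def intervene(concepts, true_concepts, indices):
    """Return a copy of `concepts` in which the entries at `indices`
    are replaced by expert-provided ground-truth values."""
    corrected = list(concepts)
    for i in indices:
        corrected[i] = float(true_concepts[i])
    return corrected
```

With the input `[0.2, 1.0]` and assumed ground-truth concepts `[1, 0]`, the unmodified model predicts label 0, while correcting only the second concept via `intervene(predict_concepts([0.2, 1.0]), [1, 0], [1])` flips the prediction to label 1. The number of indices passed to `intervene` plays the role of the intervention budget; which concepts to correct first is the kind of question the paper studies, and this sketch does not address it.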
