Statistical Inference for Model Responsiveness Audits
Abstract
Many safety failures in machine learning arise when models assign predictions to people – e.g., in lending, hiring, or content moderation – without accounting for how individuals can change their inputs. We introduce a formal validation procedure for the responsiveness of predictions with respect to interventions on their features. Our procedure frames responsiveness as a type of sensitivity analysis in which practitioners control a set of changes by specifying constraints over interventions and distributions over downstream effects. We describe how to estimate responsiveness for the predictions of any model and any dataset using only black-box access, and design algorithms that use these estimates to support tasks such as falsification and failure probability analysis. The resulting audits surface the problem at hand and enable community or regulatory oversight: when the lack (or excess) of responsiveness is negligible, off-the-shelf models suffice; when it is material, the findings motivate redesign (e.g., strategic classification) or policy changes. We demonstrate these safety benefits and illustrate how stakeholders can collectively help steer AI systems.
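To make the black-box estimation idea concrete, the sketch below gives a minimal Monte Carlo estimate of how often a single prediction changes under interventions drawn from an auditor-specified distribution, together with a simple confidence interval. This is an illustrative assumption, not the paper's procedure: the names (estimate_responsiveness, sample_intervention), the toy classifier, and the Wald interval are placeholders chosen for brevity.

```python
# Minimal sketch (illustrative, not the paper's algorithm): Monte Carlo estimate
# of the responsiveness of a black-box classifier's prediction at one point.
# Assumptions: `predict` maps a feature vector to a label, and
# `sample_intervention` draws a feasible change to the actionable features;
# both are hypothetical callables supplied by the auditor.
import numpy as np

def estimate_responsiveness(predict, x, sample_intervention, n_samples=1000, seed=None):
    """Estimate Pr[prediction changes under a sampled intervention] at point x,
    with a 95% normal-approximation confidence interval on that probability."""
    rng = np.random.default_rng(seed)
    y0 = predict(x)                            # baseline prediction (black-box query)
    flips = 0
    for _ in range(n_samples):
        x_new = sample_intervention(x, rng)    # draw a feasible intervention on x
        flips += int(predict(x_new) != y0)     # did the prediction change?
    p_hat = flips / n_samples
    # Wald interval; an audit-grade procedure would use an exact interval instead.
    half = 1.96 * np.sqrt(max(p_hat * (1 - p_hat), 1e-12) / n_samples)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))

if __name__ == "__main__":
    # Toy example: a linear classifier and bounded random edits to the features.
    predict = lambda x: int(x[0] + 0.5 * x[1] > 1.0)
    sample = lambda x, rng: x + rng.uniform(-0.2, 0.2, size=x.shape)
    p, (lo, hi) = estimate_responsiveness(predict, np.array([0.4, 0.3]), sample)
    print(f"flip probability ~ {p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A falsification-style audit might then flag a prediction as unresponsive when even the upper confidence bound on its flip probability falls below a stated tolerance.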