Statistical Inference for Responsiveness Verification
Abstract
Many safety failures in machine learning arise when a model assigns a prediction to an individual without accounting for how that individual's features can change. In this work, we introduce a formal verification procedure for the responsiveness of predictions with respect to interventions on their features. Our machinery uses mixed-integer programming to construct reachable sets over both discrete and continuous features, enabling black-box estimation with exact statistical guarantees. Because we optimize to certify, our approach supports falsification and failure-probability estimation at scale: it runs many small mixed-integer feasibility checks, reuses sampled reachable sets across models, and surfaces concrete counterexamples. We demonstrate safety benefits in recidivism prediction, organ transplant prioritization, and content moderation.
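To make the verification task concrete, the sketch below checks whether a prediction is responsive to interventions on a designated set of actionable features. It is a minimal illustration, not the paper's method: the toy model, feature names, and thresholds are hypothetical, and brute-force enumeration over a discrete reachable set stands in for the mixed-integer feasibility checks described above.

```python
from itertools import product

# Hypothetical toy classifier: flags an individual as high risk when an
# illustrative score reaches 2. All names and thresholds are assumptions.
def model(x):
    age_group, prior_count, employed = x
    score = prior_count - employed + (1 if age_group == 0 else 0)
    return 1 if score >= 2 else 0

def reachable_set(x, actionable):
    """Enumerate points reachable from x by intervening on actionable features.

    `actionable` maps a feature index to the values it may take; all other
    features stay fixed. Brute force stands in for the MIP formulation.
    """
    domains = [actionable.get(i, [v]) for i, v in enumerate(x)]
    return [tuple(p) for p in product(*domains)]

def verify_responsiveness(model, x, actionable):
    """Return (responsive?, witness intervention or None)."""
    y = model(x)
    for xp in reachable_set(x, actionable):
        if model(xp) != y:
            return True, xp   # concrete counterexample: prediction changes
    return False, None        # prediction is fixed over the whole reachable set

x = (0, 2, 0)          # age_group=0, prior_count=2, unemployed
actions = {2: [0, 1]}  # only employment status is actionable
responsive, witness = verify_responsiveness(model, x, actions)
# Here no intervention on employment flips the prediction, so the
# individual's prediction is unresponsive -- the failure mode the
# verification procedure is designed to detect.
```

Replacing the enumeration with a solver query (e.g., "does there exist a reachable point with a different label?") turns each check into the kind of small feasibility problem the abstract references.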