

Poster in Workshop: Regulatable ML: Towards Bridging the Gaps between Machine Learning Research and Regulations

PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks

Ziquan Liu · Zhuo Zhi · Ilija Bogunovic · Carsten Gerner-Beuerle · Miguel Rodrigues


Abstract: It is widely known that state-of-the-art machine learning models — including vision and language models — can be seriously compromised by adversarial perturbations, so it is increasingly important to develop the capability to certify their performance in the presence of the most effective adversarial attacks. Our paper offers a new approach to certify the performance of machine learning models in the presence of adversarial attacks, with population-level risk guarantees. In particular, given a specific attack, we introduce the notion of an $(\alpha,\zeta)$ machine learning model safety guarantee: this guarantee, which is supported by a testing procedure based on a calibration set, entails that one declares a machine learning model's adversarial (population) risk to be less than $\alpha$ (i.e. the model is safe) when the model's adversarial (population) risk is in fact higher than $\alpha$ (i.e. the model is unsafe) with probability less than $\zeta$. We also propose Bayesian optimization algorithms that efficiently determine whether or not a machine learning model is $(\alpha,\zeta)$-safe in the presence of an adversarial attack, along with the associated statistical guarantees. We apply our framework to a range of machine learning models — including Vision Transformer (ViT) and ResNet models of various sizes — subjected to a variety of adversarial attacks, such as AutoAttack, SquareAttack and the natural evolution strategy attack, in order to illustrate the merit of our approach. Of particular relevance, we show that ViTs are generally more robust to adversarial attacks than ResNets, and that ViT-Large is more robust than smaller models. Overall, our approach goes beyond existing empirical adversarial risk-based certification guarantees, paving the way to more effective AI regulation based on rigorous (and provable) performance guarantees.
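For concreteness, the $(\alpha,\zeta)$-safety guarantee described in the abstract can be sketched as follows (the notation $R_{\mathrm{adv}}(h)$ for the model's adversarial population risk under the given attack is ours, and the "declare" event refers to the outcome of the calibration-set testing procedure; the paper's precise definition may differ in form):

$$
\Pr\big[\,\text{declare } R_{\mathrm{adv}}(h) \le \alpha \;\big|\; R_{\mathrm{adv}}(h) > \alpha\,\big] < \zeta,
$$

i.e. the probability, over the randomness of the calibration set, of certifying an in-fact-unsafe model as safe is controlled at level $\zeta$.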
