NeurIPS Poster Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples


Weixin Chen · Baoyuan Wu · Haoqian Wang

Keywords: [ trustworthy AI ] [ AI security ] [ Backdoor Learning ] [ Backdoor Defense ]

Abstract:

Poisoning-based backdoor attacks are a serious threat when training deep models on data from untrustworthy sources. Given a backdoored model, we observe that the feature representations of poisoned samples carrying the trigger are more sensitive to transformations than those of clean samples. This observation inspires us to design a simple sensitivity metric, called feature consistency towards transformations (FCT), to distinguish poisoned samples from clean samples in the untrustworthy training set. Moreover, we propose two effective backdoor defense methods, both built upon a sample-distinguishment module that uses the FCT metric. The first method trains a secure model from scratch using a two-stage secure training module. The second method removes the backdoor from a backdoored model with a backdoor removal module, which alternately unlearns the distinguished poisoned samples and relearns the distinguished clean samples. Extensive results on three benchmark datasets demonstrate superior defense performance against eight types of backdoor attacks compared with state-of-the-art backdoor defenses. Code is available at: https://github.com/SCLBD/Effectivebackdoordefense.
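To make the idea of a transformation-sensitivity score concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the score is the squared L2 distance between the features of a sample and the features of its transformed copy, and it assumes a simple split that treats the least sensitive samples as clean and the most sensitive as poisoned. The abstract does not specify the distance, the transformation, or the thresholds. The names `fct_scores`, `split_by_sensitivity`, `toy_features`, and `flip`, as well as the fractions used, are hypothetical.

```python
import numpy as np


def fct_scores(features_fn, transform_fn, samples):
    """Sensitivity score per sample (assumed form: squared L2 feature distance).

    features_fn: maps a batch of samples to a 2-D array (batch, feature_dim).
    transform_fn: maps a batch of samples to a transformed batch of equal shape.
    """
    original = features_fn(samples)
    transformed = features_fn(transform_fn(samples))
    return np.sum((original - transformed) ** 2, axis=1)


def split_by_sensitivity(scores, clean_fraction, poisoned_fraction):
    """Return (clean_idx, poisoned_idx): lowest-score and highest-score samples.

    The two fractions are hypothetical hyperparameters; samples that fall
    between the two groups are left unassigned.
    """
    order = np.argsort(scores)  # ascending: least sensitive first
    n = len(scores)
    n_clean = int(n * clean_fraction)
    n_poisoned = int(n * poisoned_fraction)
    return order[:n_clean], order[n - n_poisoned:]


# Toy stand-ins: a fixed random linear map plays the role of the backdoored
# model's feature extractor, and a horizontal flip plays the transformation.
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 8))


def toy_features(batch):
    return batch.reshape(len(batch), -1) @ weights


def flip(batch):
    return batch[:, :, ::-1]


images = np.ones((4, 4, 4))   # four flip-symmetric 4x4 "images"
images[2, 0, 0] = 5.0         # sample 2 gets a corner patch acting as a trigger

scores = fct_scores(toy_features, flip, images)
clean_idx, poisoned_idx = split_by_sensitivity(scores, 0.5, 0.25)
```

In this toy setup only sample 2 changes under the flip, so it receives the largest score and lands in the poisoned group. With a real model, `toy_features` would be replaced by the network's feature extractor and `flip` by whatever transformations the defense applies.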
