Poster

LT-Defense: Searching-free Backdoor Defense via Exploiting Long-Tailed Effect

Yixiao Xu · Binxing Fang · Mohan Li · Keke Tang · Zhihong Tian

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Ensuring the security of language models against backdoor attacks is critical. Existing solutions attempt to identify backdoor triggers for each class, which can be time-consuming when the number of targets is large. We observe that poisoned data creates a long-tailed effect in the victim model, shifting its decision boundary toward the attack targets, and that this effect can be detected using only clean examples. Inspired by this observation, we introduce LT-Defense, the first searching-free backdoor defense that exploits the long-tailed effect. Specifically, LT-Defense employs a small set of clean examples and two metrics to distinguish backdoor-related features in the target model. Upon detecting a backdoored model, LT-Defense additionally provides test-time backdoor freezing and attack target prediction. Extensive experiments demonstrate the effectiveness of LT-Defense in both detection accuracy and efficiency; e.g., in task-agnostic scenarios, LT-Defense achieves 98% accuracy across 1440 models with less than 1% of the time cost of state-of-the-art solutions. Code will be made public upon paper acceptance.
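To make the core idea concrete, here is a minimal sketch of how a clean-example check for a long-tailed prediction shift might look. This is not the authors' implementation: the metrics (normalized entropy and head-mass ratio over the prediction histogram), the thresholds, and the function names are illustrative assumptions standing in for the two metrics described in the abstract.

```python
import torch

@torch.no_grad()
def long_tail_scores(model, clean_loader, num_classes, device="cpu"):
    """Run a small set of clean examples through the model and summarize
    how skewed its class predictions are (hypothetical metrics)."""
    model.eval().to(device)
    counts = torch.zeros(num_classes)
    for inputs, _ in clean_loader:
        logits = model(inputs.to(device))
        preds = logits.argmax(dim=-1).cpu()
        counts += torch.bincount(preds, minlength=num_classes).float()

    probs = counts / counts.sum().clamp(min=1)
    # Metric 1 (assumed): normalized entropy of the prediction histogram.
    # Lower entropy means clean inputs concentrate on few classes.
    entropy = -(probs * (probs + 1e-12).log()).sum()
    entropy = (entropy / torch.log(torch.tensor(float(num_classes)))).item()
    # Metric 2 (assumed): head-mass ratio, the fraction of predictions
    # absorbed by the single most-predicted class; a high value suggests
    # the decision boundary has shifted toward one target.
    head_mass = probs.max().item()
    return entropy, head_mass


def is_suspicious(entropy, head_mass, entropy_thresh=0.6, head_thresh=0.5):
    """Flag the model when clean inputs are disproportionately pulled toward
    a small set of classes (thresholds are placeholders, not from the paper)."""
    return entropy < entropy_thresh or head_mass > head_thresh
```

Because the check only needs forward passes over a handful of clean examples, it avoids the per-class trigger search that makes existing detectors expensive, which is the efficiency argument the abstract makes.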
