NeurIPS Detecting Backdoors with Meta-Models

Poster in Workshop: Backdoors in Deep Learning: The Good, the Bad, and the Ugly

Detecting Backdoors with Meta-Models

Lauro Langosco · Neel Alex · William Baker · David Quarel · Herbie Bradley · David Krueger


Abstract: It is well known that backdoors can be implanted into neural networks, allowing an attacker to craft inputs that produce a particular undesirable output (e.g. a misclassified image). We propose to use meta-models, neural networks that take another network's parameters as input, to detect backdoors directly from model weights. To this end, we present a meta-model architecture and train it on a dataset of approximately 4000 clean and backdoored CNNs trained on CIFAR-10. Our approach is simple and scalable. It detects the presence of a backdoor with >99% accuracy when the test trigger patterns are drawn i.i.d. from the same distribution as the training triggers, and achieves some success even on out-of-distribution backdoors.
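The abstract does not describe the meta-model architecture itself. To make the core idea concrete (a classifier whose input is another network's parameters and whose output is "backdoored or clean"), here is a minimal sketch under loudly stated assumptions: parameters are represented as a dict of flat float lists, they are concatenated into one vector in a fixed order, and a plain logistic-regression classifier stands in for the meta-model. The names `Params`, `flatten_params`, `LinearMetaModel`, and `make_toy_net` are hypothetical, this is not the authors' architecture, and the toy "networks" are random vectors with an artificially planted signal rather than CNNs trained on CIFAR-10.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical parameter format: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def flatten_params(params: Params) -> List[float]:
    """Concatenate all layers in sorted-name order, so every network with the
    same architecture maps to a vector of the same length and layout."""
    vec: List[float] = []
    for name in sorted(params):
        vec.extend(params[name])
    return vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LinearMetaModel:
    """Toy meta-model (assumption, not the paper's design): logistic regression
    on flattened weights, returning P(backdoored | weights)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def _logit(self, x: List[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict_proba(self, params: Params) -> float:
        return sigmoid(self._logit(flatten_params(params)))

    def fit(self, nets: Sequence[Params], labels: Sequence[int],
            lr: float = 0.1, epochs: int = 200) -> None:
        # Full-batch gradient descent on mean binary cross-entropy.
        xs = [flatten_params(p) for p in nets]
        n = len(xs)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in zip(xs, labels):
                err = sigmoid(self._logit(x)) - y  # dLoss/dlogit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n


def make_toy_net(backdoored: bool, rng: random.Random) -> Params:
    """Purely synthetic stand-in for a trained network: 'backdoored' nets get
    an artificial mean shift in the 'head' layer. Not a model of real backdoors."""
    conv = [rng.gauss(0.0, 1.0) for _ in range(8)]
    mu = 1.0 if backdoored else -1.0
    head = [rng.gauss(mu, 0.5) for _ in range(2)]
    return {"conv1": conv, "head": head}


if __name__ == "__main__":
    rng = random.Random(0)
    train = [make_toy_net(i % 2 == 1, rng) for i in range(200)]
    y_train = [i % 2 for i in range(200)]
    meta = LinearMetaModel(dim=len(flatten_params(train[0])))
    meta.fit(train, y_train)
    test = [make_toy_net(i % 2 == 1, rng) for i in range(100)]
    correct = sum((meta.predict_proba(p) > 0.5) == (i % 2 == 1)
                  for i, p in enumerate(test))
    print(f"toy held-out accuracy: {correct / len(test):.2f}")
```

A linear model over raw flattened weights ignores structure such as the permutation symmetry of hidden units and is unlikely to work on real CNNs; it is shown only to illustrate the input/output contract of a meta-model (weights in, backdoor probability out).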
