Poster in Workshop: Privacy in Machine Learning (PriML) 2021
Reconstructing Test Labels from Noisy Loss Scores (Extended Abstract)
Abhinav Aggarwal · Shiva Kasiviswanathan · Zekun Xu · Oluwaseyi Feyisetan · Nathanael Teissier
Label inference was recently introduced as the problem of reconstructing the ground-truth labels of a private dataset from only the (possibly perturbed) cross-entropy loss scores evaluated at carefully crafted prediction vectors. In this paper, we generalize this attack by providing necessary and sufficient conditions under which label inference is possible from a broad class of loss functions. We show that for many commonly used loss functions, including linearly decomposable losses, certain Bregman divergence-based losses, and losses composed with common activation functions, such attacks can be designed for arbitrary noise levels. We also demonstrate that these attacks can be carried out through a lightweight augmentation to any neural network model, enabling the adversary to disguise them as benign. Our results call attention to vulnerabilities that may already be under silent exploitation. Armed with this information, individuals and organizations that vend these seemingly innocuous aggregate metrics from their classification models can gauge the potential scope of the resulting information leakage.
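To make the mechanics concrete, below is a minimal sketch of a single-query label inference attack on a noiseless mean binary cross-entropy score, assuming the score is reported at full floating-point precision. The power-of-two logit encoding is an illustrative construction of our own for this sketch, not necessarily the encoding used in the paper, which also covers perturbed scores and broader loss families.

```python
import numpy as np

def craft_predictions(n):
    # Choose logit(p_i) = 2^i, so the per-example loss for y_i = 0 exceeds
    # the loss for y_i = 1 by exactly 2^i; the total loss then encodes the
    # unknown label vector in binary.
    return 1.0 / (1.0 + np.exp(-(2.0 ** np.arange(n))))

def infer_labels(mean_log_loss, n):
    # Baseline: the total loss if every label were 1, i.e. sum_i log(1 + e^{-2^i}).
    baseline = np.log1p(np.exp(-(2.0 ** np.arange(n)))).sum()
    # The residual is the sum of 2^i over exactly those i with y_i = 0;
    # reading off its bits recovers every label.
    code = int(round(mean_log_loss * n - baseline))
    return np.array([0 if (code >> i) & 1 else 1 for i in range(n)])

# Victim side: evaluates mean binary cross-entropy on its private labels
# at the adversary's crafted prediction vector.
y_true = np.array([1, 0, 1, 1, 0])
p = craft_predictions(len(y_true))
score = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(infer_labels(score, len(y_true)))  # recovers [1 0 1 1 0]
```

Intuitively, tolerating perturbed scores amounts to widening the gaps between the encoded levels until noise of the given magnitude can no longer flip a decoded bit, which is the idea behind extending such attacks to arbitrary noise levels.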